MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com

create `http-client` source type #1704

Open quodlibetor opened 4 years ago

quodlibetor commented 4 years ago

Similar to #1701, but sort of the inverse: this will configure materialized to act as a client to some HTTP server. E.g. `CREATE SOURCE wikipedia FROM 'https://stream.wikimedia.org/v2/stream/recentchange' WITH (..)` will create a connection that always reconnects and emits lines to the source `wikipedia`.
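To make the intended behavior concrete, here is a minimal Python sketch of a client that holds a connection open, emits each line it receives, and reconnects whenever the stream ends or fails. It is only an illustration, not how materialized would implement it: the function name `stream_lines` and the parameters `open_stream`, `max_reconnects`, `backoff_seconds`, and `sleep` are all hypothetical, and the fixed backoff is an assumption, since the reconnection policy is not specified here.

```python
import time
import urllib.request
from typing import BinaryIO, Callable, Iterator, Optional


def stream_lines(
    url: str,
    open_stream: Callable[[str], BinaryIO] = urllib.request.urlopen,
    max_reconnects: Optional[int] = None,  # hypothetical knob; None means reconnect forever
    backoff_seconds: float = 1.0,          # hypothetical fixed delay between reconnects
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield non-empty lines from `url`, reconnecting when the stream ends or fails."""
    reconnects = 0
    while True:
        try:
            # The response object is file-like, so iterating it yields raw lines.
            with open_stream(url) as response:
                for raw in response:
                    line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
                    if line:
                        yield line
        except OSError:
            # urllib.error.URLError and socket errors are OSError subclasses;
            # treat them like a dropped connection and fall through to reconnect.
            pass
        if max_reconnects is not None and reconnects >= max_reconnects:
            return
        reconnects += 1
        sleep(backoff_seconds)
```

Something like `for line in stream_lines("https://stream.wikimedia.org/v2/stream/recentchange"): ...` would then play the role of the source. The sketch emits lines as-is and makes no attempt to parse them or to resume from where a dropped connection left off.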

rjnn commented 4 years ago

Some thinking needs to be done here on the ergonomics. While I'm wildly enthusiastic about #1701, I'm not as much a fan of this without some additional product iteration. It's not quite an HTTP source, because it's a polling source: whereas #1701 describes a port that stays open and reacts to calls made to it, this feature flips that around and makes it Materialize's responsibility to poll the specified HTTP endpoint. There are lots of open questions: At what cadence? What's the reconnection protocol? It's a bit thorny to figure all of this out, and we should front-load some of that thinking, perhaps with a specific use case in mind.
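For illustration, a polling source in this sense might look roughly like the Python sketch below. It mostly shows where the open questions land: the cadence becomes a parameter someone has to choose (`interval_seconds`), and failure handling has to be decided explicitly (here a failed poll is simply skipped). All names (`poll`, `fetch_body`, `max_polls`) are hypothetical and nothing here reflects an actual Materialize design.

```python
import time
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_body(url: str) -> bytes:
    """Issue one GET request and return the full response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def poll(
    url: str,
    interval_seconds: float,                       # the "cadence" question
    fetch: Callable[[str], bytes] = fetch_body,
    max_polls: Optional[int] = None,               # hypothetical; None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield the response body of each successful poll of `url`."""
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            yield fetch(url)
        except OSError:
            # Assumption: a failed poll is skipped and retried at the next tick.
            pass
        polls += 1
        sleep(interval_seconds)
```

Note that successive polls can return overlapping or identical data, so a real design would also need to say how duplicates are handled; the sketch leaves that out.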

quodlibetor commented 4 years ago

Oh, actually that's a good point. I was imagining this as a long-lived connection to a source that continuously outputs data, not one that needs to be polled. I agree that polling would require a lot more thinking, and is interesting in its own right.

elindsey commented 2 years ago

I'm moving this off the sources and sinks board and onto the product backlog board; it'll need more poking at to determine if we want to do it. I think #2237 may subsume this task: it would cover most of the interesting use cases, and in a more standardized way.