brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Support additional `from` datasources within Zed #4337

Open BrutalSimplicity opened 1 year ago

BrutalSimplicity commented 1 year ago

As a noob to the Zed ecosystem, I've been playing with the zed and zq CLIs and was looking for a way to integrate other data sources within a Zed pipeline. I know this can currently be done with other tools like curl, find, psql, etc. But I would like to be able to integrate these sources directly within Zed for the following benefits:

Is this something that Zed already supports?

If not, does this use case align with the vision and goals for Zed?

I believe there is a similar ticket here that focuses on extending the `from` / `get` operator to support HTTP requests. Should this go there?

A somewhat related tool in this space is Steampipe, which has a plugin capability for adding new data sources. I've been a huge fan of their product, but PostgreSQL syntax seems a bit heavy at times. I think Zed's data lake concept plus the power of its query tools would be a really attractive alternative.

philrz commented 1 year ago

As those of us involved with the Zed project often say, "the architecture supports it". 😉 These concepts are indeed in line with the direction the project is headed. That said, since dev resources are limited today, the bulk of the effort lately and in the near future is likely to go toward the core of the tech, e.g., ensuring solid performance and ease of management for data once it's in the system.

Making it easy for users to get data into the system is still important. However, the way we'll likely enable this in the short term is by publishing best practices for tools that already integrate with a diverse set of inputs, then showing how those tools can easily push their data onward into Zed. Two recent prototyping efforts along these lines have been Logstash (#3151) and Fluentd (#4271), and pretty soon I expect we'll publish more formal docs that turn the findings in those issues into "best practices".

As noted above, we also recognize that the existing `get <uri>` variation of `from` should probably be extended to cover other HTTP methods and parameters/payloads, which would allow for hitting a wider set of REST APIs (#4225).

We'll keep this issue open as a record of our intent to make ongoing investment in this area. Thanks for your interest in Zed!