datopian / datahub

🌀 Rapidly build rich data portals using a modern frontend framework
https://datahub.io/opensource
MIT License

Write blog post about Learnings from a Data Engineer while exploring the Open Data Ecosystem #1138

Open davidgasquez opened 1 year ago

davidgasquez commented 1 year ago

Write a blog post in the vein of "Here's what I learned as a data engineer from a month-long deep dive into open source data tooling and the open data ecosystem".

A "what the future could look like" section could mention:

davidgasquez commented 1 year ago

A bit more on the "what the future could look like" section.

I played with LangChain + OpenAI a bit more. It is now easy to run arbitrary queries against a database using natural language (see the Querying a Database section I did here).

I was thinking... it should be possible to create a LangChain tool that can access all open data and query it. Two quick approaches:

  1. Search for relevant datapackages related to the topic T in GitHub. Add them as external tables and query them.
  2. Have something like Splitgraph that is already aggregating all datasets into a CDNed database and use that directly. 👈 I'm going to try this one quickly with Splitgraph and report back!
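
A rough sketch of what approach 1 could look like once a datapackage has been found, just to make the idea concrete. Everything here is an assumption for illustration: the `DESCRIPTOR` is a hypothetical datapackage with inline toy rows (not real data), `load_datapackage` and `query` are made-up helper names, and SQLite from the Python standard library stands in for whatever database the tool would actually use. In a real LangChain tool the SQL string would be generated from the natural-language question rather than written by hand.

```python
import sqlite3

# Hypothetical datapackage descriptor with inline rows (toy values, not real data).
# A real one would be fetched from a repository and point to CSV files instead.
DESCRIPTOR = {
    "name": "example-package",
    "resources": [
        {
            "name": "readings",
            "data": [
                {"city": "A", "value": 1},
                {"city": "A", "value": 3},
                {"city": "B", "value": 2},
            ],
        }
    ],
}


def load_datapackage(conn, descriptor):
    """Create one table per resource and return the table names."""
    tables = []
    for resource in descriptor["resources"]:
        rows = resource["data"]
        columns = list(rows[0].keys())
        table = resource["name"]
        column_sql = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
        tables.append(table)
    return tables


def query(conn, sql):
    """Run a SQL query and return all rows."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
tables = load_datapackage(conn, DESCRIPTOR)

# Stand-in for the SQL an LLM would produce from "total value per city".
result = query(
    conn,
    "SELECT city, SUM(value) FROM readings GROUP BY city ORDER BY city",
)
print(tables, result)
```

The table names returned by `load_datapackage` are what you would hand to the model as schema context, so it knows what it can query.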

cc @rufuspollock

rufuspollock commented 1 week ago

@davidgasquez would be cool to write this up (or if you have already to x-post it)

davidgasquez commented 1 week ago

I don't have it and agree it would be cool to put something out in this vein.

Any thoughts or points that you think might be interesting to cover?

rufuspollock commented 1 week ago

@davidgasquez

I imagine stuff will come up as you draft - also feel free to just dump ideas / an outline and then we can iterate from there 😄