datonic / datadex

📦 Serverless and local-first Open Data Platform
http://datadex.datonic.io
MIT License

Publish Static Datasets #15

Open davidgasquez opened 1 year ago

davidgasquez commented 1 year ago

We should publish datasets in multiple places.

davidgasquez commented 1 year ago

Publish as Parquet!

davidgasquez commented 1 year ago

Also, publish via RoAPI.

davidgasquez commented 10 months ago

Also, generate a Frictionless package (with Dagster) for the final datasets' Parquet files.
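
A minimal sketch of what generating such a package could look like, using only the standard library to write a `datapackage.json` descriptor next to the Parquet files. The function names (`build_datapackage`, `write_datapackage`) and the directory layout are hypothetical and not part of datadex; the Dagster wiring is left out, and the `frictionless` Python library could be used instead of hand-writing the descriptor.

```python
import hashlib
import json
from pathlib import Path


def build_datapackage(data_dir: Path, package_name: str) -> dict:
    """Build a minimal Data Package descriptor for the *.parquet files in data_dir.

    Hypothetical helper: field choices follow the Frictionless Data Package
    spec (name, resources[].name, resources[].path), plus optional metadata.
    """
    resources = []
    for path in sorted(data_dir.glob("*.parquet")):
        content = path.read_bytes()
        resources.append(
            {
                # Assumes the file stem is already a valid resource name
                # (lowercase letters, digits, ".", "-", "_").
                "name": path.stem.lower(),
                # Path is relative to the location of datapackage.json.
                "path": path.name,
                "format": "parquet",
                "bytes": len(content),
                "hash": "sha256:" + hashlib.sha256(content).hexdigest(),
            }
        )
    return {"name": package_name, "resources": resources}


def write_datapackage(data_dir: Path, package_name: str) -> Path:
    """Write the descriptor to <data_dir>/datapackage.json and return its path."""
    descriptor = build_datapackage(data_dir, package_name)
    target = data_dir / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2))
    return target
```

In a Dagster setup, something like `write_datapackage` would presumably run as a final step once all Parquet files have been materialized, but that part is an assumption.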

davidgasquez commented 10 months ago

It would be nice to expose a static data API (url.com/dataset/partition/data.json) and perhaps some custom graphs at url.com/dataset/partition/.
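
A rough sketch of the static layout, assuming the rows are already available as a list of dicts and that one column is used as the partition. `publish_static_api` and its parameters are hypothetical names for illustration; in practice the rows would likely come from the Parquet files, and the output directory would be served by any static host.

```python
import json
from pathlib import Path


def publish_static_api(rows, out_dir, dataset, partition_key):
    """Write <out_dir>/<dataset>/<partition>/data.json for each partition value.

    Hypothetical helper: groups rows by the value of partition_key and writes
    one JSON file per group, mirroring the URL scheme dataset/partition/data.json.
    """
    # Group rows by their partition value (stringified so it can be a folder name).
    partitions = {}
    for row in rows:
        partitions.setdefault(str(row[partition_key]), []).append(row)

    written = []
    for partition, part_rows in sorted(partitions.items()):
        target = Path(out_dir) / dataset / partition / "data.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(part_rows))
        written.append(target)
    return written
```

For example, `publish_static_api(rows, "public", "energy", "year")` would create one `public/energy/<year>/data.json` file per distinct year in `rows`.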

davidgasquez commented 8 months ago

Also, publish as GitHub artifacts. PyPI does something like this for some of their datasets, which are then surfaced via a Next.js app.

fredguth commented 4 months ago

Wow, I didn't know about RoAPI, awesome!

+1 for parquet files.

I would wait for DuckDB to reach at least 1.0 before using it as a file format.

davidgasquez commented 3 months ago

I think the DuckDB database could be pushed to Hugging Face too!

https://huggingface.co/docs/huggingface_hub/en/guides/upload#upload-a-file

fredguth commented 2 months ago

Maybe it is best to wait for the first release version of DuckDB. I heard it will be soon. Meanwhile, I would upload a Parquet file.