davidgasquez opened this issue 1 year ago
Also, publish via RoAPI.
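Once RoAPI is serving the files, clients can query them over HTTP. Here is a minimal sketch, assuming a local RoAPI instance on its default port 8080, started with something like `roapi --table "my_dataset=data/my_dataset.parquet"` (the table name and path are hypothetical). It posts raw SQL to RoAPI's `/api/sql` endpoint and decodes the JSON response:

```python
import json
import urllib.request

# Assumption: RoAPI runs locally on its default HTTP port.
ROAPI_URL = "http://localhost:8080"


def build_sql_request(sql: str, base_url: str = ROAPI_URL) -> urllib.request.Request:
    """Build a POST request for RoAPI's /api/sql endpoint; the SQL text is the raw body."""
    return urllib.request.Request(
        url=f"{base_url}/api/sql",
        data=sql.encode("utf-8"),
        method="POST",
    )


def query(sql: str, base_url: str = ROAPI_URL):
    """Send the query and decode the JSON rows (needs a running RoAPI server)."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be `query("SELECT * FROM my_dataset LIMIT 5")`. Since the request builder is separate from the network call, it can be reused with any HTTP client.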
Also, generate a Frictionless package (with Dagster) for the final datasets' Parquet files.
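A Frictionless Data Package is a `datapackage.json` descriptor listing the package's resources. Here is a minimal sketch that writes one by hand for every Parquet file in a directory. It is plain Python rather than the `frictionless` library, and it is not tied to Dagster, though the function could be called from a Dagster asset. The package name and directory layout are hypothetical.

```python
import json
import re
from pathlib import Path


def build_datapackage(name: str, parquet_files: list[Path]) -> dict:
    """Build a minimal Data Package descriptor with one resource per Parquet file."""
    resources = []
    for path in sorted(parquet_files):
        # Resource names should be lowercase and limited to [a-z0-9._-].
        resource_name = re.sub(r"[^a-z0-9._-]", "-", path.stem.lower())
        resources.append({
            "name": resource_name,
            "path": path.name,  # relative to where datapackage.json is written
            "format": "parquet",
        })
    return {"name": name, "resources": resources}


def write_datapackage(name: str, data_dir: Path) -> Path:
    """Write datapackage.json next to the Parquet files in data_dir and return its path."""
    descriptor = build_datapackage(name, list(data_dir.glob("*.parquet")))
    out = data_dir / "datapackage.json"
    out.write_text(json.dumps(descriptor, indent=2))
    return out
```

This only records names, paths and formats. Field-level schemas would need to be added separately, for example by reading each file's Parquet schema.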
It would be nice to expose a static data API (url.com/dataset/partition/data.json) and perhaps some custom graphs at url.com/dataset/partition/?
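A static data API like this is just a directory of pre-rendered JSON files that any static host could serve. Here is a minimal sketch, assuming the rows for each partition are already loaded as a list of dicts (from Parquet, for example, with `pyarrow.parquet.read_table(path).to_pylist()`). The `dataset`/`partition` layout mirrors the URL scheme above, and the function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Mapping


def write_static_api(root: Path, dataset: str, partition: str, rows: Iterable[Mapping]) -> Path:
    """Write rows to <root>/<dataset>/<partition>/data.json for static hosting."""
    target = root / dataset / partition / "data.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # default=str turns dates, decimals, etc. into strings instead of raising TypeError.
    target.write_text(json.dumps(list(rows), default=str))
    return target
```

Uploading `root` as-is would then map each file onto a URL like url.com/dataset/partition/data.json, with no server-side code.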
Also, publish on GitHub artifacts. PyPI does something like this for some of its datasets, which are then surfaced via a Next.js app.
Wow, I didn't know about RoAPI, awesome!
+1 for parquet files.
I would wait for DuckDB to reach at least 1.0 before using it as a file format.
I think the DuckDB database could be pushed to Hugging Face too!
https://huggingface.co/docs/huggingface_hub/en/guides/upload#upload-a-file
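Following that guide, here is a minimal sketch of pushing a single file (the DuckDB database or a Parquet export) to a Hugging Face dataset repo with `huggingface_hub`'s `HfApi.upload_file`. It assumes the repo already exists (for example, created with `HfApi().create_repo(repo_id, repo_type="dataset", exist_ok=True)`) and that a token is available from `huggingface-cli login` or `HF_TOKEN`. The repo id is hypothetical. The `api` parameter exists only so the function can be exercised without network access.

```python
from pathlib import Path


def upload_to_hub(local_path: Path, repo_id: str, api=None):
    """Upload one file to the root of a Hugging Face dataset repository."""
    if api is None:
        # Imported lazily so the module loads even without huggingface_hub installed.
        from huggingface_hub import HfApi
        api = HfApi()
    return api.upload_file(
        path_or_fileobj=str(local_path),
        path_in_repo=local_path.name,
        repo_id=repo_id,
        repo_type="dataset",  # the default would target a model repo
        commit_message=f"Upload {local_path.name}",
    )
```

For example: `upload_to_hub(Path("out/project.duckdb"), "some-user/some-datasets")`.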
Maybe it is best to wait for the first release version of DuckDB. I heard it will be soon. Meanwhile, I would upload a Parquet file.
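Until then, the tables inside the DuckDB file can be exported to Parquet with DuckDB's own `COPY ... TO ... (FORMAT PARQUET)`. Here is a minimal sketch, assuming the `duckdb` Python package and that all tables live in the default `main` schema. The function names and output layout (`<out_dir>/<table>.parquet`) are hypothetical. DuckDB's built-in `EXPORT DATABASE 'dir' (FORMAT PARQUET)` is a one-line alternative that also writes the schema.

```python
def parquet_export_sql(tables: list[str], out_dir: str) -> list[str]:
    """Build one DuckDB COPY statement per table, writing <out_dir>/<table>.parquet."""
    statements = []
    for table in tables:
        # Double-quote the identifier and single-quote the path, escaping embedded quotes.
        quoted_table = '"' + table.replace('"', '""') + '"'
        target = f"{out_dir}/{table}.parquet".replace("'", "''")
        statements.append(f"COPY {quoted_table} TO '{target}' (FORMAT PARQUET)")
    return statements


def export_database(db_path: str, out_dir: str) -> None:
    """Export every base table in the 'main' schema of db_path to Parquet files."""
    import duckdb  # imported lazily; only needed when actually exporting

    con = duckdb.connect(db_path)
    try:
        tables = [
            row[0]
            for row in con.execute(
                "SELECT table_name FROM information_schema.tables "
                "WHERE table_schema = 'main' AND table_type = 'BASE TABLE'"
            ).fetchall()
        ]
        for sql in parquet_export_sql(tables, out_dir):
            con.execute(sql)
    finally:
        con.close()
```

The output directory must already exist. The resulting `.parquet` files are what the upload sketch for Hugging Face (or any other target) would publish.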
We should publish datasets in multiple places.