Closed amirouche closed 6 years ago
What are you trying to do exactly?
The main location for the registry for bulk access is http://datahub.io/core/registry
What are you trying to do exactly?
I wanted to see how big the git repositories were. For full disclosure, I am looking for a better solution than git. At $WORK we use the same workflow, but it fails with big datasets. We are thinking of moving to a custom git backend (see 1). That said, I prefer a solution like rawbase.
@amirouche these git repos aren't that big (most are under 100 MB) -- intentionally.
In general, git does have issues with largish datasets (depending on how the diffs work, the problems start anywhere from the 100s of MBs to the GB range). There are loads of potential solutions, but all involve moving to specialized tooling (the simplest is just to store complete files in e.g. s3 with versioning turned on!). What's crucial is to get clear on your use cases 😉 -- and to start as simple as possible (it's always tempting to start building your own "castle in the sky").
If you want to chat more our chat channel is http://gitter.im/datahubio/chat
Finally, to answer your question: to find all the core datasets look at https://github.com/datasets/registry/blob/master/core-list.csv -- and then script yourself cloning them if you want.
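For the "script yourself cloning them" part, here is a minimal Python sketch. It assumes core-list.csv has a header row and a column holding each repository's clone URL. The column name `github_url` is a guess, not something confirmed in this thread, so check the file and adjust it. The raw download URL is derived from the blob link above, and the helper names are made up for illustration.

```python
import csv
import io
import subprocess
import urllib.request

# Blob URL quoted in the thread; the raw form is derived from it below.
CORE_LIST_BLOB_URL = "https://github.com/datasets/registry/blob/master/core-list.csv"


def to_raw_url(blob_url):
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    raw = blob_url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
    return raw.replace("/blob/", "/", 1)


def clone_commands(csv_text, url_column="github_url"):
    """Build one `git clone` command per row that has a non-empty URL.

    `url_column` is an assumed column name; change it to match the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    commands = []
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url:
            commands.append(["git", "clone", url])
    return commands


def main():
    """Download the list and clone every repository into the current directory."""
    with urllib.request.urlopen(to_raw_url(CORE_LIST_BLOB_URL)) as response:
        csv_text = response.read().decode("utf-8")
    for command in clone_commands(csv_text):
        # check=False so one failed clone does not stop the rest.
        subprocess.run(command, check=False)
```

Calling `main()` from an empty directory fetches the list and clones each repository in turn. Full clones are used (no `--depth`), so the on-disk size includes history, which is what matters when judging how git copes with the data.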
Thanks.
I am very tempted to build my own "castle in the sky".
@amirouche people often are 😉 -- the problem is most of them remain unfinished. If you want to help out with an existing effort, you can join us at https://datahub.io/ via http://gitter.im/datahubio/chat
After git cloning the repository and using
npm install
I get an error about a missing datahub-client.
After manually installing it, I get another error.
After doing
npm datahub-cli
it still fails with the above error.