CDLUC3 / data-curation

Exploratory project. Our goal is to catalog and evaluate datasets. We will determine ways to evaluate data files against the indicators above and offer solutions for increasing their quality. We aim to translate best practices into workflows that help with everyday use cases.

Get initial counts of dataset usage from 3 repositories #7

Closed by sfisher 3 months ago

sfisher commented 3 months ago

This task changed to getting monthly counts of datasets from Dryad, Zenodo, and Figshare.

I created four Colab notebooks to do this (one for each of the three repositories and one for DataCite).

https://colab.research.google.com/drive/1AvP0jxZwHL9bUB1IwZ7GP-TIpyNI2edP#scrollTo=4CV9AbKz8yEX

https://colab.research.google.com/drive/1z-5_f5XfTGZmonCwnrpmNjhNtIi9Q5KW#scrollTo=0juDiHEYrA9Z

https://colab.research.google.com/drive/1oNKHjafyMCpfgj8JsDtLrZGQTOTf_N-O#scrollTo=TvBokmBCZo5E

https://colab.research.google.com/drive/1nBTEhBYA-Z8cWQ9zaj5F3MGjkt1hePvJ#scrollTo=caPfrjfoNrmE

I found problems in the APIs. Also, the DataCite data do not seem to line up very well with what the repositories themselves report for similar time periods.
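One way to make that mismatch concrete is to bucket each source's record timestamps by calendar month and diff the two series. Below is a minimal sketch in Python, assuming each source can be reduced to a list of ISO 8601 timestamps. The function names and the sample timestamps are made up for illustration; they are not taken from the colabs or from any repository's API.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps):
    """Count ISO 8601 timestamps per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for ts in timestamps:
        # Older Python versions reject a trailing 'Z', so normalize it first.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[dt.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


def compare_counts(repo_counts, datacite_counts):
    """Return (month, repo, datacite, repo - datacite) for every month seen."""
    months = sorted(set(repo_counts) | set(datacite_counts))
    rows = []
    for month in months:
        repo = repo_counts.get(month, 0)
        datacite = datacite_counts.get(month, 0)
        rows.append((month, repo, datacite, repo - datacite))
    return rows


# Hypothetical timestamps, standing in for what each API would return.
sample_repo = ["2021-01-03T12:00:00Z", "2021-01-20T08:30:00Z", "2021-02-11T00:00:00Z"]
sample_datacite = ["2021-01-05T09:00:00Z", "2021-02-12T00:00:00Z", "2021-02-28T23:59:59Z"]

rows = compare_counts(monthly_counts(sample_repo), monthly_counts(sample_datacite))
for month, repo, datacite, diff in rows:
    print(f"{month}  repo={repo}  datacite={datacite}  diff={diff:+d}")
```

Which timestamp to bucket on (created, published, or registered) is itself an assumption, and picking different fields for the two sources could account for part of any difference.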

I created a document that explores these things (work in progress). https://docs.google.com/document/d/1NH_iy0y4HDOstSCuCRHIDbSx3afsp75zo8-UtKvpD8A/edit