IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu and an issue tracker for the IQSS Dataverse team's operational work, for better tracking on https://github.com/orgs/IQSS/projects/34

Harvest over 2 million datasets from the US Government (data.gov) #56

Open pdurbin opened 4 years ago

pdurbin commented 4 years ago

"The largest topics that the datasets cover are geosciences, biology, and agriculture. The majority of governments in the world publish their data and describe it with http://schema.org. The USA leads in the number of open government datasets, with more than 2 million." -- https://twitter.com/usdatagov/status/1220396785543282688

We should consider harvesting these, especially since OAI-PMH is supported. Here's a screenshot of the output from the "Identify" verb at https://catalog.data.gov/csw?mode=oaipmh&verb=Identify

[Screenshot, 2020-01-24: output of the "Identify" verb]

Docs, should we need them, have been kindly provided by @kalxas at https://twitter.com/tzotsos/status/1220770998146031619 and can be found at http://docs.pycsw.org/en/stable/oaipmh.html
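For anyone who wants to poke at the endpoint without a browser, here is a minimal sketch of building the "Identify" request and reading the response, using only the Python standard library. The helper names `build_oaipmh_url` and `parse_identify` are hypothetical, and `SAMPLE_IDENTIFY` is a made-up response for illustration, not what catalog.data.gov actually returns. The element names and the namespace come from the OAI-PMH 2.0 spec.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Base URL taken from the link above; "mode=oaipmh" is already in its query string.
BASE_URL = "https://catalog.data.gov/csw?mode=oaipmh"


def build_oaipmh_url(base_url, verb, **params):
    """Append an OAI-PMH verb (and optional arguments) to a base URL,
    preserving any query parameters the base URL already has."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query)
    query.append(("verb", verb))
    query.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def parse_identify(xml_text):
    """Return the direct children of <Identify> as a dict of tag -> text.
    Repeated elements (e.g. several adminEmail entries) keep only the last."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("response has no <Identify> element")
    return {
        child.tag.split("}", 1)[-1]: (child.text or "").strip()
        for child in identify
    }


# Illustrative response only; every value here is invented for the example.
SAMPLE_IDENTIFY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2020-01-24T18:37:12Z</responseDate>
  <request verb="Identify">https://example.org/csw</request>
  <Identify>
    <repositoryName>Example catalog</repositoryName>
    <baseURL>https://example.org/csw</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2010-01-01T00:00:00Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(build_oaipmh_url(BASE_URL, "Identify"))
    print(parse_identify(SAMPLE_IDENTIFY))
```

To try it against the live endpoint, the URL from `build_oaipmh_url(BASE_URL, "Identify")` could be fetched with `urllib.request.urlopen` and the decoded body passed to `parse_identify`. Whether the server still responds the way it did in the screenshot is something to check rather than assume.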

There is also some discussion at https://github.com/GSA/data.gov/issues/888

jggautier commented 4 years ago

Just leaving questions to follow up on: