climate-mirror / datasets

For tracking data mirroring progress

EPA Environmental Dataset Gateway #365

Open JeremiahCurtis opened 7 years ago

JeremiahCurtis commented 7 years ago

Issue #273 was closed with no explanation. I'm not sure of a simple way to grab https://edg.epa.gov/metadata/catalog/search/browse/browse.page (which I believe contains all EDG datasets); there are hundreds of pages of dataset listings, which then link to other URLs for the actual data. Many of the datasets contain biological, hydrological, and other ecological data that are probably at least tangentially pertinent to climate research, and that are very likely at risk. It appears that many of the datasets here are not replicated in the ftp or newftp directories, nor are they available via ECHO or Envirofacts.

blahah commented 7 years ago

The fastest way would be to use the REST API to page through the results:

https://edg.epa.gov/metadata/rest/find/document?searchText=*:*&f=html&start=1&max=100

Then, for each result, retrieve the metadata entry.
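
A rough sketch of that paging loop in Python, in case it helps. Only the search URL comes from this thread; everything else is an assumption: that `start` is a 1-based record offset (so pages begin at 1, 101, 201, ...), that the `f=html` response links to each metadata entry through ordinary `<a href>` tags, and that a page with no links means the results are exhausted. The helper names (`page_url`, `extract_links`, `crawl`) are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Search endpoint taken from the URL quoted above.
BASE = "https://edg.epa.gov/metadata/rest/find/document"


def page_url(start, page_size=100):
    """Build the search URL for one page; `start` is assumed to be 1-based."""
    params = {"searchText": "*:*", "f": "html", "start": start, "max": page_size}
    # safe="*:" keeps the wildcard query readable instead of percent-encoding it.
    return BASE + "?" + urlencode(params, safe="*:")


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def crawl(max_pages=3, page_size=100):
    """Yield links from successive result pages (assumed stop: a page with no links)."""
    for page in range(max_pages):
        url = page_url(1 + page * page_size, page_size)
        with urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        links = extract_links(html, url)
        if not links:
            break
        yield from links


if __name__ == "__main__":
    for link in crawl(max_pages=1):
        print(link)
```

The links collected this way will include navigation links as well as metadata entries, so they would need filtering once the actual response layout is known.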

I think mirroring the newftp site does catch most of the datasets, based on browsing through a sample of results.