edgi-govdata-archiving / overview

🎈 Start here for current projects, how to get involved, and joining community calls, a resource for new and veteran members
GNU General Public License v3.0

Use Data Together Sentry to crawl EPA data.json #199

Closed: dcwalk closed this issue 6 years ago

dcwalk commented 7 years ago

Picking up from #119 and #120: we want to further identify the datasets the EPA holds, so that we can both 1) map how much we've already downloaded ("coverage"), and 2) download those we don't already have.

To do that, we want to use Data Together's sentry to crawl the data.json (JSON-LD) catalog of the EPA's Environmental Dataset Gateway. To get there, @b5 wants to bring sentry to feature parity with the WARC 1.1 spec, which is tracked in https://github.com/datatogether/roadmap/issues/26

TODOs:

This will resolve #120 in prep for #119
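
To make the "coverage" idea concrete, here is a minimal sketch in Python, not sentry's actual implementation. It assumes the catalog follows the Project Open Data data.json layout (a top-level `dataset` array whose entries carry a `distribution` list with `downloadURL` / `accessURL` fields). The function names, the sample catalog, and the `example.org` URLs are all hypothetical and only for illustration.

```python
import json


def distribution_urls(catalog):
    """Collect every downloadURL/accessURL listed in a parsed data.json catalog."""
    urls = set()
    for dataset in catalog.get("dataset", []):
        # Some datasets have no distribution at all; treat that as an empty list.
        for dist in dataset.get("distribution", []):
            for key in ("downloadURL", "accessURL"):
                url = dist.get(key)
                if url:
                    urls.add(url)
    return urls


def coverage(catalog, archived_urls):
    """Return (fraction of listed URLs already archived, sorted list of missing URLs)."""
    listed = distribution_urls(catalog)
    missing = listed - set(archived_urls)
    if not listed:
        return 1.0, []
    return (len(listed) - len(missing)) / len(listed), sorted(missing)


# Hypothetical stand-in for the real EPA catalog, which is far larger.
SAMPLE_CATALOG = json.loads("""
{
  "dataset": [
    {"title": "A", "distribution": [
      {"downloadURL": "https://example.org/a.csv"},
      {"accessURL": "https://example.org/a"}
    ]},
    {"title": "B", "distribution": [
      {"downloadURL": "https://example.org/b.zip"}
    ]},
    {"title": "C"}
  ]
}
""")

# Hypothetical set of URLs we already hold.
ARCHIVED = {"https://example.org/a.csv"}

fraction, missing = coverage(SAMPLE_CATALOG, ARCHIVED)
print(f"coverage: {fraction:.0%}")
for url in missing:
    print("still to download:", url)
```

The `missing` list is the download queue for goal 2, and `fraction` is the coverage number for goal 1. Matching on exact URL strings is a simplification; a real comparison would probably need to normalize URLs first.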

dcwalk commented 7 years ago

During our September 11 Archiving call, this was flagged as an ongoing and important priority. Moving it to the Fall Work Cycle milestone.

dcwalk commented 6 years ago

As far as I know, this happened. I think larger discussions about the future of this node and where this data lives are on the horizon. I'm going to close this for now, as I imagine those conversations will pick up further along and reframe future work.