This guide describes the workflow we use for DataRescue activities, as developed by the DataRefuge project and EDGI, both at in-person events and for remote work. It explains the process that a URL or dataset goes through from the time it is identified, whether by a Seeder as "uncrawlable" or by other means, until it is made available as a record in the datarefuge.org CKAN data catalog. The process involves several stages and is designed for smooth hand-offs, so that each stage is handled by someone with expertise in that area, while the data is tracked throughout for security.
We have moved the documentation to a more user-friendly format. You can now find the guide at datarefuge.github.io/workflow.
Note that the guide is still a work in progress, and we will shortly add screenshots and other material.
Suggestions and improvements are welcome! All changes to the guide are managed through this GitHub repository. Please check our contribution guidelines for details.
DataRescue is a broad, grassroots effort with support from numerous local and nationwide networks. DataRefuge and EDGI partner with local organizers to support DataRescue events. See more of our institutional partners on the DataRefuge home page.