edgi-govdata-archiving / web-monitoring-processing

Tools for access, "diff"-ing, and analyzing archived web pages
https://edgi-govdata-archiving.github.io/web-monitoring-processing
GNU General Public License v3.0

Support S3 for cache files #849

Closed by Mr0grog 1 year ago

Mr0grog commented 1 year ago

To support running the import job on a scheduled job runner that doesn't have persistent storage (see #757), we need to be able to store the unplaybackable cache in S3. The --unplaybackable option now accepts 's3://' URLs:

wm import ia 'https://somewhere.com/' --unplaybackable 's3://bucket/unplaybackable.json'
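For illustration, here is a minimal sketch of how a cache path option can accept either a local file or an 's3://' URL. This is not the code from this PR. The function names `split_s3_url`, `load_cache`, and `save_cache` are hypothetical, and the sketch assumes the cache is a single JSON object. S3 access goes through boto3's `get_object` and `put_object`; boto3 is only imported when an S3 path is actually used.

```python
import json
import os
from urllib.parse import urlparse


def split_s3_url(path):
    """Split 's3://bucket/key' into (bucket, key); return None for non-S3 paths."""
    parsed = urlparse(path)
    if parsed.scheme != 's3':
        return None
    # urlparse puts the bucket in netloc and the key (with a leading '/') in path.
    return parsed.netloc, parsed.path.lstrip('/')


def load_cache(path):
    """Hypothetical helper: read a JSON cache from disk or S3, or {} if it doesn't exist yet."""
    location = split_s3_url(path)
    if location is None:
        if not os.path.exists(path):
            return {}
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)

    import boto3  # only required when the cache lives in S3
    bucket, key = location
    client = boto3.client('s3')
    try:
        response = client.get_object(Bucket=bucket, Key=key)
    except client.exceptions.NoSuchKey:
        # First run: nothing has been cached yet.
        return {}
    return json.loads(response['Body'].read())


def save_cache(path, data):
    """Hypothetical helper: write the JSON cache back to disk or S3."""
    text = json.dumps(data)
    location = split_s3_url(path)
    if location is None:
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)
        return

    import boto3
    bucket, key = location
    boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=text.encode('utf-8'))
```

Dispatching on the URL scheme keeps the CLI option a single string, so a job runner without a persistent disk can swap a local path for an S3 URL without other changes. One caveat: S3 offers no locking, so if two runs write the same key concurrently, the last write wins.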