bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

WIP: screenshots and hashing #16

Closed loganwilliams closed 2 years ago

loganwilliams commented 2 years ago
msramalho commented 2 years ago

At this stage the changes look good, although the youtube-dl timestamps might still be an issue. I can help with the last checklist item: "Support detecting multiple names for columns".
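For illustration, detecting multiple names for a column could look roughly like the sketch below. The alias table, the canonical names, and the `resolve_columns` helper are all hypothetical and not taken from the auto-archiver code; the idea is only to match each header case-insensitively against a list of accepted names.

```python
# Hypothetical alias table: canonical column name -> other accepted header names.
COLUMN_ALIASES = {
    "url": ["link", "media url", "source url"],
    "status": ["archive status"],
    "archive": ["archive location", "archive url"],
}


def resolve_columns(headers, aliases=COLUMN_ALIASES):
    """Return {canonical name: zero-based column index} for the columns found.

    Headers are compared after stripping whitespace and lowercasing.
    Canonical names with no matching header are left out of the result.
    """
    normalized = [h.strip().lower() for h in headers]
    found = {}
    for canonical, names in aliases.items():
        # Try the canonical name first, then each alias in order.
        for name in [canonical, *names]:
            if name in normalized:
                found[canonical] = normalized.index(name)
                break
    return found


if __name__ == "__main__":
    print(resolve_columns(["Media URL", " Archive Status ", "Notes"]))
```

Matching on whole normalized header strings (rather than substrings) keeps a column such as "Archive Status" from being mistaken for the "archive" column.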

loganwilliams commented 2 years ago

Thanks! Yeah, I'd like to make the times consistent in UTC before merging into main. I think we need to test:

Once we've understood this behavior we should make sure that dates from the auto_archiver are consistently UTC.
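As a rough sketch of what "consistently UTC" could mean in code: youtube-dl's info dict can carry a `timestamp` field holding a UNIX epoch value, and Python's `datetime` can convert both epoch values and timezone-aware datetimes to UTC. The `to_utc_iso` helper is hypothetical, and treating naive datetimes as already being in UTC is an assumption that would need checking against the actual behavior.

```python
from datetime import datetime, timezone


def to_utc_iso(value):
    """Normalize an epoch number or a datetime to an ISO 8601 string in UTC."""
    if isinstance(value, (int, float)):
        # UNIX epoch seconds are timezone-independent; interpret them in UTC.
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    elif isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: a naive datetime is already expressed in UTC.
            dt = value.replace(tzinfo=timezone.utc)
        else:
            # Aware datetimes are shifted to UTC.
            dt = value.astimezone(timezone.utc)
    else:
        raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
    return dt.isoformat()


if __name__ == "__main__":
    print(to_utc_iso(0))
    print(to_utc_iso(datetime.now(timezone.utc)))
```

Using `datetime.fromtimestamp(..., tz=timezone.utc)` instead of the bare `datetime.fromtimestamp(...)` avoids silently picking up the server's local timezone.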

Another task: currently the screenshot generator doesn't run the browser in headless mode, which could cause problems when running this on a server. It should be a simple change, maybe switching to Chromium instead of Firefox/Gecko, or maybe you can figure out how to get headless mode working on Firefox.

loganwilliams commented 2 years ago

Making it headless turned out to be fairly straightforward, but it required making sure the Firefox and geckodriver versions are compatible.
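For reference, a minimal sketch of a headless Firefox screenshot, assuming Selenium 4 is the library driving geckodriver (the thread does not say which library is used). The `screenshot_headless` helper and its default window size are hypothetical; `-headless` is Firefox's own command-line flag.

```python
# Firefox command-line flag that starts the browser without a visible window.
HEADLESS_ARGS = ["-headless"]


def screenshot_headless(url, out_path, width=1280, height=720):
    """Open `url` in headless Firefox and save a PNG screenshot to `out_path`."""
    # Assumption: Selenium 4 is installed. It is imported here so this module
    # still loads on machines without Selenium.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)

    # Requires a Firefox build and a geckodriver release that are compatible.
    driver = webdriver.Firefox(options=options)
    try:
        driver.set_window_size(width, height)
        driver.get(url)
        return driver.save_screenshot(out_path)
    finally:
        # Always shut the browser down, even if loading the page fails.
        driver.quit()
```

Calling `screenshot_headless("https://example.com", "example.png")` would then write the screenshot without needing a display on the server.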

loganwilliams commented 2 years ago

Timestamps are fixed :)

loganwilliams commented 2 years ago

@msramalho I tried to define the remaining two to-do items more precisely. I think those are the next steps; then it's ready for your review or any other changes you'd like to suggest.

msramalho commented 2 years ago

Had a look and these changes are great. I will work on the two remaining to-do checkbox items from your original comment in another branch.