bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

Empty entry number if using Google Storage catch #57

Closed djhmateer closed 1 year ago

djhmateer commented 1 year ago

A small change so that, when using Google Drive, a row with an empty entry number (which corresponds to the new folder to be created) is ignored until the next run.

This is a 'normal' workflow under some conditions.

msramalho commented 1 year ago

Hi Dave, I was looking into this logic and I don't think it makes sense to include it in the codebase, since it is very specific to your project. We have a default folder value (controllable in: https://github.com/bellingcat/auto-archiver/blob/3095ce305459b40bd60d73ba6ee84d834449fd21/auto_archive.py#L86), but making an exception so that every Google Drive archival without a folder is skipped would break other users' expectations. One suggestion is to handle this in your Google Sheets, for example:

  1. you get a URL in a column unconfirmedUrl
  2. when you get your entry number from Slack, the actual URL column will detect it and copy the value from unconfirmedUrl; you could do that with something like =IF(LEN(entryNumberCell)=0, "", unconfirmedUrlCell)
  3. this way the archiver, which should only look at the URL column, would only see the URL (and therefore archive it) when the entryNumber is present
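
The same gating rule can be sketched in Python to make the behaviour explicit. This is only an illustration of what the spreadsheet formula does, not auto-archiver code; the function name `confirmed_url` and the sample rows are assumptions made up for the example.

```python
def confirmed_url(entry_number: str, unconfirmed_url: str) -> str:
    """Mirror the sheet formula: expose the URL only once an entry number exists.

    Hypothetical helper for illustration; not part of auto-archiver.
    """
    if len(entry_number) == 0:
        # No entry number yet: the URL column stays empty, so nothing is archived.
        return ""
    return unconfirmed_url


# Made-up rows: (entryNumber, unconfirmedUrl)
rows = [
    ("", "https://example.com/pending"),
    ("AA001", "https://example.com/ready"),
]

# Only rows whose computed URL is non-empty would be visible to an archiver.
visible = [confirmed_url(entry, url) for entry, url in rows]
to_archive = [url for url in visible if url]
```

With these rows, only the second URL ends up in `to_archive`; the first would appear on a later run once its entry number is filled in.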

Let me know if this makes sense for your use case, because I only deduced it from the comments.

djhmateer commented 1 year ago

Agreed this is a very specific edge case.

Thanks for thinking this through. I'd prefer not to change the sheets now (as there are quite a few).

Would it be acceptable to create a new config option under google_drive called ignore_if_folder_not_specified_in_spreadsheet?
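
A rough sketch of how such an option might behave, assuming a simple config object. Only the flag name comes from the proposal above; `GoogleDriveConfig`, `default_folder`, and `resolve_folder` are hypothetical names for illustration and do not reflect the actual auto-archiver code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoogleDriveConfig:
    # Hypothetical config shape; only the flag name is taken from the proposal.
    default_folder: str = "default"
    ignore_if_folder_not_specified_in_spreadsheet: bool = False


def resolve_folder(row_folder: Optional[str], config: GoogleDriveConfig) -> Optional[str]:
    """Return the folder to archive into, or None if the row should be skipped."""
    folder = (row_folder or "").strip()
    if folder:
        return folder
    if config.ignore_if_folder_not_specified_in_spreadsheet:
        # Opt-in behaviour: skip this row until a folder (entry number) is present.
        return None
    # Default behaviour: fall back to the default folder.
    return config.default_folder
```

With the flag left at `False`, rows without a folder still fall back to the default folder, so existing users would see no change.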

If this is too 'edgey' then I'm totally fine with just keeping it in mine.

Thanks again for all your work.

msramalho commented 1 year ago

Hi Dave, indeed I think that will be too edge-casey, as the normal flow creates/uses the default folder names that come from the spreadsheet name/worksheet name. Thanks for understanding.