bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

Use Entry Number for the folder in Google Storage #89

Closed: djhmateer closed this 1 year ago

djhmateer commented 1 year ago

This is probably an issue for me to implement in the Google Drive storage.

E.g. instead of a folder name like: https-www-youtube-com-watch-v-wlahzurxrjy-list-pl7a55eb715fbb2940-index-7

I'd like it to be the entry number, e.g. AA001, which is taken from the Google spreadsheet.

Perhaps this could be patched in by changing

filename_generator: static

to

filename_generator: entry_number

msramalho commented 1 year ago

You can already achieve this by having a "folder" column which is used here in storage.py when you set path_generator to flat.
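To illustrate the effect described above, here is a minimal sketch of how a folder value and a path_generator setting might combine into a storage key. This is not the project's storage.py: the names build_key and slugify_url are hypothetical, and only the "flat" and "url" modes are modelled, on the assumption that "url" produces the slug-style folder from the original report.

```python
import posixpath
import re


def slugify_url(url: str) -> str:
    """Lowercase the URL and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", url.lower()).strip("-")


def build_key(folder: str, url: str, filename: str, path_generator: str = "url") -> str:
    """Hypothetical helper: join folder, generated path and filename into a key."""
    if path_generator == "flat":
        path = ""  # flat: no per-URL subfolder is added
    elif path_generator == "url":
        path = slugify_url(url)  # url: one subfolder per archived link
    else:
        raise ValueError(f"unsupported path_generator: {path_generator!r}")
    # Skip empty parts so an unset folder or a flat path leaves no stray slashes.
    parts = [part for part in (folder, path, filename) if part]
    return posixpath.join(*parts)


url = "https://www.youtube.com/watch?v=abc"
flat_key = build_key("AA001", url, "video.mp4", path_generator="flat")
url_key = build_key("AA001", url, "video.mp4", path_generator="url")
```

Under these assumptions, flat_key is "AA001/video.mp4" and url_key is "AA001/https-www-youtube-com-watch-v-abc/video.mp4", so with flat the entry number alone names the folder.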

Probably this feature would benefit from some better documentation :)

It is currently implemented with the ArchivingContext, which adds some coupling between the modules, so I'm also happy to see more flexible ways of achieving this. Maybe the Media object could have a folder property which can be loaded by any Feeder and used by any Storage, instead of using a shared global context. But that's not something we will dedicate time to internally right now, since it's not a priority.
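A rough sketch of the decoupled idea, where the folder travels on the object instead of in a shared global context, might look like the following. All class names, fields and the "filename" row key here are hypothetical stand-ins for illustration and do not reflect the project's actual Media, Feeder or Storage classes.

```python
import posixpath
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Media:
    """Hypothetical media item that carries its own target folder."""
    filename: str
    folder: str = ""


class SheetFeeder:
    """Hypothetical feeder: reads rows and copies a chosen column into Media.folder."""

    def __init__(self, rows: List[Dict[str, str]], folder_column: str = "Entry Number") -> None:
        self.rows = rows
        self.folder_column = folder_column

    def __iter__(self) -> Iterator[Media]:
        for row in self.rows:
            # A row without the folder column simply yields an empty folder.
            yield Media(filename=row["filename"], folder=row.get(self.folder_column, ""))


class FlatStorage:
    """Hypothetical storage: builds the key only from what the Media object carries."""

    def key_for(self, media: Media) -> str:
        if media.folder:
            return posixpath.join(media.folder, media.filename)
        return media.filename


rows = [
    {"Entry Number": "AA001", "filename": "video.mp4"},
    {"filename": "image.jpg"},
]
keys = [FlatStorage().key_for(media) for media in SheetFeeder(rows)]
```

Here keys evaluates to ["AA001/video.mp4", "image.jpg"]. The storage never consults global state, so any feeder that sets folder would work with any storage that reads it.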

djhmateer commented 1 year ago

Thanks @msramalho, that is perfect. I've patched in the 'folder' column, which we call 'Entry Number', and am using 'path_generator: flat'.

I'm making some notes in my orchestration.yaml, and will submit them as a PR once I get this into production. It may take some time!