bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

no _mimetype set for final media html files #98

Closed liliakai closed 4 months ago

liliakai commented 9 months ago

A small gripe, but nice to have for consistency.

With the config formatter: html_formatter, the final media on an archive is an HTML page generated by the orchestrator. In practice I find that other media objects have a correctly set _mimetype property, but the one from html_formatter does not. It seems like core/media.py should set the right mimetype based on the filename. I think that logic just never gets called for the final media.
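To illustrate the suggestion, here is a minimal sketch of filling in a missing mimetype from the filename with Python's standard mimetypes module. The Media class and the ensure_mimetype helper below are hypothetical stand-ins written for this example, not the actual code in core/media.py, and the "application/octet-stream" fallback is an assumption.

```python
import mimetypes
from dataclasses import dataclass
from typing import Optional


@dataclass
class Media:
    """Hypothetical stand-in for the project's media object (assumption)."""
    filename: str
    _mimetype: Optional[str] = None

    @property
    def mimetype(self) -> str:
        # Guess lazily from the filename the first time the value is needed.
        if self._mimetype is None:
            guessed, _encoding = mimetypes.guess_type(self.filename)
            # Fallback value is an assumption for this sketch.
            self._mimetype = guessed or "application/octet-stream"
        return self._mimetype


def ensure_mimetype(media: Media) -> Media:
    """Hypothetical helper: force the guess so _mimetype is populated
    before the media object is serialized into metadata."""
    _ = media.mimetype
    return media


# The final HTML page would go through the same step as any other media.
final_media = ensure_mimetype(Media(filename="archive/index.html"))
print(final_media._mimetype)
```

If something like this ran for the final media as well, its _mimetype would be populated in the stored metadata in the same way as for the other media objects.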

msramalho commented 7 months ago

These lines https://github.com/bellingcat/auto-archiver/blob/9eb39943c7c99a6316ff8d1c9ae3371377736c5e/src/auto_archiver/storages/s3.py#L54-L59 would do that for S3; local/google-drive storage is less relevant. Unless you mean mimetype somewhere else?

And it gets set on S3 stored metadata objects.

liliakai commented 7 months ago

I mean in the metadata stored in the db rather than in S3.