bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

feature: modify metadata_enricher to only write relevant metadata found #143

Open msramalho opened 1 month ago

msramalho commented 1 month ago

Currently, all metadata is extracted from a file. The goal is to add an option to this enricher so that it only extracts the metadata types specified in a list, for example: [gps, datetimes, author]
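A rough sketch of how such an option could filter already-parsed metadata. The names CATEGORY_KEYWORDS and filter_metadata, and the keyword lists themselves, are hypothetical and not part of the current enricher:

```python
# Hypothetical mapping from option name to substrings looked for in metadata keys.
# The keyword lists are illustrative guesses and deliberately crude
# (e.g. "time" would also match "Exposure Time").
CATEGORY_KEYWORDS = {
    "gps": ("gps", "latitude", "longitude"),
    "datetimes": ("date", "time"),
    "author": ("author", "artist", "creator"),
}


def filter_metadata(d_metadata, categories):
    """Keep only non-empty entries whose key matches a selected category."""
    # Flatten the keywords of all selected categories; unknown categories are ignored
    keywords = [kw for c in categories for kw in CATEGORY_KEYWORDS.get(c, ())]
    return {
        k: v
        for k, v in d_metadata.items()
        if v and any(kw in k.lower() for kw in keywords)
    }
```

For example, filter_metadata(parsed, ["gps", "author"]) would drop a "Make" entry but keep "GPS Latitude" and "Artist".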

With the gps option, only lat/lon information would be kept. Example parsing logic for the direct output of exiftool:

def extract_coordinates(metadata):
    # Parse exiftool's "Key : Value" lines; split on the first colon only
    # so that values containing ":" (e.g. timestamps) stay intact
    d_metadata = {}
    for line in metadata.split("\n"):
        if ":" in line:
            key, value = line.split(":", maxsplit=1)
            d_metadata[key.strip()] = value.strip()
    # Keep only non-empty GPS-related fields
    gps_metadata = {}
    for k, v in d_metadata.items():
        kl = k.lower()
        if ("gps" in kl or "latitude" in kl or "longitude" in kl) and v:
            gps_metadata[k] = v
    if gps_metadata: return gps_metadata
    return False

Something similar could be designed to flag create_date != modify_date, or other relevant information found.
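A minimal sketch of the date check, assuming exiftool's default text output labels these fields "Create Date" and "Modify Date". The function name find_date_mismatch is hypothetical:

```python
def find_date_mismatch(metadata):
    """Return both dates if they are present and differ, else False."""
    # Parse "Key : Value" lines, splitting on the first colon only so
    # timestamps such as "2023:01:01 10:00:00" are not truncated
    d_metadata = {}
    for line in metadata.split("\n"):
        if ":" in line:
            key, value = line.split(":", maxsplit=1)
            d_metadata[key.strip()] = value.strip()
    # Assumed key names; adjust if the exiftool output uses different labels
    create = d_metadata.get("Create Date")
    modify = d_metadata.get("Modify Date")
    if create and modify and create != modify:
        return {"Create Date": create, "Modify Date": modify}
    return False
```

The comparison is on the raw strings, so it only reports that the two values differ, not by how much.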