bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License
552 stars, 55 forks

Add yt-dlp based archiving for TwitterArchiver #138

Closed by JettChenT 5 months ago

JettChenT commented 6 months ago

This PR adds a yt-dlp-based Twitter archiving function to TwitterArchiver as a fallback to the two existing archiving strategies. It uses the _extract_status function of yt-dlp's TwitterIE extractor to extract tweet metadata, then processes that metadata in much the same way as the existing archiving implementation.
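As a rough illustration of the approach described above (not the code in this PR), the sketch below calls TwitterIE._extract_status from Python. The helper names tweet_id_from_url, extract_status, and photo_urls are hypothetical. _extract_status is a private yt-dlp method, so its signature and return shape may change between releases, and the metadata layout that photo_urls reads (a media list under extended_entities or entities) is an assumption.

```python
import re
from typing import List, Optional

# Matches the numeric status ID in twitter.com or x.com tweet URLs.
_TWEET_URL = re.compile(r"(?:^|[/.])(?:twitter|x)\.com/[^/]+/status/(\d+)")


def tweet_id_from_url(url: str) -> Optional[str]:
    """Return the tweet ID embedded in a tweet URL, or None if there is none."""
    match = _TWEET_URL.search(url)
    return match.group(1) if match else None


def extract_status(url: str) -> dict:
    """Fetch raw tweet metadata through yt-dlp (needs network access)."""
    # Imported lazily so the pure helpers in this sketch work without yt-dlp.
    from yt_dlp import YoutubeDL
    from yt_dlp.extractor.twitter import TwitterIE

    tweet_id = tweet_id_from_url(url)
    if tweet_id is None:
        raise ValueError(f"not a tweet URL: {url}")
    extractor = TwitterIE(YoutubeDL({"quiet": True}))
    # Private API: not covered by any stability guarantee.
    return extractor._extract_status(tweet_id)


def photo_urls(status: dict) -> List[str]:
    """Collect photo URLs from tweet metadata.

    ASSUMPTION: media entries sit under extended_entities.media (or
    entities.media) and carry "type" and "media_url_https" keys.
    """
    container = status.get("extended_entities") or status.get("entities") or {}
    return [
        item["media_url_https"]
        for item in container.get("media", [])
        if item.get("type") == "photo" and "media_url_https" in item
    ]
```

Only extract_status touches the network. The other two helpers are pure functions, so they can be checked against a hand-written dictionary before relying on live responses.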

In local testing, the existing snscrape (which seems to be unmaintained) and twitter-hack solutions do not work reliably for tweets, but the yt-dlp-based solution does. I'd be happy to hear whether this can be replicated!

I'm also happy to add a configuration option to TwitterArchiver for specifying the preferred order of tweet archiving methods (snscrape/twitter-hack/yt-dlp).
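One way such a preference option could work is sketched below. This is an assumption about a possible design, not something the PR implements: archive_with_fallback, the strategies mapping, and the preference list are all hypothetical names, and each strategy is modelled as a plain callable that returns a result dictionary, returns None, or raises on failure.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A strategy takes a tweet URL and returns archived metadata, or None on failure.
Strategy = Callable[[str], Optional[dict]]


def archive_with_fallback(
    url: str,
    strategies: Dict[str, Strategy],
    preference: Sequence[str],
) -> Optional[Tuple[str, dict]]:
    """Try each named strategy in the configured order; return the first success."""
    for name in preference:
        strategy = strategies.get(name)
        if strategy is None:
            # Unknown names in the configuration are skipped, not fatal.
            continue
        try:
            result = strategy(url)
        except Exception:
            # A failing method should not stop the remaining fallbacks.
            continue
        if result:
            return name, result
    return None


def _always_fails(url: str) -> Optional[dict]:
    raise RuntimeError("scraper is broken")


def _finds_nothing(url: str) -> Optional[dict]:
    return None


def _works(url: str) -> Optional[dict]:
    return {"url": url}


# Hypothetical stand-ins for the three methods named in the PR description.
demo_strategies = {
    "snscrape": _always_fails,
    "twitter-hack": _finds_nothing,
    "yt-dlp": _works,
}
demo_result = archive_with_fallback(
    "https://twitter.com/user/status/1",
    demo_strategies,
    preference=["snscrape", "twitter-hack", "yt-dlp"],
)
```

Keeping the order in a plain list means the configuration value can be a simple sequence of method names, and leaving a method out of the list disables it.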