digitalmethodsinitiative / 4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

Add option to include media files in the Explorer if they've been downloaded #431

Open sal-uva opened 5 months ago

sal-uva commented 5 months ago

Linking directly to media hosted on social media sites is troublesome for several reasons (expiring URLs, XSS, many requests), so media files are often omitted from the Explorer.

We could connect the Download images and Download videos processors to the Explorer so that media files can be retrieved from the 4CAT server instead of somewhere else, as long as they've been downloaded.

dale-wahl commented 5 months ago

Interesting. When I built download_videos.py, I added a special DatasetVideoLibrary class that collects all previously downloaded videos, to avoid redownloading them. You could use that to find the videos by their URL and then link to them. Something similar would work for images.
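To illustrate the lookup-by-URL idea, here is a minimal sketch of such an index. It is not the actual DatasetVideoLibrary code: the `MediaLibrary` class, its method names, and the `.metadata.json` layout (a JSON object mapping each source URL to a filename inside the same zip archive) are all assumptions made for the example.

```python
from __future__ import annotations

import json
import zipfile
from pathlib import Path
from typing import Optional


class MediaLibrary:
    """Hypothetical index of already-downloaded media, keyed by source URL."""

    def __init__(self) -> None:
        # Maps source URL -> (archive path, filename inside that archive).
        self._by_url: dict[str, tuple[Path, str]] = {}

    def add_archive(self, archive_path: Path, metadata_name: str = ".metadata.json") -> None:
        # Assumed layout: the zip holds a JSON file mapping each source URL
        # to the name of the downloaded file inside the same archive.
        with zipfile.ZipFile(archive_path) as archive:
            url_to_file = json.loads(archive.read(metadata_name))
            members = set(archive.namelist())
        for url, filename in url_to_file.items():
            # Only index files that are really present; keep the first hit per URL.
            if filename in members:
                self._by_url.setdefault(url, (archive_path, filename))

    def find(self, url: str) -> Optional[tuple[Path, str]]:
        """Return (archive path, filename) for a URL, or None if not downloaded."""
        return self._by_url.get(url)
```

The Explorer could then call `find()` with a post's media URL and only render the media element when a local copy exists, falling back to omitting it otherwise.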

If @stijn-uva is ok with merging the Cartographer PR, the @app.route('/result/<path:query_file>') route was modified there to serve archived files (assuming they are part of a file in the datasets folder). This would allow you to serve either videos or images from their archives pretty easily.
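As a rough sketch of what serving a file out of an archive could involve (this is not the code from the Cartographer PR), the helper below reads one member from a zip inside the datasets folder. The function name `read_archived_file` and the `archive.zip/member` path convention are assumptions for illustration only.

```python
from __future__ import annotations

import mimetypes
import zipfile
from pathlib import Path


def read_archived_file(datasets_dir: Path, query_file: str) -> tuple[bytes, str]:
    """Return (content, MIME type) for a path like 'archive.zip/member.jpg'."""
    # Assumed convention: everything up to '.zip/' names the archive,
    # the remainder names the file inside it.
    archive_name, sep, member = query_file.partition(".zip/")
    if not sep or not member:
        raise FileNotFoundError(query_file)

    root = datasets_dir.resolve()
    archive_path = (root / (archive_name + ".zip")).resolve()
    # Refuse anything that resolves outside the datasets folder.
    if root not in archive_path.parents or not archive_path.is_file():
        raise FileNotFoundError(query_file)

    with zipfile.ZipFile(archive_path) as archive:
        try:
            content = archive.read(member)
        except KeyError:
            # The archive exists but does not contain the requested member.
            raise FileNotFoundError(query_file) from None

    mime_type = mimetypes.guess_type(member)[0] or "application/octet-stream"
    return content, mime_type
```

A Flask view for the route could wrap this helper, returning the bytes with the guessed MIME type and responding with a 404 when `FileNotFoundError` is raised. Reading the whole member into memory keeps the sketch short; large videos would likely need streaming instead.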