bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License

YouTube playlists - probably not intentional from user #68

Closed: djhmateer closed this issue 10 months ago

djhmateer commented 1 year ago

https://podrobnosti.ua/2443817-na-kivschin-cherez-vorozhij-obstrl-vinikla-pozhezha.html

This page has a live link at the top that points to a YouTube playlist with a single item, which is a live stream.

Currently the 'is_live' check doesn't catch it because the URL resolves to a playlist, so the archiver proceeds to download 3.7GB of stream and then creates thousands of thumbnails.

I propose a simple fix in youtubedl_archiver.py to stop playlists from being downloaded, which struck me as probably not what the user would want.

        if info.get('is_live', False):
            logger.warning("Live streaming media, not archiving now")
            return ArchiveResult(status="Streaming media")

        # Added check: skip playlists, which are probably not what the user intended.
        # Edge case: a live stream that is the single item in a playlist is not caught by 'is_live'.
        # '_type' may be missing or None, so fall back to an empty string.
        if 'playlist' in (info.get('_type') or ''):
            logger.info('Found a playlist, which is probably not intended; not archiving')
            return ArchiveResult(status="Playlist")

There is probably a much more elegant way to express this!

Can submit a PR if you agree.
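
For illustration only, the check above could be pulled into a small standalone function so it can be tried without the archiver. This is a minimal sketch: `skip_reason` is a hypothetical name, not part of auto-archiver, and it assumes `info` is the dict returned by the extractor with an optional `_type` key, as in the snippet above.

```python
from typing import Any, Mapping, Optional


def skip_reason(info: Mapping[str, Any]) -> Optional[str]:
    """Return a status string if the media should be skipped, else None.

    Hypothetical helper: mirrors the two checks proposed in this issue.
    """
    # Existing behaviour: do not archive live streams.
    if info.get("is_live", False):
        return "Streaming media"
    # Proposed behaviour: do not archive playlists.
    # '_type' may be missing or None; both are treated as "not a playlist".
    if "playlist" in (info.get("_type") or ""):
        return "Playlist"
    return None
```

A playlist wrapping a single live stream has no top-level `is_live` flag, so only the second check catches it: `skip_reason({"_type": "playlist", "entries": [{"is_live": True}]})` returns `"Playlist"`.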

loganwilliams commented 1 year ago

I can't reproduce that with that URL anymore (maybe they changed it to link to the video and not a playlist?). Do you have another URL that reproduces this issue?

msramalho commented 10 months ago

This can happen on multiple platforms; for example, a Vimeo account URL will extract all of the videos from that account. Proposal: a new option in https://github.com/bellingcat/auto-archiver/blob/main/src/auto_archiver/archivers/youtubedl_archiver.py that sets the maximum number of videos to be extracted.
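
A rough sketch of what such a cap could look like, assuming the extracted `info` dict carries an `entries` list for playlists and channels and no `entries` key for a single video. Both `select_entries` and `max_videos` are hypothetical names used for illustration, not existing auto-archiver options.

```python
from itertools import islice
from typing import Any, Dict, List


def select_entries(info: Dict[str, Any], max_videos: int) -> List[Dict[str, Any]]:
    """Return at most `max_videos` entries to archive from an info dict.

    Hypothetical helper: `max_videos` stands in for the proposed option.
    """
    if max_videos < 0:
        raise ValueError("max_videos must be non-negative")
    entries = info.get("entries")
    if entries is None:
        # Single video: the info dict itself is the only candidate.
        return [info][:max_videos]
    # Assumption: failed extractions may appear as None placeholders; skip them
    # before counting, then stop as soon as the cap is reached.
    valid = (entry for entry in entries if entry is not None)
    return list(islice(valid, max_videos))
```

With `max_videos=1`, a single-item playlist would still be archived while a whole Vimeo account would be cut down to its first video. It may also be possible to enforce the cap at extraction time instead, for instance through yt-dlp's `playlistend` option, which would avoid fetching metadata for entries that are then discarded.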