bellingcat / cisticola

Coordinates scrapers and interfaces with database

YouTube scraper crashes when a video is age-restricted #32

Closed by loganwilliams 2 years ago

loganwilliams commented 2 years ago
2022-04-02 14:35:31.071 | ERROR    | cisticola.scraper.base:scrape_channels:398 - An error has been caught in function 'scrape_channels', process 'MainProcess' (63688), thread 'MainThread' (140362836334400):
Traceback (most recent call last):

  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/extractor/common.py", line 617, in extract
    ie_result = self._real_extract(url)
                │    │             └ 'https://www.youtube.com/watch?v=SQu7vJOeorE'
                │    └ <function YoutubeIE._real_extract at 0x7fa8b9b03af0>
                └ <yt_dlp.extractor.youtube.YoutubeIE object at 0x7fa8b58a5100>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/extractor/youtube.py", line 3336, in _real_extract
    self.raise_no_formats(reason, expected=True)
    │    │                └ 'Sign in to confirm your age. This video may be inappropriate for some users.'
    │    └ <function InfoExtractor.raise_no_formats at 0x7fa8b9b37940>
    └ <yt_dlp.extractor.youtube.YoutubeIE object at 0x7fa8b58a5100>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/extractor/common.py", line 1126, in raise_no_formats
    raise ExtractorError(msg, expected=expected, video_id=video_id)
          │              │             │                  └ None
          │              │             └ True
          │              └ 'Sign in to confirm your age. This video may be inappropriate for some users.'
          └ <class 'yt_dlp.utils.ExtractorError'>

yt_dlp.utils.ExtractorError: Sign in to confirm your age. This video may be inappropriate for some users.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1389, in wrapper
    return func(self, *args, **kwargs)
           │    │      │       └ {}
           │    │      └ ('https://www.youtube.com/watch?v=SQu7vJOeorE', <yt_dlp.extractor.youtube.YoutubeIE object at 0x7fa8b58a5100>, False, {'n_ent...
           │    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
           └ <function YoutubeDL.__extract_info at 0x7fa8b9a2c1f0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1459, in __extract_info
    ie_result = ie.extract(url)
                │  │       └ 'https://www.youtube.com/watch?v=SQu7vJOeorE'
                │  └ <function InfoExtractor.extract at 0x7fa8b9b395e0>
                └ <yt_dlp.extractor.youtube.YoutubeIE object at 0x7fa8b58a5100>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/extractor/common.py", line 643, in extract
    raise type(e)(e.orig_msg, **kwargs)
                                └ {'video_id': 'SQu7vJOeorE', 'ie': 'youtube', 'tb': <traceback object at 0x7fa8afdfb540>, 'expected': True, 'cause': None}

yt_dlp.utils.ExtractorError: [youtube] SQu7vJOeorE: Sign in to confirm your age. This video may be inappropriate for some users.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/home/loganw/cisticola/app.py", line 137, in <module>
    scrape_channels(args)
    │               └ Namespace(command='scrape-channels', gsheet=None, media=False)
    └ <function scrape_channels at 0x7fa8b7d069d0>

  File "/home/loganw/cisticola/app.py", line 101, in scrape_channels
    controller.scrape_all_channels(archive_media = args.media)
    │          │                                   │    └ False
    │          │                                   └ Namespace(command='scrape-channels', gsheet=None, media=False)
    │          └ <function ScraperController.scrape_all_channels at 0x7fa8b9a311f0>
    └ <cisticola.scraper.base.ScraperController object at 0x7fa8b7cce280>

  File "/home/loganw/cisticola/cisticola/scraper/base.py", line 332, in scrape_all_channels
    return self.scrape_channels(channels, archive_media=archive_media)
           │    │               │                       └ False
           │    │               └ [Channel(name='Qanonfighters', platform_id='1412770923', category='qanon', platform='Telegram', url='https://ttttt.me/qanonfi...
           │    └ <function ScraperController.scrape_channels at 0x7fa8b9a31310>
           └ <cisticola.scraper.base.ScraperController object at 0x7fa8b7cce280>

> File "/home/loganw/cisticola/cisticola/scraper/base.py", line 398, in scrape_channels
    for post in posts:
        │       └ <generator object YoutubeScraper.get_posts at 0x7fa8b5cb1e40>
        └ ScraperResult(scraper='YoutubeScraper 0.0.1', platform='Youtube', channel=111, platform_id='RnbKUstg9yg', date=datetime.datet...

  File "/home/loganw/cisticola/cisticola/scraper/youtube.py", line 47, in get_posts
    raise e

  File "/home/loganw/cisticola/cisticola/scraper/youtube.py", line 43, in get_posts
    meta = ydl.extract_info(
           │   └ <function YoutubeDL.extract_info at 0x7fa8b9a2c040>
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>

  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1380, in extract_info
    return self.__extract_info(url, self.get_info_extractor(ie_key), download, extra_info, process)
           │                   │    │    │                  │        │         │           └ True
           │                   │    │    │                  │        │         └ {}
           │                   │    │    │                  │        └ False
           │                   │    │    │                  └ 'YoutubeTab'
           │                   │    │    └ <function YoutubeDL.get_info_extractor at 0x7fa8b9a289d0>
           │                   │    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
           │                   └ 'https://www.youtube.com/channel/UCiCa3JX-RpgAftyDcpdiv4w'
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1389, in wrapper
    return func(self, *args, **kwargs)
           │    │      │       └ {}
           │    │      └ ('https://www.youtube.com/channel/UCiCa3JX-RpgAftyDcpdiv4w', <yt_dlp.extractor.youtube.YoutubeTabIE object at 0x7fa8b5c77e80>...
           │    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
           └ <function YoutubeDL.__extract_info at 0x7fa8b9a2c1f0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1473, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
           │    │                 │          │         └ {}
           │    │                 │          └ False
           │    │                 └ {'uploader': 'DataBase Italia', 'uploader_id': 'UCiCa3JX-RpgAftyDcpdiv4w', 'uploader_url': 'https://www.youtube.com/channel/U...
           │    └ <function YoutubeDL.process_ie_result at 0x7fa8b9a2c3a0>
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1598, in process_ie_result
    return self.__process_playlist(ie_result, download)
           │                       │          └ False
           │                       └ {'uploader': 'DataBase Italia', 'uploader_id': 'UCiCa3JX-RpgAftyDcpdiv4w', 'uploader_url': 'https://www.youtube.com/channel/U...
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1797, in __process_playlist
    entry_result = self.__process_iterable_entry(entry, download, extra)
                   │                             │      │         └ {'n_entries': 170, '_last_playlist_index': 170, 'playlist_count': 170, 'playlist_index': 140, 'playlist_autonumber': 140, 'pl...
                   │                             │      └ False
                   │                             └ {'_type': 'url', 'ie_key': 'Youtube', 'id': 'SQu7vJOeorE', 'url': 'https://www.youtube.com/watch?v=SQu7vJOeorE', 'title': 'At...
                   └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1389, in wrapper
    return func(self, *args, **kwargs)
           │    │      │       └ {}
           │    │      └ ({'_type': 'url', 'ie_key': 'Youtube', 'id': 'SQu7vJOeorE', 'url': 'https://www.youtube.com/watch?v=SQu7vJOeorE', 'title': 'A...
           │    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
           └ <function YoutubeDL.__process_iterable_entry at 0x7fa8b9a2c5e0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1819, in __process_iterable_entry
    return self.process_ie_result(
           │    └ <function YoutubeDL.process_ie_result at 0x7fa8b9a2c3a0>
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1548, in process_ie_result
    return self.extract_info(
           │    └ <function YoutubeDL.extract_info at 0x7fa8b9a2c040>
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1380, in extract_info
    return self.__extract_info(url, self.get_info_extractor(ie_key), download, extra_info, process)
           │                   │    │    │                  │        │         │           └ True
           │                   │    │    │                  │        │         └ {'n_entries': 170, '_last_playlist_index': 170, 'playlist_count': 170, 'playlist_index': 140, 'playlist_autonumber': 140, 'pl...
           │                   │    │    │                  │        └ False
           │                   │    │    │                  └ 'Youtube'
           │                   │    │    └ <function YoutubeDL.get_info_extractor at 0x7fa8b9a289d0>
           │                   │    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
           │                   └ 'https://www.youtube.com/watch?v=SQu7vJOeorE'
           └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1407, in wrapper
    self.report_error(str(e), e.format_traceback())
    │    └ <function YoutubeDL.report_error at 0x7fa8b9a2a670>
    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 939, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
    │    │                                                                      │       └ {}
    │    │                                                                      └ ('  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/extractor/common.py", l...
    │    └ <function YoutubeDL.trouble at 0x7fa8b9a2a280>
    └ <yt_dlp.YoutubeDL.YoutubeDL object at 0x7fa8b5c776d0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 879, in trouble
    raise DownloadError(message, exc_info)
          │             │        └ (<class 'yt_dlp.utils.ExtractorError'>, ExtractorError('Sign in to confirm your age. This video may be inappropriate for some...
          │             └ '\x1b[0;31mERROR:\x1b[0m [youtube] SQu7vJOeorE: Sign in to confirm your age. This video may be inappropriate for some users.'
          └ <class 'yt_dlp.utils.DownloadError'>

yt_dlp.utils.DownloadError: ERROR: [youtube] SQu7vJOeorE: Sign in to confirm your age. This video may be inappropriate for some users.
2022-04-02 14:35:31.124 | INFO     | cisticola.scraper.base:scrape_channels:407 - YoutubeScraper 0.0.1 found 0 new posts from Channel(name='DataBase Italia', platform_id=None, category='conspiracy', platform='Youtube', url='https://www.youtube.com/channel/UCiCa3JX-RpgAftyDcpdiv4w', screenname=None, country='IT', influencer='DataBase Italia', public=True, chat=False, notes=None, source='researcher')
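
The traceback shows a single age-restricted video (`SQu7vJOeorE`) raising `yt_dlp.utils.DownloadError` inside `get_posts`, which aborts the whole channel ("found 0 new posts"). One possible way to handle this, sketched below, is to extract videos one at a time and skip the ones that fail. This is only a sketch under assumptions, not the fix that was applied in the repository: the helper name `extract_skipping_failures` and its parameters `extract`, `skip_on`, and `on_skip` are hypothetical, and the yt-dlp wiring in the trailing comment is not executed here.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple, Type


def extract_skipping_failures(
    urls: Iterable[str],
    extract: Callable[[str], Any],
    skip_on: Tuple[Type[BaseException], ...] = (Exception,),
    on_skip: Callable[[str, BaseException], None] = lambda url, exc: None,
) -> Iterator[Any]:
    """Yield extract(url) for each URL, skipping URLs whose extraction raises.

    Hypothetical helper: `extract`, `skip_on` and `on_skip` are illustrative
    parameters, not part of cisticola or yt-dlp.
    """
    for url in urls:
        try:
            result = extract(url)
        except skip_on as exc:
            # Report the failure and move on to the next video instead of
            # letting one bad entry end the whole channel scrape.
            on_skip(url, exc)
            continue
        yield result


# Assumed wiring with yt-dlp (not run here):
#
#   import yt_dlp
#   with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
#       for meta in extract_skipping_failures(
#           video_urls,  # hypothetical list of watch URLs for one channel
#           lambda url: ydl.extract_info(url, download=False),
#           skip_on=(yt_dlp.utils.DownloadError,),
#       ):
#           ...
```

Catching only `DownloadError` (rather than every exception) keeps genuine bugs visible. yt-dlp also has an `ignoreerrors` option that may achieve a similar effect when a whole channel is extracted in one call; in that case failed entries would need to be filtered out of the result.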