Open loganwilliams opened 2 years ago
The problem was that the upload type was an image, a case that wasn't handled. The fix is in https://github.com/bellingcat/polyphemus/commit/2383359a6d85a8be4b53b374d97d72dabc67f138
Re-opening as this issue still exists. Perhaps the possible types of media need to be looked at more comprehensively, or perhaps there needs to be a general case handler?
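One possible shape for such a general case handler, as a minimal sketch only: the function name `classify_media`, the `image` key, and the `"repost"` and `"unknown"` labels are hypothetical and are not taken from polyphemus. It assumes the claim's `value` dict looks like the ones in the KeyError messages in this thread, and it falls back to `stream_type` (or to a placeholder) instead of raising.

```python
from typing import Any, Dict, Tuple

# Keys that are assumed to carry media-specific details when present.
KNOWN_MEDIA_KEYS = ("video", "audio", "image")


def classify_media(value: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (media_type, details) without raising on unfamiliar shapes."""
    for key in KNOWN_MEDIA_KEYS:
        if key in value:
            return key, value[key]
    if "claim_hash" in value:
        # Hypothetical label for claims that only point at another claim.
        return "repost", {"claim_hash": value["claim_hash"]}
    # General case: use stream_type if the API supplied one, else a placeholder.
    return value.get("stream_type", "unknown"), {}
```

With this approach a claim whose `value` has neither `source` nor `stream_type` would still produce a record with type `"unknown"` rather than aborting the scrape.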
2022-04-03 10:37:08.583 | DEBUG | cisticola.scraper.base:scrape_channels:384 - OdyseeScraper 0.0.1 is handling Channel(name='Miss Red Pill', platform_id=None, category='explicit_qanon', platform='Odysee', url='https://odysee.com/@MissRedPill:e', screenname='@MissRedPill', country='FR', influencer='Miss Red Pill', public=True, chat=False, notes='Mainly translating US video in french - 1st circle of Qanon', source='researcher')
2022-04-03 10:37:21.656 | ERROR | cisticola.scraper.base:scrape_channels:400 - An error has been caught in function 'scrape_channels', process 'MainProcess' (70483), thread 'MainThread' (140648972392256):
Traceback (most recent call last):
File "/home/loganw/cisticola/app.py", line 134, in <module>
scrape_channels(args)
│ └ Namespace(command='scrape-channels', gsheet=None, media=False)
└ <function scrape_channels at 0x7feb56dd2af0>
File "/home/loganw/cisticola/app.py", line 98, in scrape_channels
controller.scrape_all_channels(archive_media = args.media)
│ │ │ └ False
│ │ └ Namespace(command='scrape-channels', gsheet=None, media=False)
│ └ <function ScraperController.scrape_all_channels at 0x7feb58a7c280>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
File "/home/loganw/cisticola/cisticola/scraper/base.py", line 334, in scrape_all_channels
return self.scrape_channels(channels, archive_media=archive_media)
│ │ │ └ False
│ │ └ [Channel(name='Qanonfighters', platform_id='1412770923', category='qanon', platform='Telegram', url='https://ttttt.me/qanonfi...
│ └ <function ScraperController.scrape_channels at 0x7feb58a7c3a0>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
> File "/home/loganw/cisticola/cisticola/scraper/base.py", line 400, in scrape_channels
for post in posts:
│ └ <generator object OdyseeScraper.get_posts at 0x7feb5516e350>
└ ScraperResult(scraper='OdyseeScraper 0.0.1', platform='Odysee', channel=150, platform_id='da10f6d869475821c2bd44ff3a1e435c064...
File "/home/loganw/cisticola/cisticola/scraper/odysee.py", line 36, in get_posts
for video in all_videos:
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb550c6d60>
└ <polyphemus.base.OdyseeVideo object at 0x7feb545f8cd0>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 46, in <genexpr>
self.all_videos = (OdyseeVideo(video, self.auth_token) for video in all_video_info)
│ │ │ │ │ │ └ {'address': 'bS6Dhx7jbei86wkqAsvMdWsY9mXbNrZX57', 'amount': '5.0', 'canonical_url': "lbry://@MissRedPill#e/Reonse-de-l'inspec...
│ │ │ │ │ └ 'H2KXtA5CAiC6V8bz5zkfeTEzs4T1AqGT'
│ │ │ │ └ <polyphemus.base.OdyseeChannel object at 0x7feb54a41fa0>
│ │ │ └ {'address': 'bS6Dhx7jbei86wkqAsvMdWsY9mXbNrZX57', 'amount': '5.0', 'canonical_url': "lbry://@MissRedPill#e/Reonse-de-l'inspec...
│ │ └ <class 'polyphemus.base.OdyseeVideo'>
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb550c6d60>
└ <polyphemus.base.OdyseeChannel object at 0x7feb54a41fa0>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 99, in __init__
raise KeyError(f'nether `video`, `audio`, nor `claim_hash` keys are in `full_video_info["value"]`, only {full_video_info["value"].keys()}')
KeyError: 'nether `video`, `audio`, nor `claim_hash` keys are in `full_video_info["value"]`, only dict_keys([\'description\', \'languages\', \'license\', \'release_time\', \'source\', \'stream_type\', \'tags\', \'thumbnail\', \'title\'])'
2022-04-03 10:37:21.679 | INFO | cisticola.scraper.base:scrape_channels:409 - OdyseeScraper 0.0.1 found 111 new posts from Channel(name='Miss Red Pill', platform_id=None, category='explicit_qanon', platform='Odysee', url='https://odysee.com/@MissRedPill:e', screenname='@MissRedPill', country='FR', influencer='Miss Red Pill', public=True, chat=False, notes='Mainly translating US video in french - 1st circle of Qanon', source='researcher')
2022-04-03 10:29:22.310 | DEBUG | cisticola.scraper.base:scrape_channels:384 - OdyseeScraper 0.0.1 is handling Channel(name='Qactus', platform_id=None, category='explicit_qanon', platform='Odysee', url='https://odysee.com/@Qactus:e', screenname='@Qactus', country='FR', influencer=None, public=True, chat=False, notes=None, source='researcher')
2022-04-03 10:29:23.160 | ERROR | cisticola.scraper.base:scrape_channels:400 - An error has been caught in function 'scrape_channels', process 'MainProcess' (70483), thread 'MainThread' (140648972392256):
Traceback (most recent call last):
File "/home/loganw/cisticola/app.py", line 134, in <module>
scrape_channels(args)
│ └ Namespace(command='scrape-channels', gsheet=None, media=False)
└ <function scrape_channels at 0x7feb56dd2af0>
File "/home/loganw/cisticola/app.py", line 98, in scrape_channels
controller.scrape_all_channels(archive_media = args.media)
│ │ │ └ False
│ │ └ Namespace(command='scrape-channels', gsheet=None, media=False)
│ └ <function ScraperController.scrape_all_channels at 0x7feb58a7c280>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
File "/home/loganw/cisticola/cisticola/scraper/base.py", line 334, in scrape_all_channels
return self.scrape_channels(channels, archive_media=archive_media)
│ │ │ └ False
│ │ └ [Channel(name='Qanonfighters', platform_id='1412770923', category='qanon', platform='Telegram', url='https://ttttt.me/qanonfi...
│ └ <function ScraperController.scrape_channels at 0x7feb58a7c3a0>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
> File "/home/loganw/cisticola/cisticola/scraper/base.py", line 400, in scrape_channels
for post in posts:
│ └ <generator object OdyseeScraper.get_posts at 0x7feb54fcde40>
└ ScraperResult(scraper='TelegramTelethonScraper 0.0.1', platform='Telegram', channel=109, platform_id='https://t.me/AntiMacron...
File "/home/loganw/cisticola/cisticola/scraper/odysee.py", line 36, in get_posts
for video in all_videos:
└ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb53f23190>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 46, in <genexpr>
self.all_videos = (OdyseeVideo(video, self.auth_token) for video in all_video_info)
│ │ │ │ │ │ └ {'address': 'bT99TGno6TCQeoCKYpgQuRwFeUZokhow2V', 'amount': '1.0', 'canonical_url': 'lbry://@Qactus#e/PourlireQactuscommentco...
│ │ │ │ │ └ 'H2KXtA5CAiC6V8bz5zkfeTEzs4T1AqGT'
│ │ │ │ └ <polyphemus.base.OdyseeChannel object at 0x7feb53f2d7f0>
│ │ │ └ {'address': 'bT99TGno6TCQeoCKYpgQuRwFeUZokhow2V', 'amount': '1.0', 'canonical_url': 'lbry://@Qactus#e/PourlireQactuscommentco...
│ │ └ <class 'polyphemus.base.OdyseeVideo'>
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb53f23190>
└ <polyphemus.base.OdyseeChannel object at 0x7feb53f2d7f0>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 99, in __init__
raise KeyError(f'nether `video`, `audio`, nor `claim_hash` keys are in `full_video_info["value"]`, only {full_video_info["value"].keys()}')
KeyError: 'nether `video`, `audio`, nor `claim_hash` keys are in `full_video_info["value"]`, only dict_keys([\'languages\', \'license\', \'release_time\', \'source\', \'stream_type\', \'tags\', \'thumbnail\', \'title\'])'
2022-04-03 10:29:23.181 | INFO | cisticola.scraper.base:scrape_channels:409 - OdyseeScraper 0.0.1 found 0 new posts from Channel(name='Qactus', platform_id=None, category='explicit_qanon', platform='Odysee', url='https://odysee.com/@Qactus:e', screenname='@Qactus', country='FR', influencer=None, public=True, chat=False, notes=None, source='researcher')
A third example:
2022-04-03 10:59:02.267 | DEBUG | cisticola.scraper.base:scrape_channels:384 - OdyseeScraper 0.0.1 is handling Channel(name='Freie Meinung ohne Zensur', platform_id=None, category='qanon_themes', platform='Odysee', url='https://odysee.com/@freiemeinung:e', screenname='@freiemeinung', country='DE', influencer=None, public=True, chat=False, notes=None, source='researcher')
2022-04-03 10:59:30.175 | ERROR | cisticola.scraper.base:scrape_channels:400 - An error has been caught in function 'scrape_channels', process 'MainProcess' (70483), thread 'MainThread' (140648972392256):
Traceback (most recent call last):
File "/home/loganw/cisticola/app.py", line 134, in <module>
scrape_channels(args)
│ └ Namespace(command='scrape-channels', gsheet=None, media=False)
└ <function scrape_channels at 0x7feb56dd2af0>
File "/home/loganw/cisticola/app.py", line 98, in scrape_channels
controller.scrape_all_channels(archive_media = args.media)
│ │ │ └ False
│ │ └ Namespace(command='scrape-channels', gsheet=None, media=False)
│ └ <function ScraperController.scrape_all_channels at 0x7feb58a7c280>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
File "/home/loganw/cisticola/cisticola/scraper/base.py", line 334, in scrape_all_channels
return self.scrape_channels(channels, archive_media=archive_media)
│ │ │ └ False
│ │ └ [Channel(name='Qanonfighters', platform_id='1412770923', category='qanon', platform='Telegram', url='https://ttttt.me/qanonfi...
│ └ <function ScraperController.scrape_channels at 0x7feb58a7c3a0>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
> File "/home/loganw/cisticola/cisticola/scraper/base.py", line 400, in scrape_channels
for post in posts:
│ └ <generator object OdyseeScraper.get_posts at 0x7feb53f23a50>
└ ScraperResult(scraper='OdyseeScraper 0.0.1', platform='Odysee', channel=541, platform_id='46c735e1a6bd72800f9e342587f350cd89e...
File "/home/loganw/cisticola/cisticola/scraper/odysee.py", line 36, in get_posts
for video in all_videos:
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb557f6b30>
└ <polyphemus.base.OdyseeVideo object at 0x7feb556d4e20>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 46, in <genexpr>
self.all_videos = (OdyseeVideo(video, self.auth_token) for video in all_video_info)
│ │ │ │ │ │ └ {'address': 'bJpTKbuYN5oSNvtc1Q7KwJtZ6XhtA797wD', 'amount': '0.1', 'canonical_url': 'lbry://@freiemeinung#e/Livestream-über-a...
│ │ │ │ │ └ 'H2KXtA5CAiC6V8bz5zkfeTEzs4T1AqGT'
│ │ │ │ └ <polyphemus.base.OdyseeChannel object at 0x7feb556d4280>
│ │ │ └ {'address': 'bJpTKbuYN5oSNvtc1Q7KwJtZ6XhtA797wD', 'amount': '0.1', 'canonical_url': 'lbry://@freiemeinung#e/Livestream-über-a...
│ │ └ <class 'polyphemus.base.OdyseeVideo'>
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb557f6b30>
└ <polyphemus.base.OdyseeChannel object at 0x7feb556d4280>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 99, in __init__
raise KeyError(f'nether `video`, `audio`, nor `claim_hash` keys are in `full_video_info["value"]`, only {full_video_info["value"].keys()}')
KeyError: 'nether `video`, `audio`, nor `claim_hash` keys are in `full_video_info["value"]`, only dict_keys([\'description\', \'languages\', \'license\', \'release_time\', \'tags\', \'thumbnail\', \'title\'])'
2022-04-03 10:59:30.210 | INFO | cisticola.scraper.base:scrape_channels:409 - OdyseeScraper 0.0.1 found 34 new posts from Channel(name='Freie Meinung ohne Zensur', platform_id=None, category='qanon_themes', platform='Odysee', url='https://odysee.com/@freiemeinung:e', screenname='@freiemeinung', country='DE', influencer=None, public=True, chat=False, notes=None, source='researcher')
Oh, here's a different one:
2022-04-03 11:03:56.477 | DEBUG | cisticola.scraper.base:scrape_channels:384 - OdyseeScraper 0.0.1 is handling Channel(name='Wissen ist Macht THX', platform_id=None, category='qanon_themes', platform='Odysee', url='https://odysee.com/@WissenistMacht:b', screenname='@WissenistMacht', country='DE', influencer=None, public=True, chat=False, notes='Shares videos', source='researcher')
2022-04-03 11:05:15.464 | ERROR | cisticola.scraper.base:scrape_channels:400 - An error has been caught in function 'scrape_channels', process 'MainProcess' (70483), thread 'MainThread' (140648972392256):
Traceback (most recent call last):
File "/home/loganw/cisticola/app.py", line 134, in <module>
scrape_channels(args)
│ └ Namespace(command='scrape-channels', gsheet=None, media=False)
└ <function scrape_channels at 0x7feb56dd2af0>
File "/home/loganw/cisticola/app.py", line 98, in scrape_channels
controller.scrape_all_channels(archive_media = args.media)
│ │ │ └ False
│ │ └ Namespace(command='scrape-channels', gsheet=None, media=False)
│ └ <function ScraperController.scrape_all_channels at 0x7feb58a7c280>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
File "/home/loganw/cisticola/cisticola/scraper/base.py", line 334, in scrape_all_channels
return self.scrape_channels(channels, archive_media=archive_media)
│ │ │ └ False
│ │ └ [Channel(name='Qanonfighters', platform_id='1412770923', category='qanon', platform='Telegram', url='https://ttttt.me/qanonfi...
│ └ <function ScraperController.scrape_channels at 0x7feb58a7c3a0>
└ <cisticola.scraper.base.ScraperController object at 0x7feb56da7be0>
> File "/home/loganw/cisticola/cisticola/scraper/base.py", line 400, in scrape_channels
for post in posts:
│ └ <generator object OdyseeScraper.get_posts at 0x7feb5559d4a0>
└ ScraperResult(scraper='OdyseeScraper 0.0.1', platform='Odysee', channel=546, platform_id='26c36470a583a02b5355236f2e390c74574...
File "/home/loganw/cisticola/cisticola/scraper/odysee.py", line 36, in get_posts
for video in all_videos:
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb5408d900>
└ <polyphemus.base.OdyseeVideo object at 0x7feb556d4520>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 46, in <genexpr>
self.all_videos = (OdyseeVideo(video, self.auth_token) for video in all_video_info)
│ │ │ │ │ │ └ {'address': 'bUamXduiKsc83bybv8FWR9wQ4rMot4KHH5', 'amount': '0.001', 'canonical_url': 'lbry://@WissenistMacht#b/fluffy-deep-s...
│ │ │ │ │ └ 'H2KXtA5CAiC6V8bz5zkfeTEzs4T1AqGT'
│ │ │ │ └ <polyphemus.base.OdyseeChannel object at 0x7feb556d48e0>
│ │ │ └ {'address': 'bUamXduiKsc83bybv8FWR9wQ4rMot4KHH5', 'amount': '0.001', 'canonical_url': 'lbry://@WissenistMacht#b/fluffy-deep-s...
│ │ └ <class 'polyphemus.base.OdyseeVideo'>
│ └ <generator object OdyseeChannel.get_all_videos.<locals>.<genexpr> at 0x7feb5408d900>
└ <polyphemus.base.OdyseeChannel object at 0x7feb556d48e0>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/base.py", line 144, in __init__
self.info['likes'], self.info['dislikes'] = api.get_video_reactions(
│ │ │ │ │ └ <function get_video_reactions at 0x7feb5791d280>
│ │ │ │ └ <module 'polyphemus.api' from '/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemu...
│ │ │ └ {'canonical_url': 'lbry://@WissenistMacht#b/fluffy-deep-sparrow#4', 'type': 'repost', 'channel_id': 'b43c5b81bdf754efa0387603...
│ │ └ <polyphemus.base.OdyseeVideo object at 0x7feb556d4f10>
│ └ {'canonical_url': 'lbry://@WissenistMacht#b/fluffy-deep-sparrow#4', 'type': 'repost', 'channel_id': 'b43c5b81bdf754efa0387603...
└ <polyphemus.base.OdyseeVideo object at 0x7feb556d4f10>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/api.py", line 210, in get_video_reactions
response = make_request(
└ <function make_request at 0x7feb5791aee0>
File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/polyphemus/api.py", line 44, in make_request
raise ValueError(msg)
└ "Maximum number of retries reached for request <function post at 0x7feb625e81f0> with kwargs {'url': 'https://api.odysee.com/...
ValueError: Maximum number of retries reached for request <function post at 0x7feb625e81f0> with kwargs {'url': 'https://api.odysee.com/reaction/list', 'data': {'auth_token': 'H2KXtA5CAiC6V8bz5zkfeTEzs4T1AqGT', 'claim_ids': '91b45642a7fa16c53bcb49b0f4c279c264d201c6'}}: status code 400
2022-04-03 11:05:15.494 | INFO | cisticola.scraper.base:scrape_channels:409 - OdyseeScraper 0.0.1 found 141 new posts from Channel(name='Wissen ist Macht THX', platform_id=None, category='qanon_themes', platform='Odysee', url='https://odysee.com/@WissenistMacht:b', screenname='@WissenistMacht', country='DE', influencer=None, public=True, chat=False, notes='Shares videos', source='researcher')
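This last traceback is a `ValueError` raised while fetching reactions, not a `KeyError` about media keys, and the local variables suggest the claim is a repost. A hedged sketch of one way to tolerate it follows: `reactions_or_none` is a hypothetical helper, and `fetch` stands in for whatever callable performs the reactions request, since its real signature is not shown in this thread.

```python
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger(__name__)

Reactions = Tuple[Optional[int], Optional[int]]


def reactions_or_none(
    fetch: Callable[[str], Tuple[int, int]], claim_id: str
) -> Reactions:
    """Return (likes, dislikes), or (None, None) if the request fails."""
    try:
        return fetch(claim_id)
    except ValueError as exc:
        # The traceback shows a ValueError once retries are exhausted.
        logger.warning("Could not fetch reactions for %s: %s", claim_id, exc)
        return None, None
```

Recording `None` keeps the post itself while making it visible that the reaction counts are missing rather than zero.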
I added a general case for handling additional media file types in this commit (https://github.com/bellingcat/polyphemus/commit/3fd841f76a550621f62f928b9c05fb09e5ae1cd8), which should deal with all of the errors posted here except the last one. I'm currently scraping the channels from the errors posted above to verify that the changes fix the issue.
It didn't successfully scrape all of the results.
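In each traceback above the exception is raised inside the generator that builds the video objects, and a Python generator cannot be resumed once it has raised, so every remaining item in that channel is lost (the Qactus run reports 0 new posts). A minimal sketch of isolating failures per item is below; `build_each` and its `build` parameter are hypothetical names, not part of cisticola or polyphemus.

```python
import logging
from typing import Any, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def build_each(infos: Iterable[Any], build: Callable[[Any], T]) -> Iterator[T]:
    """Yield build(info) for each info, skipping items that fail to build."""
    for info in infos:
        try:
            item = build(info)
        except (KeyError, ValueError) as exc:
            # Log and move on so one unexpected item does not end the channel.
            logger.warning("Skipping item that could not be parsed: %s", exc)
            continue
        yield item
```

Catching only `KeyError` and `ValueError` mirrors the two exception types seen in this thread; anything else would still surface.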