bellingcat / cisticola

Coordinates scrapers and interfaces with database

Gettr scraper crashes in certain cases #31

Closed by loganwilliams 2 years ago

loganwilliams commented 2 years ago
2022-04-02 14:13:38.049 | ERROR    | cisticola.scraper.base:scrape_channels:398 - An error has been caught in function 'scrape_channels', process 'MainProcess' (63504), thread 'MainThread' (139778754606912):
Traceback (most recent call last):

  File "/home/loganw/cisticola/app.py", line 136, in <module>
    scrape_channels(args)
    │               └ Namespace(command='scrape-channels', gsheet=None, media=False)
    └ <function scrape_channels at 0x7f20b9d52a60>

  File "/home/loganw/cisticola/app.py", line 100, in scrape_channels
    controller.scrape_all_channels(archive_media = args.media)
    │          │                                   │    └ False
    │          │                                   └ Namespace(command='scrape-channels', gsheet=None, media=False)
    │          └ <function ScraperController.scrape_all_channels at 0x7f20bb9fd280>
    └ <cisticola.scraper.base.ScraperController object at 0x7f20b9d1bcd0>

  File "/home/loganw/cisticola/cisticola/scraper/base.py", line 332, in scrape_all_channels
    return self.scrape_channels(channels, archive_media=archive_media)
           │    │               │                       └ False
           │    │               └ [Channel(name='Qanonfighters', platform_id='1412770923', category='qanon', platform='Telegram', url='https://ttttt.me/qanonfi...
           │    └ <function ScraperController.scrape_channels at 0x7f20bb9fd3a0>
           └ <cisticola.scraper.base.ScraperController object at 0x7f20b9d1bcd0>

> File "/home/loganw/cisticola/cisticola/scraper/base.py", line 398, in scrape_channels
    for post in posts:
        │       └ <generator object GettrScraper.get_posts at 0x7f20b7656a50>
        └ ScraperResult(scraper='GettrScraper 0.0.1', platform='Gettr', channel=732, platform_id='pojee18a3c', date=datetime.datetime(2...

  File "/home/loganw/cisticola/cisticola/scraper/gettr.py", line 29, in get_posts
    for post in scraper:
                └ <generator object UserActivity.pull at 0x7f20b7656350>

  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/gogettr/capabilities/user_activity.py", line 38, in pull
    for data in self.client.get_paginated(
                │    │      └ <function ApiClient.get_paginated at 0x7f20ba9125e0>
                │    └ <gogettr.api.ApiClient object at 0x7f20b78b5ee0>
                └ <gogettr.capabilities.user_activity.UserActivity object at 0x7f20b78b5100>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/gogettr/api.py", line 96, in get_paginated
    data = self.get(*args, **kwargs)
           │    │    │       └ {'params': {'max': 20, 'dir': 'fwd', 'incl': 'posts|stats|userinfo|shared|liked', 'fp': 'f_uo', 'offset': 0}}
           │    │    └ ('/u/user/Carpesos/posts',)
           │    └ <function ApiClient.get at 0x7f20ba9124c0>
           └ <gogettr.api.ApiClient object at 0x7f20b78b5ee0>
  File "/home/loganw/.local/share/virtualenvs/cisticola-BRujq-3x/lib/python3.9/site-packages/gogettr/api.py", line 80, in get
    raise GettrApiError(errors[-1])  # Throw with most recent error
          │             └ [{'_t': 'xresp', 'rc': 'ERR', 'error': {'_t': 'xerr', 'code': 'E_SYS_DB_REQ', 'emsg': 'Error updating:', 'args': ['Carpesos']...
          └ <class 'gogettr.errors.GettrApiError'>

gogettr.errors.GettrApiError
2022-04-02 14:13:38.070 | INFO     | cisticola.scraper.base:scrape_channels:407 - GettrScraper 0.0.1 found 0 new posts from Channel(name='Pas De Mensonge', platform_id='Carpesos', category='qanon', platform='Gettr', url='https://gettr.com/user/Carpesos', screenname='Carpesos', country='FR', influencer=None, public=True, chat=False, notes=None, source='researcher')
trislee commented 2 years ago

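Gettr's API apparently can't handle usernames that contain uppercase characters: in the traceback above, the request to /u/user/Carpesos/posts came back with an E_SYS_DB_REQ error. Input usernames are now converted to lowercase before scraping, as of commit https://github.com/bellingcat/cisticola/commit/90c99aec0008c0a035b927259f504c46c1f5ecff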
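
For anyone hitting the same error, here is a minimal sketch of that kind of normalization. It is not the code from the commit above: the helper name `normalize_gettr_username` is hypothetical, and the handling of profile URLs and a leading `@` is an assumption about what the input might look like.

```python
from urllib.parse import urlparse


def normalize_gettr_username(raw: str) -> str:
    """Return a lowercase Gettr username from a screen name or profile URL.

    Hypothetical helper, not part of cisticola or gogettr.
    """
    value = raw.strip()
    if value.startswith(("http://", "https://")):
        # Assumed URL shape: https://gettr.com/user/<name>; keep the last path segment.
        value = urlparse(value).path.rstrip("/").rsplit("/", 1)[-1]
    # Drop an optional leading "@" and lowercase, since the API appears
    # to reject usernames with uppercase characters.
    return value.lstrip("@").lower()


if __name__ == "__main__":
    print(normalize_gettr_username("Carpesos"))
    print(normalize_gettr_username("https://gettr.com/user/Carpesos"))
```

Doing this once, where the channel's username is read in, keeps the lowercase form consistent for every request made for that channel.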