fake-name / xA-Scraper

Error Scraping Patreon #110

Open rebeltaz opened 2 years ago

rebeltaz commented 2 years ago

I am trying to get this to scrape Patreon, but every time the scheduled scrape runs, I get this error:

Main.Runtime - INFO - Scheduler executing class: <class 'xascraper.modules.patreon.patreonScrape.GetPatreon'>
ScraperBase Init
Starting up
Main.WebRequest - INFO - Using global chromium tab pool
Main.WebRequest - INFO - User agent overridden!
Starting up?
Main.WebRequest - INFO - Using global chromium tab pool
apscheduler.executors.default - ERROR - Job "pat (trigger: interval[0:05:00], next run at: 2022-01-22 01:30:00 CST)" raised an exception
Traceback (most recent call last):
  File "/home/bob/xA-Scraper/venv/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "./main_scrape.py", line 37, in runScraper
    instance = scraper_class()
  File "/home/bob/xA-Scraper/xascraper/modules/patreon/patreonScrape.py", line 62, in __init__
    'api_key': settings["captcha"]["anti-captcha"]['api_key'],
KeyError: 'anti-captcha'
Main.Runtime - INFO - Job crashed: 1e2773451533411e98cfafc059f03fe0
Main.Runtime - INFO - Traceback:   File "/home/derek/xA-Scraper/venv/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "./main_scrape.py", line 37, in runScraper
    instance = scraper_class()
  File "/home/bob/xA-Scraper/xascraper/modules/patreon/patreonScrape.py", line 62, in __init__
    'api_key': settings["captcha"]["anti-captcha"]['api_key'],

Any idea how to fix that? I am running Ubuntu 20.04. Also... is there any way to force the scraper to run without having to set the timer to a low refresh? I had to set it to five minutes to get it to run again so I could copy that error. Thanks.

fake-name commented 2 years ago

> Also... is there any way to force the scraper to run without having to set the timer to a low refresh?

python3 -m manage run pat?

Did you delete the relevant line from the example config?

You don't need a valid key at the moment (the code path that actually uses it is stubbed out), but the entry still has to exist in the config. Patreon sometimes hits you with a reCAPTCHA, which I deal with elsewhere using anti-captcha.com.
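For anyone hitting the same KeyError, here is a minimal sketch of why the lookup fails once the section is removed or commented out. The dictionary shape is inferred only from the traceback line `settings["captcha"]["anti-captcha"]['api_key']`; the values and the helper names `read_api_key` and `try_read` are hypothetical and not part of xA-Scraper.

```python
# Shape inferred from the traceback; all values are placeholders, not real config.
settings_without_section = {"captcha": {}}
settings_with_placeholder = {"captcha": {"anti-captcha": {"api_key": "placeholder"}}}


def read_api_key(settings):
    """Same lookup chain as the line shown in the traceback."""
    return settings["captcha"]["anti-captcha"]["api_key"]


def try_read(settings):
    """Return the key, or describe the missing entry instead of crashing."""
    try:
        return read_api_key(settings)
    except KeyError as exc:
        return "missing config key: {}".format(exc)


print(try_read(settings_without_section))   # missing config key: 'anti-captcha'
print(try_read(settings_with_placeholder))  # placeholder
```

A dummy value is enough to get past the lookup, which matches the note above that the key does not need to be valid right now.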

The patreon scraper is fairly finicky. It REQUIRES being run in a full desktop environment, with the google-chrome chromium binary present. Running chromium in a full desktop session works around some of the weird client sniffing garbage webshit assholes do these days.
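A rough pre-flight check for those two requirements might look like the sketch below. It is not something the scraper itself does. Treating a set `DISPLAY` or `WAYLAND_DISPLAY` variable as evidence of a desktop session is an assumption that only applies on Linux, and `desktop_prerequisites` is a hypothetical helper name. The `google-chrome` binary name is taken from the comment above.

```python
import os
import shutil


def desktop_prerequisites(binary="google-chrome"):
    """Report whether a display variable is set and where the browser binary is."""
    # Assumption: a graphical session exports DISPLAY (X11) or WAYLAND_DISPLAY.
    has_display = bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
    # shutil.which returns the full path if the binary is on PATH, else None.
    binary_path = shutil.which(binary)
    return {"has_display": has_display, "binary_path": binary_path}


print(desktop_prerequisites())
```

If `binary_path` comes back as `None`, or `has_display` is `False` (for example over a plain SSH session), that would be worth fixing before debugging the scraper any further.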

rebeltaz commented 2 years ago

> Did you delete the relevant line from the example config?

I didn't delete it, but I did comment that part out. The error I copied and pasted was after commenting that out.

rebeltaz commented 2 years ago

> python3 -m manage run pat?

Oh, by the way... when I run that, I get:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bob/xA-Scraper/manage/__main__.py", line 15, in <module>
    from . import name_importer
  File "/home/bob/xA-Scraper/manage/name_importer.py", line 6, in <module>
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

fake-name commented 2 years ago

Huh. Did you not install everything in requirements.txt? psycopg2-binary should provide the psycopg2 package, even though it isn't really used when you're on sqlite.
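One quick way to confirm whether the package is visible to the interpreter is the sketch below, which only uses the standard library. `missing_modules` is a hypothetical helper, not part of the project. The result is only meaningful when run with the same Python that runs `python3 -m manage`; the first traceback shows a virtualenv under `venv/`, so that may mean activating it first.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# psycopg2 is the module named in the ModuleNotFoundError above.
print(missing_modules(["psycopg2"]))
```

An empty list means the import should succeed. If `psycopg2` is listed, reinstalling the requirements into that same environment is the likely next step.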