Closed GeneralUltra758 closed 3 years ago
further debug info:
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: source /mnt/c/Users/GeneralUltra758/xA-Scraper/venv/bin/activate
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
/mnt/c/Users/GeneralUltra758/xA-Scraper/xascraper/modules/patreon/patreonScrape.py in get_api_json(self, endpoint, postData, retries)
272 post_data = postData,
--> 273 post_type='application/json'
274 )
/mnt/c/Users/GeneralUltra758/xA-Scraper/venv/lib/python3.7/site-packages/ChromeController/manager.py in xhr_fetch(self, url, headers, post_data, post_type)
303 ret = self.execute_javascript_function(js_script, [url, headers, post_data, post_type])
--> 304 ret = self._unpack_xhr_resp(ret)
305 return ret
/mnt/c/Users/GeneralUltra758/xA-Scraper/venv/lib/python3.7/site-packages/ChromeController/manager.py in _unpack_xhr_resp(self, values)
236 assert entry['name'] not in ret
--> 237 ret[entry['name']] = self.__decode_serialized_value(entry['value'])
238
/mnt/c/Users/GeneralUltra758/xA-Scraper/venv/lib/python3.7/site-packages/ChromeController/manager.py in __decode_serialized_value(self, value)
198 assert 'type' in value
--> 199 assert 'value' in value
200
AssertionError:
During handling of the above exception, another exception occurred:
UnrecoverableFailureException Traceback (most recent call last)
/mnt/c/Users/GeneralUltra758/xA-Scraper/xascraper/modules/patreon/patreonScrape.py in getNameList(self)
784
--> 785 artist_lut = self.get_artist_lut()
786 except Exception as e:
/mnt/c/Users/GeneralUltra758/xA-Scraper/xascraper/modules/patreon/patreonScrape.py in get_artist_lut(self)
767 def get_artist_lut(self):
--> 768 general_meta = self.current_user_info()
769 campaign_items = [item for item in general_meta['included'] if item['type'] == "campaign"]
/mnt/c/Users/GeneralUltra758/xA-Scraper/xascraper/modules/patreon/patreonScrape.py in current_user_info(self)
345 def current_user_info(self):
--> 346 current = self.get_api_json("/current_user?include=pledges&include=follows")
347 return current
/mnt/c/Users/GeneralUltra758/xA-Scraper/xascraper/modules/patreon/patreonScrape.py in get_api_json(self, endpoint, postData, retries)
278 traceback.print_exc()
--> 279 raise exceptions.UnrecoverableFailureException("Wat?")
280
UnrecoverableFailureException: Wat?
and set to only run the patreon scraper to run the scraper immediately after startup (could not see a way to do this natively.
python3 -m manage fetch pat ?
Note: the patreon scraper currently requires a full GUI session + headed chromium to work properly. Please feel free to complain to cloudflare if this is a problem.
i am aware that it uses full headed chrome. logging in is not the issue: i can see the chrome window on the patreon home page, successfully logged in (using vcxsrv on WSL2)
python3 -m manage fetch pat ?
was not aware of that. thanks for the tip
i tried debugging it in VSCode with WSL to see exactly what's going wrong. somehow the post_type='application/json' call in the patreon scraper's get_api_json function now throws an assertion error, which it did not when i tested it successfully the other day...
Check that your dependencies are up to date (pip install --upgrade -r requirements.txt).
I just checked locally, and https://github.com/fake-name/xA-Scraper/commit/7edb46a1e8f562c3631bc12b25c5eac7cb5aa56d was blowing up due to some recent changes upstream in my libraries. If you weren't getting crashes, you have at least one out-of-date library.
i had commented out lines 60-72 to remove the requirement for a paid anticaptcha, since it requires a full headed chrome anyway.
i've now checked the stacktrace again and found that the assert 'value' in value at line 199 of the module xA-Scraper/venv/lib/python3.7/site-packages/ChromeController/manager.py was causing the crash.
commenting this out seems to resolve the issue... but no idea what caused the issue in the first place.. reinstalling deps hasnt changed anything..
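For anyone hitting the same bare AssertionError, here is a minimal sketch of a more forgiving decode step. It is not ChromeController's actual code: the function name, the exception class, the `strict` flag, and the idea that an entry may legitimately arrive with a 'type' but no 'value' are all assumptions made for illustration. The point is only that a descriptive error (or an explicit fallback) is easier to debug than an empty assertion.

```python
class SerializedValueError(ValueError):
    """Hypothetical error type: a serialized entry is missing a required field."""


def decode_serialized_value(entry, strict=False):
    """Return entry['value'] from a {'type': ..., 'value': ...} style dict.

    Hypothetical helper, not part of ChromeController. With strict=False,
    an entry that has a 'type' but no 'value' decodes to None instead of
    raising; with strict=True it raises an error that shows the entry.
    """
    if not isinstance(entry, dict):
        raise SerializedValueError(
            "expected a dict, got {}".format(type(entry).__name__)
        )
    if "type" not in entry:
        raise SerializedValueError("entry has no 'type' field: {!r}".format(entry))
    if "value" not in entry:
        if strict:
            raise SerializedValueError(
                "entry has no 'value' field: {!r}".format(entry)
            )
        # Assumed policy: treat a missing value as "nothing was returned".
        return None
    return entry["value"]
```

With something like this in place, the failing payload would show up in the error message, which should make it easier to tell whether the browser or the library changed.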
worthy of note: ive been changing some things around in the patreonScrape.py to implement an additional feature (for which ill make a PR later) but those changes tested OK just the other day (prior to me submitting the dependency issue which ive fixed locally).. now all of a sudden this assertion error happens for seemingly no reason
Correction regarding the requirements update: pulling the latest requirements.txt seems to have fixed it. no idea how i wasn't getting that error before with the previous version of requirements.txt plus the locally added newer webrequest version as per issue #101
You shouldn't need to bother with https://github.com/fake-name/xA-Scraper/issues/101 at this point, as I now depend on up-to-date (0.0.78) webrequest directly.
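A quick way to confirm that an installed library meets a minimum version, sketched under a few assumptions: the helper names are made up for this example, versions are assumed to be purely numeric and dot-separated (as 0.0.78 is), and the lookup relies on importlib.metadata, which only exists on Python 3.8 and newer (on 3.7 the lookup simply returns None).

```python
def parse_version(text):
    """Turn '0.0.78' into (0, 0, 78). Assumes numeric, dot-separated versions."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version of a package, or None if it is unknown."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("webrequest")
    if found is None:
        print("webrequest: not installed, or version lookup unavailable")
    else:
        print("webrequest", found, "ok:", meets_minimum(found, "0.0.78"))
```

Running pip install --upgrade -r requirements.txt remains the actual fix; this only reports whether it took effect.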
The errors in ChromeController are weird. In general, piloting remote chrome is kind of brittle, and can be sensitive to a bunch of other system crap (did you apt-get/yum/w-e update? Or run chrome from google, and it updated itself?).
The web is actively becoming a shittier place. It's depressing.
i did install chrome via a .deb file.. could be that it updated itself.. reasons why i prefer js and puppeteer for web scraper stuff
I'd prefer people to not use JS (and google to not update multiple times a day), but you do what you can.
Basically, I depend on a number of components that need to be updated in lockstep, and chrome's fixation on self-updating is problematic. If you leave it alone it can get out of date and explode messily.
reasons why i prefer js and puppeteer for web scraper stuff
Unfortunately, I find JS itself to be a thoroughly unpleasant language to actually write stuff in.
getting the following when running the patreon scraper:
investigating rn to see if i can figure out an exact cause. note: Added the following to main_scrape.py @ line 56:
runScraper(scraperClass, managedNamespace)
and set it to run only the patreon scraper, so that the scraper runs immediately after startup (could not see a way to do this natively. maybe a good feature to add?)