fake-name / xA-Scraper


PatreonGet - stuck in loop - Truncating file length to x #92

Closed woebbi closed 4 years ago

woebbi commented 4 years ago

Hey, the Patreon getter seems to randomly get stuck at some point and produce this line:

Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 29 characters and re-encoding.

and the length keeps decreasing without end:

Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 2 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 1 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 0 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -1 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -2 characters and re-encoding.
[...]
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -8880539 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -8880540 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -8880541 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -8880542 characters and re-encoding.
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to -8880543 characters and re-encoding.

It now seems to be a regular occurrence. The scraper runs fine for a while and then gets stuck. I started logging the last 3 runs of the script: it scraped or checked 1087 files the first time, then 105 files, and then 406 files before getting stuck.

It also seems to be related to the filename length:

Main.PatreonGet.StatusMgr - INFO - Complete filepath: /home/rippchen/downlcont/Patreon/Meesh/33476740-StakeoutBJ_wip5.png
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 23 characters and re-encoding.

Main.PatreonGet.StatusMgr - INFO - Complete filepath: /home/rippchen/downlcont/Patreon/Meesh/33476740-StakeoutBJ_wip5.png
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 23 characters and re-encoding.

Main.PatreonGet.StatusMgr - INFO - Complete filepath: /home/rippchen/downlcont/Patreon/Meesh/33715455-PassingLove2_1_refine.png
Main.PatreonGet.StatusMgr - WARNING - Truncating file length to 29 characters and re-encoding.

Also, it happened twice on the same file.

Edit: running Debian Linux (kernel 5.3.0, 64-bit) with Python 3.6.8.

fake-name commented 4 years ago

Ooooh, that's interesting. What platform are you running on?

Basically, the underlying code here is trying to deal with path length issues. The script tries to open a file to write to, and if that fails, it assumes (incorrectly here) that it's because the file length was too long, so it truncates the filename by one character, and retries.
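To illustrate why that assumption can go wrong: an `open()` failure carries an `errno`, and only `ENAMETOOLONG` actually means the name or path is too long. The sketch below is not the xA-Scraper code; `classify_open_failure` is a hypothetical helper name made up for this example, and it only shows how the two cases could be told apart.

```python
import errno
import os
import tempfile


def classify_open_failure(path):
    """Try to open `path` for writing and report why it failed, if it did.

    Hypothetical helper for illustration only (not part of xA-Scraper).
    Returns "ok", "name_too_long", or "other:<ERRNO NAME>".
    """
    try:
        with open(path, "wb"):
            pass
    except OSError as exc:
        if exc.errno == errno.ENAMETOOLONG:
            # The only case where shortening the filename can help.
            return "name_too_long"
        # Anything else (permissions, missing directory, ...) is unrelated
        # to the length of the name.
        return "other:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(classify_open_failure(os.path.join(tmp, "short.png")))
        print(classify_open_failure(os.path.join(tmp, "a" * 5000 + ".png")))
        print(classify_open_failure(os.path.join(tmp, "missing", "x.png")))
```

With a check like this, a failure that has nothing to do with length (for example a permission error, which is reported as `EACCES`) would be surfaced as-is instead of triggering another truncation.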

I'd expect this to potentially cause issues on Windows, as I don't think I'm sanitizing the filename for all the characters Windows disallows.

woebbi commented 4 years ago

I'm running Debian bullseye, 64-bit, kernel 5.3.0, on ext4. Here is the full uname output:

Linux sheer-server 5.3.0-3-amd64 #1 SMP Debian 5.3.15-1 (2019-12-07) x86_64 GNU/Linux

and running Python 3.6.8.

The full path is /home/rippchen/downlcont/Patreon/Meesh

and the filenames that I caught were

33715455-PassingLove2_1_refine.png

33476740-StakeoutBJ_wip5.png

woebbi commented 4 years ago

Oh my god, how did that happen... Apparently some permissions were wrong (facepalms).

Everything works fine now. Sorry for the confusion.

fake-name commented 4 years ago

Heh, I should still not let the truncation go negative.

fake-name commented 4 years ago

Ok, if this reoccurs it should now abort in a sane manner, rather than just getting stuck forever.
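For readers wondering what a bounded version of such a retry loop could look like, here is a minimal sketch. It is an assumption about the general shape of the fix, not the actual change made in xA-Scraper; `open_with_truncation` and `min_stem_len` are hypothetical names. It shortens the name only on `ENAMETOOLONG`, re-raises every other error immediately, and stops once there is nothing left to trim.

```python
import errno
import os


def open_with_truncation(directory, filename, min_stem_len=1):
    """Open directory/filename for writing, shortening the stem if needed.

    Hypothetical sketch, not the xA-Scraper implementation.
    Returns (file_object, path_used). The extension is preserved.
    """
    stem, ext = os.path.splitext(filename)
    while len(stem) >= min_stem_len:
        path = os.path.join(directory, stem + ext)
        try:
            return open(path, "wb"), path
        except OSError as exc:
            if exc.errno != errno.ENAMETOOLONG:
                # Not a length problem (e.g. permissions), so truncating
                # cannot help: let the caller see the real error.
                raise
            stem = stem[:-1]  # drop one character and retry
    # The stem cannot be shortened further: abort instead of looping forever.
    raise OSError(errno.ENAMETOOLONG, "cannot shorten filename any further", filename)
```

Because the loop condition is tied to the remaining stem length, the length can never go negative, and a permissions problem like the one in this issue would raise on the first attempt rather than being retried.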