Cyberes / vitalsource2pdf

Ultra-high quality PDFs from VitalSource.
GNU Affero General Public License v3.0

Doesn't work anymore #11

Open StrikeSNC opened 10 months ago

StrikeSNC commented 10 months ago

Despite #10 and #5, even if you manage to manually set up the page link and total pages, the scraper will eventually fail because of a captcha block (whether while scraping the download links or while downloading the images). This is extremely frustrating (my book has 600 pages, and scraping them manually would take forever). On top of that, VitalSource now seems to lock you out once you hit too many captcha checks: you end up in a login loop because one of the scripts returns HTTP 401.

Is it possible to change the logic so that page scraping and image downloading happen together (say, scrape a page's image link and download that image before moving on to the next page)? That way it would at least be possible to get the first part of the book, instead of waiting up to an hour for page scraping and then being interrupted during the image download, which leaves a bunch of unorganized image files in the output folder.
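To make the request concrete, here is a rough sketch of the interleaved flow. It is not the project's actual code: `scrape_and_download`, `resolve_image_url`, `fetch_image`, and the `page_0001.jpg` naming scheme are all hypothetical names for illustration, and it assumes a page's image link can be resolved one page at a time, which may not hold in practice.

```python
from pathlib import Path
from typing import Callable


def scrape_and_download(
    total_pages: int,
    resolve_image_url: Callable[[int], str],  # hypothetical: page number -> image URL
    fetch_image: Callable[[str], bytes],      # hypothetical: image URL -> raw image bytes
    out_dir: Path,
) -> int:
    """Resolve and save one page at a time; return how many pages are on disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for page in range(1, total_pages + 1):
        # Zero-padded names keep the files in page order (assumed naming scheme).
        target = out_dir / f"page_{page:04d}.jpg"
        if target.exists():
            # Already saved by an earlier run, so skip it.
            saved += 1
            continue
        try:
            url = resolve_image_url(page)
            data = fetch_image(url)
        except Exception as exc:
            # Stop at the first failure; everything before this page is complete.
            print(f"Stopped at page {page}: {exc}")
            break
        target.write_bytes(data)
        saved += 1
    return saved
```

With this shape, an interrupted run leaves a contiguous, ordered set of pages in the output folder, and a later run with the same folder skips the files that already exist.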

Cyberes commented 10 months ago

Yes, it's definitely possible. I did them separately because that was the simpler approach. Doing them together requires a MITM proxy, which I tried but didn't finish, since the separate approach worked when I initially wrote this.

Also, it's hilarious that VitalSource has added the captcha check. This company is so user-hostile.