Cyberes / vitalsource2pdf

Ultra-high quality PDFs from VitalSource.
GNU Affero General Public License v3.0

Nothing being downloaded to output folder #2

Open ghost opened 1 year ago

ghost commented 1 year ago

I have the default output folder specified (VitalSource/ISBN), and the ISBN folder is completely empty after running the script, logging in, and pressing Enter. When I log in (in both Chrome and Chromium) and press Enter, I'm automatically sent to the book corresponding to the ISBN I entered. However, nothing changes in the folder or the browser after running python vitalsource2pdf.py --isbn *ISBN* --chrome-exe /usr/bin/google-chrome-stable. Could this specific ISBN/book on VitalSource be resistant to the script?

System: Arch Linux. I'm using a Python venv and have run both pip install -r requirements.txt and paru -S ocrmypdf jbig2dec. Also, Selenium boots into an insecure HTTPS connection, and I've tried the above after both accepting and denying cookies from the site.

I assume this is a problem with the script; however, my experience with Selenium is limited and I'm unsure how to troubleshoot it further.

Cyberes commented 1 year ago

Yeah, it's got some weird issues. If I have more vitalsource books next semester I'll definitely get this thing working again.

itsmalter commented 1 year ago

I want to scrape a book, too, so I stumbled across this neat little tool and this issue. I'm not really a developer and have no idea about Selenium. What I've noticed so far is that the scraping works when I debug the app and access the resources manually in the browser once. This works for both the metadata and the images. Maybe a simple workaround would be to automate this access?

Cyberes commented 1 year ago

> access the resources manually

Damn, they're doing cookie shenanigans. That's time consuming to reverse-engineer and automate.
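
For anyone who wants to try, a rough sketch of what automating that manual access might look like. This is not what the script currently does, and it is untested against VitalSource. It assumes a logged-in Selenium driver; warm_up and cookie_header are hypothetical helper names, and only driver.get() and driver.get_cookies() are real Selenium calls. As far as I know, get_cookies() only returns cookies for the page that is currently loaded, so the header should be built right after visiting the resource.

```python
from urllib.parse import urlparse


def warm_up(driver, urls):
    """Open each URL once in the logged-in browser so the site can set its cookies.

    `driver` is assumed to be a Selenium WebDriver (anything with .get() works).
    """
    for url in urls:
        driver.get(url)


def cookie_header(driver, url):
    """Build a Cookie header value from the browser cookies that match url's host.

    Hypothetical helper: the result could be sent with a direct HTTP request
    for the same resource. Whether the backend accepts that is an assumption.
    """
    host = urlparse(url).hostname or ""
    pairs = []
    for cookie in driver.get_cookies():
        domain = cookie.get("domain", "").lstrip(".")
        # Keep cookies set for this host or for one of its parent domains.
        if domain and (host == domain or host.endswith("." + domain)):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)
```

If the site ties those cookies to something else (request order, headers, a token in the page), this alone won't be enough.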

itsmalter commented 1 year ago

Yes, that's what I thought. However, I haven't been able to figure out how to automate it so far. Maybe I'll investigate further in the next few days. I'll let you know if I find something useful.

Cyberes commented 1 year ago

The network request debug console is your friend. It takes a lot of tinkering and experimentation to determine how their backend works.
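
If clicking through the console by hand gets tedious, the same information can probably be pulled out of Selenium. A minimal sketch, assuming Chrome was started with the goog:loggingPrefs capability set to {"performance": "ALL"} so that driver.get_log("performance") returns DevTools events; network_responses is a hypothetical helper name, and the entry layout (a JSON string under "message") is my understanding of that log format.

```python
import json


def network_responses(log_entries):
    """Yield (status, url) for each Network.responseReceived event.

    `log_entries` is assumed to be what driver.get_log("performance")
    returns: a list of dicts whose "message" field is a JSON string.
    """
    for entry in log_entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, ValueError, TypeError):
            continue  # skip anything that is not in the expected shape
        if event.get("method") != "Network.responseReceived":
            continue
        response = event.get("params", {}).get("response", {})
        yield response.get("status"), response.get("url")


# Assumed usage with a real driver:
#   for status, url in network_responses(driver.get_log("performance")):
#       print(status, url)
```

Comparing the statuses from a run where the scrape fails with a run after opening the resources manually should show which requests depend on the cookies.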