ghost opened this issue 1 year ago
Yeah, it's got some weird issues. If I have more VitalSource books next semester, I'll definitely get this thing working again.
I want to scrape a book, too, and stumbled across this neat little tool and the issues here. I'm not really a developer and have no idea about Selenium. What I've noticed so far is that scraping works if I debug the app and access the resources manually in the browser once. This works for both the metadata and the images. Maybe a simple workaround would be to automate this access?
> access the resources manually
Damn, they're doing cookie shenanigans. That's time-consuming to reverse-engineer and automate.
Yes, that's what I thought. However, I haven't figured out how to automate it yet. Maybe I'll investigate further over the next few days. I'll let you know if I find something useful.
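One way to try automating that manual access is to reuse the cookies the logged-in browser already holds for direct requests. The sketch below is a minimal, untested illustration under these assumptions: the cookies come from Selenium's `driver.get_cookies()` (a list of dicts with `name`, `value`, `domain`, `path`), and VitalSource's backend actually accepts them outside the browser, which nobody in this thread has confirmed. The helper names `cookie_header_for` and `authenticated_request` are hypothetical, and the domain/path matching is deliberately simplified (it ignores `Secure`, expiry, and host-only rules).

```python
from urllib.parse import urlsplit
from urllib.request import Request


def cookie_header_for(url: str, cookies: list[dict]) -> str:
    """Build a Cookie header value for `url` from Selenium-style cookie dicts.

    Hypothetical helper: assumes each dict looks like an entry from
    driver.get_cookies(), e.g. {"name": ..., "value": ..., "domain": ..., "path": ...}.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    pairs = []
    for c in cookies:
        domain = c.get("domain", host).lstrip(".")
        # Host must equal the cookie domain or be a subdomain of it.
        if host != domain and not host.endswith("." + domain):
            continue
        # Simplified path rule: the request path must start with the cookie path.
        if not path.startswith(c.get("path", "/")):
            continue
        pairs.append(f"{c['name']}={c['value']}")
    return "; ".join(pairs)


def authenticated_request(url: str, cookies: list[dict]) -> Request:
    """Return a urllib Request carrying the browser's cookies (not sent yet)."""
    return Request(url, headers={"Cookie": cookie_header_for(url, cookies)})
```

In practice you would log in through the Selenium browser as usual, grab `cookies = driver.get_cookies()`, and then pass a resource URL (copied from the network console) to `authenticated_request` and open it with `urllib.request.urlopen`. If the site also checks headers such as `User-Agent` or `Referer`, those would need to be copied over as well.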
The network request debug console is your friend. It takes a lot of tinkering and experimentation to determine how their backend works.
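If you'd rather inspect the traffic from inside the Selenium session than click through the dev tools, Chrome can record DevTools network events in its performance log. This is a minimal sketch assuming Chrome/Chromium with performance logging enabled via `options.set_capability("goog:loggingPrefs", {"performance": "ALL"})` before the driver starts, and entries read back with `driver.get_log("performance")`. The helper `responses_from_perf_log` is hypothetical; it only pulls out the URL and HTTP status of each `Network.responseReceived` event.

```python
import json


def responses_from_perf_log(entries: list[dict]) -> list[tuple[str, int]]:
    """Extract (url, status) pairs from Chrome performance-log entries.

    Assumes each entry has a "message" key holding a JSON string whose
    "message" object is a DevTools event with "method" and "params".
    """
    results = []
    for entry in entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, TypeError, json.JSONDecodeError):
            continue  # skip malformed or unexpected entries
        if not isinstance(event, dict):
            continue
        if event.get("method") != "Network.responseReceived":
            continue
        response = event.get("params", {}).get("response", {})
        results.append((response.get("url", ""), response.get("status", 0)))
    return results
```

Filtering the result for 401/403 statuses is one way to spot which resource requests fail until the browser has visited them manually, though that pattern is a guess based on the behavior described above.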
I have the default output folder specified (VitalSource/ISBN), and the ISBN folder is completely empty after running the script, logging in, and pressing Enter. When I log in (in both Chrome and Chromium) and press Enter, I'm automatically sent to the book corresponding to the ISBN I entered. However, nothing changes in the folder or the browser after running:
python vitalsource2pdf.py --isbn *ISBN* --chrome-exe /usr/bin/google-chrome-stable
Could it be that this specific ISBN/book on VitalSource is resistant to the script?
System: Arch Linux. I'm using a Python venv and have run both
pip install -r requirements.txt
and
paru -S ocrmypdf jbig2dec
Also, Selenium boots into an insecure HTTPS connection, and I've tried the above after both accepting and denying cookies from the site. I assume this is a problem with the script, but my experience with Selenium is limited and I'm unsure how to troubleshoot it in order to move forward.