Phoenix124 / scribd-downloader

KeyError: data-push_state #2

Open Saebre opened 4 years ago

Saebre commented 4 years ago

When trying to download audiobooks, I get a KeyError: 'data-push_state'. I cannot for the life of me figure out how to fix this. Any ideas?

  File "C:\Users\thier\AppData\Local\Programs\Python\Python38\lib\site-packages\scribd_downloader-1.3.1-py3.8.egg\scribdl\content\audiobook.py", line 247, in _scrape_audiobook_page
    text = json.loads(div_tag["data-push_state"])
  File "C:\Users\thier\AppData\Local\Programs\Python\Python38\lib\site-packages\beautifulsoup4-4.8.2-py3.8.egg\bs4\element.py", line 1321, in __getitem__
    return self.attrs[key]
KeyError: 'data-push_state'

theothersophie commented 4 years ago

I'm pretty sure the KeyError happens because the HTML markup of the page changed and the attribute data-push_state doesn't exist anymore. Someone with experience using beautifulsoup and requests would be able to figure out the new format. I don't have that experience, but I figured out how to work around it instead.
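To make the failure concrete: the traceback shows that subscripting the tag ends in self.attrs[key], so a missing attribute surfaces as a KeyError. Below is a minimal sketch of a more defensive lookup that works on a plain attribute dictionary such as div_tag.attrs. The function name load_push_state and the sample JSON are hypothetical and not part of scribdl.

```python
import json


def load_push_state(attrs):
    """Return the decoded data-push_state JSON, or None if the attribute is absent.

    `attrs` is any mapping of HTML attribute names to values, for example
    the `attrs` dict of a BeautifulSoup tag.
    """
    raw = attrs.get("data-push_state")
    if raw is None:
        # The page markup no longer carries the attribute.
        return None
    return json.loads(raw)


if __name__ == "__main__":
    # Made-up sample attributes, only to show both outcomes.
    old_markup = {"data-push_state": '{"audiobook": {"id": 12345}}'}
    new_markup = {"class": ["player"]}
    print(load_push_state(old_markup))
    print(load_push_state(new_markup))
```

This only turns the crash into a clear "attribute missing" result. It does not recover the data from whatever format the page uses now.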

Saebre commented 4 years ago

I don't have that experience, but I figured out how to work around it instead.

Would you mind sharing how you worked around it?

theothersophie commented 4 years ago

It's a little complex, and it's a hacky workaround using the tools I'm familiar with. I'm sure the real solution is much simpler and we just don't have the know-how to fix it. Note that you need a Premium subscription in order to follow these steps.

First of all, I have the development version of scribdl installed. See the readme for the installation process; I expand on how to install it in Issue #9.

I'm using Postman (a program) to intercept the requests going through the browser. I turn on the Interceptor in Postman (google this if you don't know what it is) and, in the History tab, take note of the POST request to a URL that looks like this: "https://api.findawayworld.com/v4/audiobooks/12345/playlists". It should appear in History the moment you click "Start Listening" on a book. Click on the request in the History tab and hit Send to see the response.

Click on Body under the url input to see something like {"license_id":"5efd04173f0f627901261adc"}

This is the payload we need in order to make the same request successfully. We also need the correct headers. Click on "Code" on the right edge (in orange) and select "Python - Requests" under Language to see the code needed to make the request. We're going to need the payload variable and the headers variable from this code snippet.
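For orientation, here is a rough sketch of what replaying that request involves. It uses the standard library's urllib instead of the requests code that Postman generates, and it only builds the request without sending it. The function name build_playlist_request and the single Content-Type header are assumptions; the real headers have to come from Postman, and the endpoint may not behave the same way anymore.

```python
import json
import urllib.request

# URL pattern quoted in this thread; 12345 stands in for the audiobook ID.
PLAYLIST_URL_TEMPLATE = "https://api.findawayworld.com/v4/audiobooks/{}/playlists"


def build_playlist_request(audiobook_id, license_id, headers):
    """Build (but do not send) the POST request seen in Postman's History tab."""
    payload = json.dumps({"license_id": license_id}).encode("utf-8")
    return urllib.request.Request(
        PLAYLIST_URL_TEMPLATE.format(audiobook_id),
        data=payload,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: copy the real headers and license_id from Postman.
    request = build_playlist_request(
        audiobook_id=12345,
        license_id="5efd04173f0f627901261adc",
        headers={"Content-Type": "application/json"},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending is left out on purpose. With real values it would look like:
    # with urllib.request.urlopen(request) as response:
    #     playlist = json.load(response)
```

Keeping the request construction separate from sending makes it easy to compare the URL, body, and headers against what Postman shows before making any network call.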

I made a new script, test_downloader.py, under scribdl/content:

test_downloader.txt

Replace playlist_url, headers, data, license_id, and sanitized_title using all the information from Postman.

You'll probably get a bunch of errors trying this. I basically just commented out all of the parts that were giving errors and weren't vital to the function of the script. You'll need something like VS Code to locate where the errors are. To run it, just cd into the script's directory and run "python test_downloader.py" in the terminal. I'm aware this is quite messy and inefficient, but the important thing is that it worked for me.

Note: replace "{0}/{0}_{1}.mp3" with just "{0}_{1}.mp3" for testing purposes. I can't confirm whether the former works.
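To illustrate the difference between the two patterns, here is a small sketch. It assumes {0} is the sanitized title and {1} is the part number, which the thread does not confirm. The helper names track_path and ensure_parent_dir are hypothetical. The first pattern writes into a subfolder named after the title, so one possible reason for it to fail is that the folder does not exist yet.

```python
import os


def track_path(title, part, use_subfolder):
    """Return the output path for one audio part under either pattern."""
    pattern = "{0}/{0}_{1}.mp3" if use_subfolder else "{0}_{1}.mp3"
    return pattern.format(title, part)


def ensure_parent_dir(path):
    """Create the parent folder of `path` if the pattern includes one."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)


if __name__ == "__main__":
    print(track_path("my_book", 1, use_subfolder=True))
    print(track_path("my_book", 1, use_subfolder=False))
```

With the flat pattern the files land in the current directory, which is why it is the simpler choice for a first test.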

farmerpaul commented 3 years ago

So, for anyone else with the patience and savvy to follow @theothersophie's thorough instructions (thanks, btw!), you may run into a few issues along the way. I thought I'd share what I did, and include my revised version of test_downloader.py.

First, these instructions assume you are running macOS, and I happen to be running Big Sur. For other OSs, things might turn out differently.

  1. I followed the same basic instructions for installing scribdl, development version, as per the readme – except they say to just run python setup.py install, while all the other commands use python3. For consistency and to make sure you're using version 3 of Python (which seems to be required), make sure to enter the command this way: python3 setup.py install
  2. I ran into an error while the above command was installing the dependency pikepdf: fatal error: 'qpdf/Constants.h' file not found. I Googled it and found out I should try upgrading pip3 (although it was already up to date), then running python3 -m pip install pikepdf, which I did. Then I re-ran python3 setup.py install and it seemed to finish fine.
  3. Download test_downloader.txt and save it as scribd-downloader-master/scribdl/test_downloader.py (not under the deeper subfolder content as instructed above).
  4. Follow @theothersophie's instructions for running Postman Interceptor to gather the required details of the HTTP request from Postman, and replace the sample values in test_downloader.py with the ones from Postman.
  5. From the scribd-downloader-master folder, run python3 -m scribdl.test_downloader.

Hope this works out for others!

jaan143 commented 3 years ago

@theothersophie I have been trying to use your method for 4 days. I installed the Postman app, but I can't get data from Chrome into Postman, even though everything that is required is set up correctly in Postman. A URL of this type is not showing up on Scribd: https://api.findawayworld.com/v4/audiobooks/12345/playlists

I think the website has been updated. Could you check it, please?

Thank you.

Sophia314 commented 3 years ago

Can you help get this method working? It looks like they have changed the website so much that the instructions are off. The URL doesn't even say "playlists" anymore; instead it is "https://dailyplanet.findawayworld.com/v1/events".

The book seems to have a number instead of a title, and there is no obvious license ID. There's a udid and a dailyplanet key that look similar, but they have dashes. Thanks.