Erik262 opened this issue 2 years ago.
Are you using this repo's latest code? Maybe there's geoblocking at play? It works on GitHub Actions (→ https://github.com/NicoWeio/blinkist/runs/6895445085) as well as on my machine, so there isn't much I can do about it. You could try logging the response text; maybe it tells you what happened.
I'm on the latest version; I pulled it about 15 minutes ago. The complete error:
Traceback (most recent call last):
File "/Users/erik/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 972, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/erik/Downloads/blinkist-main/main.py", line 66, in
Well, you don't get a JSON response. Try logging `response.text` instead of `response.json()`.
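A minimal sketch of that advice, assuming a `requests`-style response object that exposes `.status_code` and `.text` (as `requests` and `cloudscraper` responses do). The helper name `parse_json_or_explain` and the 200-character preview are hypothetical choices. For context, `Expecting value: line 1 column 1 (char 0)` means the very first character of the body is not JSON, e.g. an empty body or an HTML page.

```python
import json
from types import SimpleNamespace


def parse_json_or_explain(response, preview_chars=200):
    """Decode a JSON body; if that fails, print what the server actually sent, then re-raise.

    `response` only needs `.status_code` and `.text`, like a requests.Response.
    """
    try:
        return json.loads(response.text)
    except json.JSONDecodeError as exc:
        print(f"Response is not JSON (HTTP {response.status_code}): {exc}")
        # The start of the body is usually enough to tell an HTML page from JSON.
        print(response.text[:preview_chars])
        raise


# Offline demo with a made-up response; real code would pass the actual response object.
fake = SimpleNamespace(status_code=403, text="<html><body>made-up error page</body></html>")
try:
    parse_json_or_explain(fake)
except json.JSONDecodeError:
    pass
```

Re-raising keeps the original failure visible to the caller, while the printed body shows what the server really returned.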
Okay. I tried to validate the `response.text` string and found out that there is an error in the JSON. But I don't know how to fix this, since the response call should do the trick but is somehow broken.
This is what I get back as `response.text`:
{"book":{"id":"628df2186cee0700084919a6","kind":"book","slug":"goals-based-investing-en","title":"Goals-based Investing","subtitle":"A Visionary Framework for Wealth Management","subtitleHtmlSafe":"A Visionary Framework for Wealth Management","aboutTheBook":"\u003cp\u003e\u003cem\u003eGoals-Based Investing \u003c/em\u003e(2022) explains how the wealth management industry is transforming, how modern portfolio theory is no longer considered modern, and how product evolution and regulatory changes are making it easier for investors and advisors to access market segments that were once the exclusive domain of large institutes.\u003c/p\u003e","buyOnAmazonUrl":"/en/books/goals-based-investing-en/purchase","author":"Tony Davidow","truncatedAuthor":"Tony Davidow","sourceAuthor":"Tony Davidow","u rl":"https://www.blinkist.com/en/books/goals-based-investing-en","browseUrl":"/en/nc/browse/books/goals-based-investing-en","previewUrl":"/en/books/goals-based-investing-en","read ingDuration":25,"minutesToRead":25,"isAudio":true,"readCount":null,"image":{"default":{"src":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/470.jpg","srcset ":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/640.jpg"}},"sources":[{"media":"xs","src":"https://images.blinkist.io/images/books/628df2186cee070008 4919a6/1_1/470.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/640.jpg"}},{"media":"s","src":"https://images.blinkist.io/images/books/628 df2186cee0700084919a6/1_1/640.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/1080.jpg"}},{"media":"m","src":"https://images.blinkist.io/ images/books/628df2186cee0700084919a6/1_1/250.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/470.jpg"}}]},"audioUrl":"https://hls.blinki 
st.io/bibs/628df2186cee0700084919a6/628df2196cee0700084919a8-T1653981795.m4a?Expires=1655355345\u0026Signature=G7saoNJx1hZYFnZaj~X2dE0tyJAGg4GUEDTawW4Nuh13qiuHy6maJGjk1agHKo2p9qt7 erLaSOncPXzVErJa2tenzR7qokLK~LZf9QEaRr5bLagkkSAK8SI9TpDw9R6yP6luOlOKzhXO~orkpPzH9Xui5VkOcB5j9VmkxC-pGxEkVoGwOE~ArQuCHNvoyFFLsaadSAKAV2nQV2Jf~280yqO0I7a-rgOwlATQznsB301gQPP9CT56fun nb1GNjCu3cspv~nLcgYUrQkZyT2o72-lOdG8ssl2D1YOcPt0bYNwMnnCvyr99pMor4reyaX2RKF41n-VAf8p2Tu~ZUzkFDA__\u0026Key-Pair-Id=APKAJXJM6BB7FFZXUB4A","chaptersLength":7,"hasAudio":true,"langua ge":"en","freeDaily":null,"category":{"title":"Money \u0026 Investments","sprite":"money-and-investments","slug":"money-and-investments-en"},"averageRating":3.6,"categories":[{"id ":"54788fef6439320008240000","url":"/en/nc/categories/money-and-investments-en","sprite":"money-and-investments","slug":"money-and-investments-en","title":"Money \u0026 Investments","subtitle":"You work hard for your money, right? Let the experts show you how to make it work hard for you."}]},"endTimestamp":1655416799}
That's curious. I don't see an error in the JSON you posted; at least https://jsonformatter.curiousconcept.com/ doesn't report one. I assume the spaces in e.g. `"u rl"` were a result of copy-pasting the data? Try using triple backticks for that.
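The same check can be done locally with Python's standard `json` module: `json.JSONDecodeError` carries `msg`, `lineno`, `colno` and `pos` for the first failure (running `python -m json.tool file.json` from a shell also works). A minimal sketch; `find_json_error` is a hypothetical name. Note that a stray space inside a quoted key such as `"u rl"` is still valid JSON (it just changes the key name), which fits the validator finding no error.

```python
import json


def find_json_error(text):
    """Return None if `text` parses as JSON, otherwise a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno are 1-based; pos is the 0-based character offset into `text`.
        context = text[max(exc.pos - 20, 0):exc.pos + 20]
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno} (near {context!r})"
    return None
```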
Other than that, my best guess is that the latter response is actually fine, and you just had bad luck. I'll implement better error handling, so we can see what's going on.
Turns out this is Cloudflare. I assumed that cloudscraper would raise `CloudflareChallengeError` automatically, but that's not the case. In @ptrstn's version, this is done in `_get_daily_blink_info(self, language="en")`.
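Since cloudscraper does not raise on its own here, the check has to happen after the request. A minimal sketch, assuming a challenge can be recognized by an HTTP 403 status or a body that is not JSON; that heuristic is an assumption, not necessarily what this repo or @ptrstn's version checks. So that it runs without cloudscraper installed, it raises a local stand-in class, `ChallengeError`; the repo raises `cloudscraper.exceptions.CloudflareChallengeError` instead, as the tracebacks later in this thread show. `json_or_challenge` is a hypothetical name.

```python
import json
from types import SimpleNamespace


class ChallengeError(Exception):
    """Stand-in for cloudscraper.exceptions.CloudflareChallengeError."""


def json_or_challenge(response):
    """Return the decoded JSON body, or raise ChallengeError if the response looks blocked.

    Assumed heuristic: a 403 status or a non-JSON body counts as a Cloudflare challenge.
    """
    if response.status_code == 403:
        raise ChallengeError(f"HTTP 403 for {response.url}")
    try:
        return json.loads(response.text)
    except json.JSONDecodeError as exc:
        raise ChallengeError(f"non-JSON body from {response.url}") from exc


if __name__ == "__main__":
    # Offline demo with a fake blocked response.
    blocked = SimpleNamespace(
        status_code=403,
        url="https://www.blinkist.com/api/books/the-automation-advantage-en/chapters",
        text="",
    )
    try:
        json_or_challenge(blocked)
    except ChallengeError as err:
        print("blocked:", err)
```

Raising a dedicated exception type is what makes a targeted retry possible: only challenge failures get retried, other bugs surface immediately.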
I'll add some retry magic.
Alright then, please try again with the latest `main.py` (notice the new requirement `tenacity`). :)
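The repo does its retrying with `tenacity`, roughly along the lines of `@retry(retry=retry_if_exception_type(CloudflareChallengeError), stop=stop_after_attempt(10), wait=wait_fixed(2))`; those exact parameters are an assumption inferred from the "Retrying in 2.0 seconds…" log further down, not read from the code. As a dependency-free illustration of the same idea (retry on one exception type, fixed wait, then give up with an error chained to the last failure), here is a hypothetical `retry_on` decorator:

```python
import functools
import time


class RetryError(Exception):
    """Raised when every attempt failed; the last underlying error is chained as __cause__."""


def retry_on(exc_type, attempts=10, wait_seconds=2.0):
    """Hypothetical decorator: retry on `exc_type`, sleeping a fixed time between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exc_type as exc:
                    last_exc = exc
                    if attempt < attempts:
                        print(f"Retrying in {wait_seconds} seconds…")
                        time.sleep(wait_seconds)
            # All attempts failed: chain the last error, as tenacity's RetryError does.
            raise RetryError(f"gave up after {attempts} attempts") from last_exc

        return wrapper

    return decorator
```

Other exception types pass straight through without retrying, so only the expected transient failure is absorbed.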
@NicoWeio I tried it, and then this error came up. It seemed to work for the first second (I could see the download bar), but when it started downloading audio files it stopped working, and since then I don't see the download bar anymore.
`Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 407, in __call__
result = fn(*args, **kwargs)
File "/Users/erik/Downloads/blinkist-main/main.py", line 38, in _api_request
raise cloudscraper.exceptions.CloudflareChallengeError()
cloudscraper.exceptions.CloudflareChallengeError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/erik/Downloads/blinkist-main/main.py", line 77, in <module>
free_daily = get_free_daily(locale=locale)
File "/Users/erik/Downloads/blinkist-main/main.py", line 50, in get_free_daily
return _api_request('free_daily', params={'locale': locale})
File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 324, in wrapped_f
return self(f, *args, **kw)
File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 404, in __call__
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 361, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x101ed7cd0 state=finished raised CloudflareChallengeError>]`
BUT wait!! After some "definition of insanity", it started working just by running it again a few times, even when the error came up. Interesting.
Thanks for your feedback! I actually forgot to add the retry logic to audio downloads. That should be fixed now, so you don't have to “definition of insanity” yourself. ;)
Hey there, does this kind of error still occur with the latest version of my code?
yes, then I gave up testing xD
In my testing, your code mostly works great, and I am grateful, but for some reason this particular book throws the same error:
ubuntu:~/blinkist# ./main.py --book-slug the-7-habits-of-highly-effective-people-en ./test/
Book (1/1): “The 7 Habits of Highly Effective People”
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--Error downloading „The 7 Habits of Highly Effective People“ – renaming output directory.
Traceback (most recent call last):
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 407, in __call__
result = fn(*args, **kwargs)
File "/home/ubuntu/blinkist/blinkist/common.py", line 27, in request
raise cloudscraper.exceptions.CloudflareChallengeError()
cloudscraper.exceptions.CloudflareChallengeError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/blinkist/./main.py", line 132, in <module>
main()
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/ubuntu/blinkist/./main.py", line 124, in main
download_book(
File "/home/ubuntu/blinkist/./main.py", line 43, in download_book
_ = book.chapters
File "/usr/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/home/ubuntu/blinkist/blinkist/book.py", line 54, in chapters
chapters = [
File "/home/ubuntu/blinkist/blinkist/book.py", line 55, in <listcomp>
Chapter.from_id(self, chapter['id'])
File "/home/ubuntu/blinkist/blinkist/chapter.py", line 16, in from_id
chapter_data = api_request_web(f'books/{book.id}/chapters/{chapter_id}')
File "/home/ubuntu/blinkist/blinkist/common.py", line 49, in api_request_web
return api_request('https://blinkist.com/api/', endpoint, params=params)
File "/home/ubuntu/blinkist/blinkist/common.py", line 40, in api_request
response = request(url, params=params, headers=HEADERS)
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 324, in wrapped_f
return self(f, *args, **kw)
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
do = self.iter(retry_state=retry_state)
File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 361, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f7cd59652a0 state=finished raised CloudflareChallengeError>]
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Although I have no fix yet, I found some more books that reliably trigger 403 errors:
Maybe there is a pattern?
@Erik262 @NicoWeio I think I have identified some more books that give Cloudflare errors, if you would like to test them:
the-leaders-guide-to-unconscious-bias-en the-8th-habit-en the-speed-of-trust-en building-a-second-brain-en everyone-deserves-a-great-manager-en first-things-first-en 121-first-dates-en a-beautiful-mind-en the-7-habits-of-highly-effective-people-en
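To test them in one go, a small driver can call the CLI once per slug, reusing the `./main.py --book-slug <slug> <output dir>` invocation shown earlier in this thread. A sketch under two assumptions: `main.py` is executable from the repo root, and a failed book ends the run with a non-zero exit status (an uncaught Python exception exits with status 1). `build_commands` and `run_all` are hypothetical names.

```python
import subprocess

# Slugs reported in this thread as triggering Cloudflare errors.
SLUGS = """
the-leaders-guide-to-unconscious-bias-en the-8th-habit-en the-speed-of-trust-en
building-a-second-brain-en everyone-deserves-a-great-manager-en first-things-first-en
121-first-dates-en a-beautiful-mind-en the-7-habits-of-highly-effective-people-en
""".split()


def build_commands(slugs, out_dir="./test/"):
    """Map each slug to a `./main.py --book-slug <slug> <out_dir>` invocation."""
    return {slug: ["./main.py", "--book-slug", slug, out_dir] for slug in slugs}


def run_all(commands):
    """Run each command in turn and return the slugs whose run exited non-zero."""
    failed = []
    for slug, cmd in commands.items():
        # Keep going after a failure so one blocked book doesn't hide the others.
        if subprocess.run(cmd).returncode != 0:
            failed.append(slug)
    return failed
```

From the repository root, `print(run_all(build_commands(SLUGS)))` would list the books that still fail.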
Thanks, @2600box! I investigated this some more and found out two things:
1. In the web app, one can see that the request goes to the expected URL and works, unlike a request to the same URL made by this code. Compared to requests for a book that can be downloaded with this program, some request headers differ (direction of comparison: working → not working): `Sec-Fetch-Mode: no-cors` → `Sec-Fetch-Mode: cors`.
2. Providing a valid `_blinkist-webapp_session` cookie fixes our problems. I verified this for all of the links in your comment above.
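A minimal sketch of attaching that cookie to an API request. It uses only the standard library's `urllib.request` so it runs anywhere; the repo itself goes through cloudscraper, where the cookie would belong in the session's cookie jar instead. The base URL is the one from `blinkist/common.py` in the traceback above, the cookie value is a placeholder you would copy from a logged-in browser, and `build_api_request` is a hypothetical name. It only builds the request; `urllib.request.urlopen(req)` would send it.

```python
import urllib.request

API_BASE = "https://blinkist.com/api/"  # base URL as used by api_request_web in the traceback
SESSION_COOKIE = "_blinkist-webapp_session"


def build_api_request(endpoint, session_value):
    """Build (but do not send) a GET request that carries the web-app session cookie."""
    req = urllib.request.Request(API_BASE + endpoint)
    req.add_header("Cookie", f"{SESSION_COOKIE}={session_value}")
    return req
```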
If I don't find an alternative, I will add an option to provide this cookie or to automatically extract it from Firefox in the near future.
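Firefox stores cookies in an SQLite file, `cookies.sqlite`, in the profile directory, in a table `moz_cookies` with (among others) `name`, `value`, `host` and `expiry` columns. A minimal sketch of reading the session cookie with the standard `sqlite3` module; `firefox_cookie` is a hypothetical name, and copying the file first is a precaution because a running Firefox may keep the database locked.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def firefox_cookie(cookies_sqlite, name="_blinkist-webapp_session", host_suffix="blinkist.com"):
    """Return the value of cookie `name` for a host ending in `host_suffix`, or None if absent."""
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy: a running Firefox may hold a lock on the original database.
        copy = Path(tmp) / "cookies.sqlite"
        shutil.copyfile(cookies_sqlite, copy)
        con = sqlite3.connect(copy)
        try:
            row = con.execute(
                "SELECT value FROM moz_cookies WHERE name = ? AND host LIKE ? "
                "ORDER BY expiry DESC LIMIT 1",
                (name, f"%{host_suffix}"),
            ).fetchone()
        finally:
            con.close()
    return row[0] if row else None
```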
Thanks for working on this. I tested the new branch with my cookies.sqlite and it worked well.
Being able to specify a cookies.txt file would be ideal.
I also noticed you added the "This book has no audio." message, which is great.
Thanks for continuing this project!
You're very welcome!
Can you elaborate on why a cookies.txt file would be helpful to you? Wouldn't auto-import from all major browsers (to be done) be more comfortable? Of course I could implement both; I just don't see the use case.
Sure. First, to me it is a more standard approach. Second, I prefer to export the cookie for Blinkist individually. Third, I don't run this on the same machine that has my browser.
I cherry-picked 95d9367e3670cef1d96fba9681804decc63ced98 and it works great. If you don't have time for a thorough fix, I would recommend pushing 95d9367e3670cef1d96fba9681804decc63ced98 to master and telling folks to log in with Firefox before using the tool.
That's a good idea. The only reason I haven't done it yet is that support for other browsers seemed so close… and then I never got around to it. As I just wrote in another issue, I hope to get back to this in a month or so.
I think most of the users are "life-hackers" anyway; they won't mind using Firefox just so that the tool works like a breeze :))
I'm getting this error message, for example, with this link: https://www.blinkist.com/api/books/the-automation-advantage-en/chapters