NicoWeio / blinkist

Tool to download Blinkist's free offerings, namely "Free Daily" and free curated lists
https://nicoweio.github.io/blinkist/
GNU Affero General Public License v3.0

HTTPError: 403 Client Error: Forbidden for url: XXX #1

Open Erik262 opened 2 years ago

Erik262 commented 2 years ago

I'm getting this error message for example with this link here: https://www.blinkist.com/api/books/the-automation-advantage-en/chapters

NicoWeio commented 2 years ago

Using this repo's latest code, that is? Maybe there's geoblocking at play? It works on GitHub Actions (→ https://github.com/NicoWeio/blinkist/runs/6895445085) as well as my machine, so there isn't much I can do about it. You could try logging the response text – maybe it tells you what happened.

Erik262 commented 2 years ago

I'm on the latest one I pulled about 15 min ago. The complete error:

Traceback (most recent call last):
  File "/Users/erik/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 972, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/erik/Downloads/blinkist-main/main.py", line 66, in <module>
    free_daily = get_free_daily(locale=locale)
  File "/Users/erik/Downloads/blinkist-main/main.py", line 32, in get_free_daily
    return response.json()
  File "/Users/erik/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 976, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

NicoWeio commented 2 years ago

Well, you don't get a JSON response. Try logging response.text instead of response.json().
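
A minimal sketch of that suggestion, using only the standard library. The helper name `decode_json_or_log` is hypothetical and not part of this repo; it takes the status code and body text of a response, tries to parse the body, and logs the start of the raw text when parsing fails, so that an HTML error or challenge page becomes visible.

```python
import json
import logging

logger = logging.getLogger(__name__)


def decode_json_or_log(status_code: int, text: str):
    """Parse `text` as JSON; if that fails, log the raw body and re-raise.

    Hypothetical helper for illustration, not the project's actual code.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Only the first 200 characters are logged, which is usually enough
        # to recognize an HTML error page.
        logger.error("HTTP %s returned a non-JSON body: %r", status_code, text[:200])
        raise
```

With a `requests`-style response object, this would be called as `decode_json_or_log(response.status_code, response.text)`.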

Erik262 commented 2 years ago

Okay. I tried to validate the response.text string and figured out that there is an error in the JSON. But I don't know how to fix this, since the response call should do the trick but is somehow broken.

This is what I get back as response.text

{"book":{"id":"628df2186cee0700084919a6","kind":"book","slug":"goals-based-investing-en","title":"Goals-based Investing","subtitle":"A Visionary Framework for Wealth Management","subtitleHtmlSafe":"A Visionary Framework for Wealth Management","aboutTheBook":"\u003cp\u003e\u003cem\u003eGoals-Based Investing \u003c/em\u003e(2022) explains how the wealth management industry is transforming, how modern portfolio theory is no longer considered modern, and how product evolution and regulatory changes are making it easier for investors and advisors to access market segments that were once the exclusive domain of large institutes.\u003c/p\u003e","buyOnAmazonUrl":"/en/books/goals-based-investing-en/purchase","author":"Tony Davidow","truncatedAuthor":"Tony Davidow","sourceAuthor":"Tony Davidow","u rl":"https://www.blinkist.com/en/books/goals-based-investing-en","browseUrl":"/en/nc/browse/books/goals-based-investing-en","previewUrl":"/en/books/goals-based-investing-en","read ingDuration":25,"minutesToRead":25,"isAudio":true,"readCount":null,"image":{"default":{"src":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/470.jpg","srcset ":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/640.jpg"}},"sources":[{"media":"xs","src":"https://images.blinkist.io/images/books/628df2186cee070008 4919a6/1_1/470.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/640.jpg"}},{"media":"s","src":"https://images.blinkist.io/images/books/628 df2186cee0700084919a6/1_1/640.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/1080.jpg"}},{"media":"m","src":"https://images.blinkist.io/ images/books/628df2186cee0700084919a6/1_1/250.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/470.jpg"}}]},"audioUrl":"https://hls.blinki 
st.io/bibs/628df2186cee0700084919a6/628df2196cee0700084919a8-T1653981795.m4a?Expires=1655355345\u0026Signature=G7saoNJx1hZYFnZaj~X2dE0tyJAGg4GUEDTawW4Nuh13qiuHy6maJGjk1agHKo2p9qt7 erLaSOncPXzVErJa2tenzR7qokLK~LZf9QEaRr5bLagkkSAK8SI9TpDw9R6yP6luOlOKzhXO~orkpPzH9Xui5VkOcB5j9VmkxC-pGxEkVoGwOE~ArQuCHNvoyFFLsaadSAKAV2nQV2Jf~280yqO0I7a-rgOwlATQznsB301gQPP9CT56fun nb1GNjCu3cspv~nLcgYUrQkZyT2o72-lOdG8ssl2D1YOcPt0bYNwMnnCvyr99pMor4reyaX2RKF41n-VAf8p2Tu~ZUzkFDA__\u0026Key-Pair-Id=APKAJXJM6BB7FFZXUB4A","chaptersLength":7,"hasAudio":true,"langua ge":"en","freeDaily":null,"category":{"title":"Money \u0026 Investments","sprite":"money-and-investments","slug":"money-and-investments-en"},"averageRating":3.6,"categories":[{"id ":"54788fef6439320008240000","url":"/en/nc/categories/money-and-investments-en","sprite":"money-and-investments","slug":"money-and-investments-en","title":"Money \u0026 Investments","subtitle":"You work hard for your money, right? Let the experts show you how to make it work hard for you."}]},"endTimestamp":1655416799}

NicoWeio commented 2 years ago

That's curious. I don't see an error in the JSON you posted, that is, https://jsonformatter.curiousconcept.com/ doesn't report one. I assume the spaces in e.g. "u rl" were a result of copy-pasting the data? Try using triple backticks for that. Other than that, my best guess is that the latter response is actually fine, and you just had bad luck. I'll implement better error handling, so we can see what's going on.

NicoWeio commented 2 years ago

Turns out this is Cloudflare. I assumed that cloudscraper would raise CloudflareChallengeError automatically, but that's not the case. In @ptrstn's version, this is done in _get_daily_blink_info(self, language="en"). I'll add some retry magic.
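
The idea can be sketched roughly as follows. This is a dependency-free stand-in, not the code in this repo (which uses cloudscraper and tenacity): `ChallengeError` and `fetch_with_retry` are hypothetical names, and the sketch assumes that a 403 status is how the challenge shows up. The `fetch` argument is any callable that returns a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple


class ChallengeError(Exception):
    """Hypothetical stand-in for cloudscraper.exceptions.CloudflareChallengeError."""


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it returns a non-403 status or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status != 403:
            return body
        # Assumption: a 403 here means the request was challenged, so record it
        # explicitly instead of trying to parse the body as JSON.
        last_error = ChallengeError(f"attempt {attempt}: got HTTP 403")
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in normal use the default `time.sleep` applies.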

NicoWeio commented 2 years ago

Alright then, please try again with the latest main.py (notice the new requirement tenacity). :)

Erik262 commented 2 years ago

@NicoWeio Tried and then this error came up: It seems to work for the first second (could see the downloading bar), then when it started downloading audio files it stopped working and since then I don't see the download bar anymore.

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/Users/erik/Downloads/blinkist-main/main.py", line 38, in _api_request
    raise cloudscraper.exceptions.CloudflareChallengeError()
cloudscraper.exceptions.CloudflareChallengeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/erik/Downloads/blinkist-main/main.py", line 77, in <module>
    free_daily = get_free_daily(locale=locale)
  File "/Users/erik/Downloads/blinkist-main/main.py", line 50, in get_free_daily
    return _api_request('free_daily', params={'locale': locale})
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 361, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x101ed7cd0 state=finished raised CloudflareChallengeError>]

BUT wait!! After doing some "definition of insanity", it started working just by running it again a few times, even when the error came up. Interesting.

NicoWeio commented 2 years ago

Thanks for your feedback! I actually forgot to add the retry logic to audio downloads. That should be fixed now, so you don't have to “definition of insanity” yourself. ;)

NicoWeio commented 2 years ago

Hey there, does this kind of error still occur with the latest version of my code?

Erik262 commented 2 years ago

yes, then I gave up testing xD

2600box commented 2 years ago

In my testing, mostly your code works great and I am grateful, but for some reason this particular book throws the same error:

ubuntu:~/blinkist# ./main.py --book-slug the-7-habits-of-highly-effective-people-en ./test/
Book (1/1): “The 7 Habits of Highly Effective People”
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Error downloading „The 7 Habits of Highly Effective People“ – renaming output directory.
Traceback (most recent call last):
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/home/ubuntu/blinkist/blinkist/common.py", line 27, in request
    raise cloudscraper.exceptions.CloudflareChallengeError()
cloudscraper.exceptions.CloudflareChallengeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/blinkist/./main.py", line 132, in <module>
    main()
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/blinkist/./main.py", line 124, in main
    download_book(
  File "/home/ubuntu/blinkist/./main.py", line 43, in download_book
    _ = book.chapters
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/ubuntu/blinkist/blinkist/book.py", line 54, in chapters
    chapters = [
  File "/home/ubuntu/blinkist/blinkist/book.py", line 55, in <listcomp>
    Chapter.from_id(self, chapter['id'])
  File "/home/ubuntu/blinkist/blinkist/chapter.py", line 16, in from_id
    chapter_data = api_request_web(f'books/{book.id}/chapters/{chapter_id}')
  File "/home/ubuntu/blinkist/blinkist/common.py", line 49, in api_request_web
    return api_request('https://blinkist.com/api/', endpoint, params=params)
  File "/home/ubuntu/blinkist/blinkist/common.py", line 40, in api_request
    response = request(url, params=params, headers=HEADERS)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 361, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f7cd59652a0 state=finished raised CloudflareChallengeError>]
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--

NicoWeio commented 1 year ago

Although I have no fix yet, I found some more books that reliably trigger 403 errors:

Maybe there is a pattern?

2600box commented 1 year ago

@Erik262 @NicoWeio I think I have identified some more books that give Cloudflare errors, if you would like to test?

the-leaders-guide-to-unconscious-bias-en
the-8th-habit-en
the-speed-of-trust-en
building-a-second-brain-en
everyone-deserves-a-great-manager-en
first-things-first-en
121-first-dates-en
a-beautiful-mind-en
the-7-habits-of-highly-effective-people-en

NicoWeio commented 1 year ago

Thanks, @2600box! I investigated this some more and found out two things:

1. In the web app, one can see that the request goes to the expected URL and works, contrary to a request to the same URL by this code. Compared to requests for a book that can be downloaded with this program, some request headers differ (direction of comparison: working → not working).

2. Providing a valid _blinkist-webapp_session cookie fixes our problems. I verified this for all of the links in your comment above.

Therefore…

If I don't find an alternative, I will add an option to provide this or to automatically extract it from Firefox in the near future.
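
To illustrate the second finding, here is a minimal sketch of attaching the session cookie to a request, using only the standard library. The cookie name comes from the comment above; `build_request` is a hypothetical helper, the cookie value is a placeholder to be replaced with one copied from a logged-in browser session, and the actual code in this repo may attach the cookie differently.

```python
import urllib.request

# Cookie name as mentioned in this thread.
SESSION_COOKIE_NAME = "_blinkist-webapp_session"


def build_request(url: str, session_cookie: str) -> urllib.request.Request:
    """Build a GET request that carries the session cookie (hypothetical helper)."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", f"{SESSION_COOKIE_NAME}={session_cookie}")
    return request


# Placeholder value; nothing is sent until the request is passed to urlopen().
request = build_request(
    "https://www.blinkist.com/api/books/the-automation-advantage-en/chapters",
    "PASTE-COOKIE-VALUE-HERE",
)
```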

2600box commented 1 year ago

Thanks for working on this. I tested the new branch with my cookies.sqlite and it worked well.
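
For readers curious what reading the cookie from a Firefox profile can look like, here is a rough sketch with the standard `sqlite3` module. It assumes the usual Firefox layout, where cookies.sqlite holds a `moz_cookies` table with `name`, `value`, and `host` columns; `read_firefox_cookie` is a hypothetical helper and not necessarily how the branch does it. The database may be locked while Firefox is running, so working on a copy of the file can be safer.

```python
import sqlite3
from typing import Optional


def read_firefox_cookie(db_path: str, name: str, host_suffix: str) -> Optional[str]:
    """Return the value of the first cookie matching `name` and host suffix, or None."""
    connection = sqlite3.connect(db_path)
    try:
        row = connection.execute(
            "SELECT value FROM moz_cookies WHERE name = ? AND host LIKE ? LIMIT 1",
            (name, f"%{host_suffix}"),
        ).fetchone()
    finally:
        connection.close()
    return row[0] if row else None
```

A call might look like `read_firefox_cookie("cookies.sqlite", "_blinkist-webapp_session", "blinkist.com")`.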

Being able to specify a cookies.txt file would be ideal.

I also noticed you added the "This book has no audio." which is great.

Thanks for continuing this project!

NicoWeio commented 1 year ago

You're very welcome!

Can you elaborate on why a cookies.txt file would be helpful to you? Wouldn't auto-import from all major browsers (to be done) be more comfortable? Of course I could implement both, I just don't see the use case.

2600box commented 1 year ago

Sure. First, to me it is a more standard approach. Second, I prefer to export the cookie for Blinkist individually. Third, I don't run this on the same machine that has my browser.
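
As a sketch of the cookies.txt approach: Python's standard library can read the Netscape cookies.txt format through `http.cookiejar.MozillaCookieJar`. The helper name `read_cookie_from_txt` is hypothetical, and this tool does not necessarily offer such an option.

```python
from http.cookiejar import MozillaCookieJar
from typing import Optional


def read_cookie_from_txt(path: str, name: str) -> Optional[str]:
    """Return the value of the cookie called `name` from a cookies.txt file, or None."""
    jar = MozillaCookieJar(path)
    # Keep session and expired entries too, so the caller decides what to do with them.
    jar.load(ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        if cookie.name == name:
            return cookie.value
    return None
```

This makes it possible to export only the Blinkist cookie on one machine and copy the file to the machine that runs the tool.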

phuongnd08 commented 7 months ago

I cherry-picked 95d9367e3670cef1d96fba9681804decc63ced98 and it works great. If you don't have time for a thorough fix, I would recommend pushing 95d9367e3670cef1d96fba9681804decc63ced98 to master and telling folks to use Firefox to log in first before using the tool.

NicoWeio commented 7 months ago

That's a good idea. The only reason I didn't do it yet is that support for other browsers seemed so close… and then I never got around to it. As I just wrote in another issue, I hope to get back to this in a month or so.

phuongnd08 commented 7 months ago

I think most of the users are "life-hackers" anyway, they won't mind using Firefox just so that the tool works like a breeze :))