trestlesky closed 6 months ago
Hey @trestlesky! I have been having the same issue. Did you manage to solve it, or somehow circumvent it?
@theseus-alt
It's really messy and inconvenient, but I am currently using undetected-chromedriver with Python and Selenium:
https://github.com/ultrafunkamsterdam/undetected-chromedriver
I can't run it headless, it's slow because it actually opens each page I'm scraping, and it's been tricky figuring out an automation schedule... but it works for the time being until I find another solution.
Let me know if you want more details.
Thanks for that, @trestlesky. I'll see if I can make it work.
Hello everyone,
Firstly, I'd like to apologize for my delayed response. I had set this project aside after Nhentai implemented Cloudflare's protection. However, I've now addressed the issue by incorporating the Cloudflare cookie token from the user's session along with a valid user-agent. Thank you for opening this Issue.
I've recently undertaken a significant refactoring of this project. I would greatly appreciate your continued contributions.
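For anyone reading along, the idea described above (sending the Cloudflare cookie from a real browser session together with a valid user-agent) can be sketched roughly as follows. This is only an illustration using the standard library, not the project's actual implementation: `build_request`, the example URL, and all placeholder values are assumptions, and the cookie names `cf_clearance` and `csrftoken` are taken from the settings shown elsewhere in this thread. It is generally understood that the user-agent should be the one from the browser that obtained the cookie.

```python
from urllib.request import Request


def build_request(url, cf_clearance, csrftoken, user_agent):
    """Build (but do not send) a request carrying browser-session cookies.

    Hypothetical helper for illustration; the values must be copied from a
    real browser session that has already passed the Cloudflare check.
    """
    cookie_header = f"cf_clearance={cf_clearance}; csrftoken={csrftoken}"
    headers = {
        "Cookie": cookie_header,
        # Assumed to match the browser that obtained the cookie.
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers)


# Placeholder values only; nothing is sent over the network here.
req = build_request(
    "https://example.com/",
    cf_clearance="CF_CLEARANCE_PLACEHOLDER",
    csrftoken="CSRFTOKEN_PLACEHOLDER",
    user_agent="Mozilla/5.0 (placeholder)",
)
print(req.get_header("Cookie"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; whether the site accepts it depends on the cookie still being valid for that session.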
First, thank you for all your hard work on this, can't be easy maintaining a repo like this. Second, I know issues like this have already been posted, but I think I have some additional issues/information to add [identifying info removed].
Some quick backstory: I've been scraping nhentai for years using a combination of urllib, request, and BeautifulSoup. My Python script would gather new links and write them to a txt file, which I would then load once a month into HDoujinDownloader to download and sort. 14 days ago, my script stopped working. After investigating, I found that Cloudflare had been preventing my script from successfully gathering links. While looking for workarounds, I was led here. I tried the code found in Issue #39:
from NHentai import NHentai, CloudFlareSettings
nhentai_async = NHentai(request_settings=CloudFlareSettings(csrftoken="hne[...]", cf_clearance="1qDg[...]"))
print(nhentai_async.get_page(page=1))
And was met with this error:
File "C:[...]\AppData\Local\Programs\Python\Python310\lib\site-packages\NHentai\sync\infra\adapters\request\http\implementations\sync.py", line 37, in handle_error
    raise Exception(f'An unexpected error occoured while making the request to the website! status code: {request.status}')
AttributeError: 'Response' object has no attribute 'status'
My suspicion is that more advanced Cloudflare settings have been applied to nhentai and that those settings are preventing successful runs. Is that something you can confirm? Any advice on which direction to head in would be most appreciated.
Thanks again for all your hard work.
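A note on the traceback above: the `AttributeError` is raised while the error message itself is being formatted, so the real HTTP status code never gets reported. The message reads `request.status`, but the `Response` object evidently has no such attribute (some HTTP libraries expose `status_code` instead). Below is a minimal sketch of a more tolerant handler. `FakeResponse`, `status_of`, and the 403 value are hypothetical stand-ins for illustration, not the library's code or the status actually returned.

```python
class FakeResponse:
    """Hypothetical stand-in for a response that only has `status_code`."""

    def __init__(self, status_code):
        self.status_code = status_code


def status_of(response):
    """Return the HTTP status whether it is named `status` or `status_code`."""
    status = getattr(response, "status", None)
    if status is None:
        status = getattr(response, "status_code", None)
    return status


def handle_error(response):
    """Raise an error that reports the status instead of failing on lookup."""
    raise RuntimeError(
        f"Unexpected response from the website, status code: {status_of(response)}"
    )


try:
    handle_error(FakeResponse(403))  # 403 is an assumed example value
except RuntimeError as exc:
    message = str(exc)

print(message)
```

With a handler along these lines, the underlying status code would show up in the error, which should make it easier to tell a Cloudflare block apart from other failures.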