dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.48k stars 293 forks

wuxiaworld.site error #1239

Closed: plush18 closed this issue 2 years ago

plush18 commented 2 years ago

Yes, I have searched the existing issues.

Novel URL: https://wuxiaworld.site/
App Location: PIP
App Version: 2.28.11

Describe this issue

Any novel I try to download from wuxiaworld.site fails to start crawling. The site is supported, but an error message appears before the app can reach the menu that shows the number of available chapters.

[Screenshots: 2022-01-18 at 11:38:25 AM and 2022-01-18 at 11:43:37 AM]

GoldenJJ commented 2 years ago

wuxiaworld.site has recently introduced Cloudflare and CAPTCHA protection.

Brian-Zombait commented 2 years ago

> wuxiaworld.site has recently introduced Cloudflare and CAPTCHA protection

Maybe it's because I use Brave browser and I log in, but I don't get any captchas. I'll try another browser to see.

Brian-Zombait commented 2 years ago

> wuxiaworld.site has recently introduced Cloudflare and CAPTCHA protection

> Maybe it's because I use Brave browser and I log in, but I don't get any captchas. I'll try another browser to see.

Just tried with DuckDuckGo, without logging in, and still no CAPTCHA after many pages.

GoldenJJ commented 2 years ago

> Just tried with DuckDuckGo, without logging in, and still no CAPTCHA after many pages.

I'm not talking about users visiting the page. Try sending a request to wuxiaworld.site and you'll get the source code of the Cloudflare page.

Brian-Zombait commented 2 years ago

> I'm not talking about users visiting the page. Try sending a request to wuxiaworld.site and you'll get the source code of the Cloudflare page.

Not sure what exactly you mean. Do you mean via CMD window?

GoldenJJ commented 2 years ago

> Not sure what exactly you mean. Do you mean via CMD window?

Ah, I see where the confusion is, sorry. I meant that if you send a GET request to wuxiaworld.site, which is what ln-crawler does, you receive the source code of a Cloudflare page instead of the page you requested. Hence, ln-crawler can no longer actually visit the website, because the site owner has added protection against crawling.
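
For anyone who wants to check this themselves, here is a minimal Python sketch of how a script could tell a Cloudflare block page apart from a real novel page. This is not ln-crawler's actual code. The names `looks_like_cloudflare_block`, `fetch`, and `CHALLENGE_MARKERS` are made up for illustration, and the marker strings and status codes are heuristics I'm assuming, not anything documented for this site.

```python
import urllib.error
import urllib.request

# Assumed heuristics: phrases that commonly appear on Cloudflare
# challenge / block pages. Not an exhaustive or official list.
CHALLENGE_MARKERS = (
    "just a moment",
    "attention required",
    "cf-browser-verification",
)


def looks_like_cloudflare_block(status, headers, body):
    """Guess whether a response is a Cloudflare block page.

    status  -- HTTP status code (int)
    headers -- mapping of header name to value
    body    -- response body decoded as text
    """
    # Header names are case-insensitive, so normalise before looking up.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
    served_by_cloudflare = "cloudflare" in server

    text = body.lower()
    has_marker = any(marker in text for marker in CHALLENGE_MARKERS)

    # Only flag responses that come through Cloudflare AND either carry
    # a blocking status code or contain a known challenge phrase.
    return served_by_cloudflare and (status in (403, 503) or has_marker)


def fetch(url, timeout=10):
    """Send a plain GET request and return (status, headers, body)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return response.status, dict(response.headers.items()), body
    except urllib.error.HTTPError as error:
        # 403/503 responses raise HTTPError but still carry headers and a body.
        body = error.read().decode("utf-8", "replace")
        return error.code, dict(error.headers.items()), body
```

To try it, call `looks_like_cloudflare_block(*fetch("https://wuxiaworld.site/"))`. A `True` result would mean the script got the protection page rather than the novel page, even though the same URL opens fine in a browser.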

dipu-bd commented 2 years ago

image

It shows me "Access Denied" in the browser.