dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.42k stars, 279 forks

Consider using some free proxy servers #1304

Closed: damare01 closed this issue 2 years ago

damare01 commented 2 years ago

Let us know

Novel URL: https://www.lightnovelpub.com/novel/bank-of-the-universe-12032016
App Location: Discord and Telegram
App Version: 2.29.7

Describe this issue

Unable to crawl. The downloader logs errors like these:

2022-03-19 05:04:32,926 [ERROR] (lncrawl.core.downloader)
Body is empty: https://www.lightnovelpub.com/novel/bank-of-the-universe-12032016/1036-chapter-285

2022-03-19 05:04:36,250 [ERROR] (lncrawl.core.downloader)
Body is empty: https://www.lightnovelpub.com/novel/bank-of-the-universe-12032016/1036-chapter-287

2022-03-19 05:04:34,820 [ERROR] (lncrawl.core.downloader)
Body is empty: https://www.lightnovelpub.com/novel/bank-of-the-universe-12032016/1036-chapter-286

dipu-bd commented 2 years ago

lightnovelpub seems to block further requests from an IP after about 200 chapters have been downloaded.

To work around this, we could consider routing requests through free proxy servers.
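As a minimal sketch of the idea (not lncrawl's actual code), Python's standard `urllib.request` can send requests through a proxy with `ProxyHandler`, so the site sees the proxy's IP instead of ours. The helper name `make_proxied_opener` and the proxy address are hypothetical; whether a given free proxy can also tunnel `https://` traffic is an assumption that has to be checked per proxy.

```python
import urllib.request


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends http:// and https:// requests via proxy_url.

    proxy_url is a hypothetical example such as "http://203.0.113.5:8080".
    Whether that proxy actually supports https:// (CONNECT) is an assumption.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener uses our ProxyHandler instead of the default environment-based one.
    return urllib.request.build_opener(handler)


opener = make_proxied_opener("http://203.0.113.5:8080")
# opener.open(chapter_url, timeout=15) would now go through the proxy
# rather than directly from this machine's IP address.
```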

dipu-bd commented 2 years ago

I found a nice package: https://pypi.org/project/free-proxy/

Unfortunately, it has become so popular that HTTPS proxies are hard to acquire, but plenty of HTTP proxies are available.
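Given that HTTPS proxies are scarce, one way to pick a proxy is to try for an HTTPS one first and fall back to HTTP. The sketch below is hypothetical: `pick_proxy` and the `fetch` callback are illustrative names, and `fetch` stands in for whatever proxy source is used. With the free-proxy package it might be something like `lambda https: FreeProxy(https=https).get()`, assuming the `from fp.fp import FreeProxy` API described in that package's README.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes https=True/False, returns a proxy URL or None.
ProxyFetcher = Callable[[bool], Optional[str]]


def pick_proxy(fetch: ProxyFetcher) -> Optional[str]:
    """Prefer an HTTPS-capable proxy, falling back to the more plentiful HTTP ones."""
    for want_https in (True, False):
        try:
            proxy = fetch(want_https)
        except Exception:
            # Treat a failed lookup like "nothing available" and try the next kind.
            proxy = None
        if proxy:
            return proxy
    return None  # No proxy found; the caller may connect directly instead.


def fake_fetch(https: bool) -> Optional[str]:
    # Hypothetical stand-in for the situation above: no HTTPS proxies left.
    return None if https else "http://198.51.100.7:3128"


print(pick_proxy(fake_fetch))  # prints http://198.51.100.7:3128
```

Returning `None` instead of raising lets the crawler keep working over a direct connection when no proxy can be found.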

Now the crawler automatically switches to a different proxy server every 30 seconds.
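Time-based rotation like this can be sketched as a small object that hands out the current proxy and advances to the next one once the interval has elapsed. This is an illustration, not lncrawl's implementation: the class name, the round-robin order, and the injectable `clock` are assumptions, and only the 30-second default comes from the comment above. A lock is included because the downloader fetches chapters concurrently (the log timestamps above are out of order).

```python
import itertools
import threading
import time
from typing import Callable, Iterable


class ProxyRotator:
    """Return the current proxy, moving to the next one every `interval` seconds."""

    def __init__(self, proxies: Iterable[str], interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("need at least one proxy")
        self._cycle = itertools.cycle(proxy_list)  # round-robin over the pool
        self._interval = interval
        self._clock = clock
        self._lock = threading.Lock()  # several download threads may call current()
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self) -> str:
        with self._lock:
            now = self._clock()
            if now - self._switched_at >= self._interval:
                # Interval elapsed: advance to the next proxy and restart the timer.
                self._current = next(self._cycle)
                self._switched_at = now
            return self._current


fake_now = [0.0]
rotator = ProxyRotator(["http://192.0.2.10:8080", "http://192.0.2.11:8080"],
                       clock=lambda: fake_now[0])
print(rotator.current())  # prints http://192.0.2.10:8080
fake_now[0] = 31.0
print(rotator.current())  # prints http://192.0.2.11:8080
```

Injecting the clock keeps the rotation testable without waiting 30 real seconds; in normal use the default `time.monotonic` is unaffected by system clock changes.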