dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.42k stars, 279 forks

https://www.69shuba.pro/ #2400

Open pantasa opened 2 months ago

pantasa commented 2 months ago

Let us know

Novel URL: https://www.69shuba.pro/book/49986.htm
App Location: pip | exe
App Version: newest

Describe this issue

The domain is not supported. Even though I modified the code, it only scrapes the first 100 chapters successfully. After that I get ConnectTimeout: HTTPSConnectionPool(host='www.69shuba.pro', port=443). I think my IP got banned.

'Connection to www.69shuba.pro timed out. (connect timeout=7)'))
ConnectTimeout: HTTPSConnectionPool(host='www.69shuba.pro', port=443): Max retries exceeded with url: /txt/43616/38188989 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000025E38E8DCD0>, 'Connection to www.69shuba.pro timed out. (connect timeout=7)'))
ConnectTimeout: HTTPSConnectionPool(host='www.69shuba.pro', port=443): Max retries exceeded with url: /txt/43616/38188990 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000025E38EADE20>, 'Connection to www.69shuba.pro timed out. (connect timeout=7)'))
ConnectTimeout: HTTPSConnectionPool(host='www.69shuba.pro', port=443): Max retries exceeded with url: /txt/43616/38193667 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000025E38EAEFC0>, 'Connection to www.69shuba.pro timed out. (connect timeout=7)'))

Screenshot 2024-06-18 014703

zGadli commented 1 month ago

You have to set a rate limit in the initializer of the crawler script written for the source, e.g.:

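from lncrawl.core.crawler import Crawler


class sixnineshu(Crawler):
    base_url = [
        "https://69shuba.cx"
    ]

    def initialize(self):
        self.init_parser("html.parser")
        # Throttle requests so the site does not start refusing connections
        self.init_executor(ratelimit=20)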
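
For anyone wondering what a rate limit does in practice, here is a minimal standalone sketch. It is not lightnovel-crawler's implementation: the class name `SimpleRateLimiter` and its `wait()` method are hypothetical, and it assumes the limit is expressed in requests per second. It only illustrates the idea of spacing requests evenly so the server sees a steady trickle instead of a burst.

```python
import time
from typing import Callable


class SimpleRateLimiter:
    """Hypothetical helper: spaces calls so that at most `rate` happen per second."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate  # minimum gap between two calls, in seconds
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # earliest start time of the next call

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        start = max(now, self._next_allowed)
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        # Reserve the slot after this one for the following caller.
        self._next_allowed = start + self.interval
        return delay
```

Calling `limiter.wait()` before each chapter request (with `limiter = SimpleRateLimiter(rate=20)`) would keep consecutive requests at least 0.05 seconds apart. If the site still times out, a lower rate is worth trying.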