dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0

https://novelbin.com/ #2418

vijayrdx7 commented 1 month ago

Let us know

Novel URL: https://novelbin.com/b/the-legitimate-daughter-doesnt-care
App Location: PIP
App Version: 3.7.2

Describe this issue

HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/novelbin/the-legitimate-daughter-doesnt-care

! Error: No chapters found

The site works fine in the Chrome browser.

[screenshot attached]
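
Before digging into the crawler, it is worth confirming that the 403 is reproducible outside lncrawl. Below is a minimal sketch (not part of lncrawl) that repeats the request with the same headers the crawler sends, as shown in the debug log further down. If this also returns 403 while the same page opens normally in Chrome, the site is applying server-side bot protection rather than lncrawl mis-parsing anything.

import requests

url = "https://novelbin.com/b/the-legitimate-daughter-doesnt-care"
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9",
    "Origin": "https://novelbin.com",
    "Referer": "https://novelbin.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53"
    ),
}

response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)  # 403 here would point to a server-side block
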
zGadli commented 1 month ago

There is no problem with the web format. I'll look into the text format.

zGadli commented 1 month ago

I didn't encounter any problem with the text format either. Try running the whole command with -lll and post the output as a comment.

Dylfin commented 1 month ago

Here it is with the debug info: 1.txt

vijayrdx7 commented 1 month ago
C:\Users\vijay>lncrawl -s https://novelbin.com/b/the-legitimate-daughter-doesnt-care --all -i --format text --suppress -lll
================================================================================
                          [#] Lightnovel Crawler v3.7.2
                  https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------
                          << LOG LEVEL: DEBUG
--------------------------------------------------------------------------------
23:04:01 [DEBUG] (lncrawl.core)
Arguments: Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page='https://novelbin.com/b/the-legitimate-daughter-doesnt-care', query=None, login=None, output_formats=['text'], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=True, all=True, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=True, ignore_images=False, close_directly=False, extra={})
 ! Input is suppressed
--------------------------------------------------------------------------------
Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page='https://novelbin.com/b/the-legitimate-daughter-doesnt-care', query=None, login=None, output_formats=['text'], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=True, all=True, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=True, ignore_images=False, close_directly=False, extra={})
23:04:01 [DEBUG] (lncrawl.core.sources)
Loading current index data from C:\Users\vijay\.lncrawl\sources\_index.json
23:04:01 [DEBUG] (lncrawl.core.sources)
Downloading https://raw.githubusercontent.com/dipu-bd/lightnovel-crawler/master/sources/_index.json
23:04:01 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): raw.githubusercontent.com:443
23:04:01 [DEBUG] (urllib3.connectionpool)
https://raw.githubusercontent.com:443 "GET /dipu-bd/lightnovel-crawler/master/sources/_index.json HTTP/1.1" 200 38068
23:04:01 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\vijay\.lncrawl\sources\_index.json
23:04:01 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\vijay\.lncrawl\sources\_index.json
23:04:01 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\vijay\.lncrawl\sources\_index.json

-> Press  Ctrl + C  to exit

23:04:03 [INFO] (lncrawl.core.app)
Initialized App
23:04:03 [INFO] (lncrawl.bots.console.integration)
Detected URL input
23:04:03 [INFO] (lncrawl.core.sources)
Initializing crawler for: https://novelbin.com/ [C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\sources\en\n\novelbin.py]
Retrieving novel info...
23:04:03 [DEBUG] (lncrawl.core.scraper)
[GET] https://novelbin.com/b/the-legitimate-daughter-doesnt-care
timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9', b'Origin': b'https://novelbin.com', b'Referer': b'https://novelbin.com/', b'User-Agent': b'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53'}
23:04:03 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): novelbin.com:443
23:04:03 [DEBUG] (urllib3.connectionpool)
https://novelbin.com:443 "GET /b/the-legitimate-daughter-doesnt-care HTTP/1.1" 403 None
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 123, in __process_request
    response.raise_for_status()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 1100, in emit
    msg = self.format(record)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 943, in format
    return fmt.format(record)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 678, in format
    record.message = record.getMessage()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 368, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 16, in read_novel_info
    soup = self.get_novel_soup()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 41, in get_novel_soup
    return self.get_soup(self.novel_url)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 306, in get_soup
    response = self.get_response(url, headers=headers, **kwargs)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 201, in get_response
    return self.__process_request(
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 133, in __process_request
    logger.debug(f"{type(e).__qualname__}: {e} | Retrying...", e)
Message: 'HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care | Retrying...'
Arguments: (HTTPError('403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care'),)
23:04:03 [DEBUG] (lncrawl.core.scraper)
[GET] https://novelbin.com/b/the-legitimate-daughter-doesnt-care
timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9', b'Origin': b'https://novelbin.com', b'Referer': b'https://novelbin.com/', b'User-Agent': b'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53'}
23:04:03 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): novelbin.com:443
23:04:03 [DEBUG] (urllib3.connectionpool)
https://novelbin.com:443 "GET /b/the-legitimate-daughter-doesnt-care HTTP/1.1" 403 None
Exception in thread Thread-1 (read_novel_info):
Traceback (most recent call last):
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 16, in read_novel_info
    soup = self.get_novel_soup()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 41, in get_novel_soup
    return self.get_soup(self.novel_url)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 306, in get_soup
    response = self.get_response(url, headers=headers, **kwargs)
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 201, in get_response
    return self.__process_request(
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 130, in __process_request
    raise e
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 123, in __process_request
    response.raise_for_status()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care

 ! Error: No chapters found
<class 'Exception'>
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\bots\console\integration.py", line 107, in start
    raise e
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\bots\console\integration.py", line 101, in start
    _download_novel()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\bots\console\integration.py", line 85, in _download_novel
    self.app.get_novel_info()
  File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\app.py", line 137, in get_novel_info
    raise Exception("No chapters found")

23:04:03 [INFO] (lncrawl.core.app)
App destroyed

--------------------------------------------------------------------------------
 -  https://github.com/dipu-bd/lightnovel-crawler/issues
================================================================================
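
Two separate problems are visible in the log above. The primary one is the 403 itself: the server rejects the request even though the crawler sends browser-like headers, which points to bot protection on the site rather than a parsing bug. The secondary one is the "--- Logging error ---" block: scraper.py line 133 passes the exception as an extra positional argument to logger.debug() although the f-string message contains no %-placeholders, so the logging module raises "not all arguments converted during string formatting". A minimal sketch of what a fix for that logging call might look like (an illustration only, not an official patch):

import logging

logger = logging.getLogger("lncrawl.core.scraper")

def log_retry(e: Exception) -> None:
    # Original call (scraper.py line 133 in the call stack above):
    #   logger.debug(f"{type(e).__qualname__}: {e} | Retrying...", e)
    # The f-string already interpolates the exception, so the trailing `e`
    # has no matching %-placeholder. Let the logging module format instead:
    logger.debug("%s: %s | Retrying...", type(e).__qualname__, e)
    # Or keep the full traceback in the debug log:
    # logger.debug("Retrying after error...", exc_info=e)
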
WorldTeacher commented 1 month ago

Adding myself here since the issue still persists and the website seems to have many mirrors linking to the chapters. Here's my log: lncrawl.log
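
If the block really is an anti-bot challenge, one thing worth trying manually (a workaround sketch only; cloudscraper is a third-party package, is not used by lncrawl here, and whether it gets past novelbin's protection is unverified) is fetching the page with a challenge-solving client:

import cloudscraper

# pip install cloudscraper; it wraps requests and tries to solve common
# Cloudflare-style challenges. Success against novelbin.com is an assumption.
scraper = cloudscraper.create_scraper()
resp = scraper.get("https://novelbin.com/b/the-legitimate-daughter-doesnt-care")
print(resp.status_code, len(resp.text))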