vijayrdx7 opened this issue 1 month ago
There is no problem with the web format. I'll look into the text format.
I didn't encounter any problem with the text format either. Try running the whole command with -lll and post the output as a comment.
C:\Users\vijay>lncrawl -s https://novelbin.com/b/the-legitimate-daughter-doesnt-care --all -i --format text --suppress -lll
================================================================================
[#] Lightnovel Crawler v3.7.2
https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------
<< LOG LEVEL: DEBUG
--------------------------------------------------------------------------------
23:04:01 [DEBUG] (lncrawl.core)
Arguments: Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page='https://novelbin.com/b/the-legitimate-daughter-doesnt-care', query=None, login=None, output_formats=['text'], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=True, all=True, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=True, ignore_images=False, close_directly=False, extra={})
! Input is suppressed
--------------------------------------------------------------------------------
Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page='https://novelbin.com/b/the-legitimate-daughter-doesnt-care', query=None, login=None, output_formats=['text'], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=True, all=True, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=True, ignore_images=False, close_directly=False, extra={})
23:04:01 [DEBUG] (lncrawl.core.sources)
Loading current index data from C:\Users\vijay\.lncrawl\sources\_index.json
23:04:01 [DEBUG] (lncrawl.core.sources)
Downloading https://raw.githubusercontent.com/dipu-bd/lightnovel-crawler/master/sources/_index.json
23:04:01 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): raw.githubusercontent.com:443
23:04:01 [DEBUG] (urllib3.connectionpool)
https://raw.githubusercontent.com:443 "GET /dipu-bd/lightnovel-crawler/master/sources/_index.json HTTP/1.1" 200 38068
23:04:01 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\vijay\.lncrawl\sources\_index.json
23:04:01 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\vijay\.lncrawl\sources\_index.json
23:04:01 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\vijay\.lncrawl\sources\_index.json
-> Press Ctrl + C to exit
23:04:03 [INFO] (lncrawl.core.app)
Initialized App
23:04:03 [INFO] (lncrawl.bots.console.integration)
Detected URL input
23:04:03 [INFO] (lncrawl.core.sources)
Initializing crawler for: https://novelbin.com/ [C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\sources\en\n\novelbin.py]
Retrieving novel info...
23:04:03 [DEBUG] (lncrawl.core.scraper)
[GET] https://novelbin.com/b/the-legitimate-daughter-doesnt-care
timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9', b'Origin': b'https://novelbin.com', b'Referer': b'https://novelbin.com/', b'User-Agent': b'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53'}
23:04:03 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): novelbin.com:443
23:04:03 [DEBUG] (urllib3.connectionpool)
https://novelbin.com:443 "GET /b/the-legitimate-daughter-doesnt-care HTTP/1.1" 403 None
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 123, in __process_request
response.raise_for_status()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 1100, in emit
msg = self.format(record)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 943, in format
return fmt.format(record)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 678, in format
record.message = record.getMessage()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\logging\__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 16, in read_novel_info
soup = self.get_novel_soup()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 41, in get_novel_soup
return self.get_soup(self.novel_url)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 306, in get_soup
response = self.get_response(url, headers=headers, **kwargs)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 201, in get_response
return self.__process_request(
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 133, in __process_request
logger.debug(f"{type(e).__qualname__}: {e} | Retrying...", e)
Message: 'HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care | Retrying...'
Arguments: (HTTPError('403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care'),)
23:04:03 [DEBUG] (lncrawl.core.scraper)
[GET] https://novelbin.com/b/the-legitimate-daughter-doesnt-care
timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9', b'Origin': b'https://novelbin.com', b'Referer': b'https://novelbin.com/', b'User-Agent': b'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53'}
23:04:03 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): novelbin.com:443
23:04:03 [DEBUG] (urllib3.connectionpool)
https://novelbin.com:443 "GET /b/the-legitimate-daughter-doesnt-care HTTP/1.1" 403 None
Exception in thread Thread-1 (read_novel_info):
Traceback (most recent call last):
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 16, in read_novel_info
soup = self.get_novel_soup()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\templates\soup\general.py", line 41, in get_novel_soup
return self.get_soup(self.novel_url)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 306, in get_soup
response = self.get_response(url, headers=headers, **kwargs)
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 201, in get_response
return self.__process_request(
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 130, in __process_request
raise e
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\scraper.py", line 123, in __process_request
response.raise_for_status()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/b/the-legitimate-daughter-doesnt-care
! Error: No chapters found
<class 'Exception'>
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\bots\console\integration.py", line 107, in start
raise e
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\bots\console\integration.py", line 101, in start
_download_novel()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\bots\console\integration.py", line 85, in _download_novel
self.app.get_novel_info()
File "C:\Users\vijay\AppData\Local\Programs\Python\Python310\lib\site-packages\lncrawl\core\app.py", line 137, in get_novel_info
raise Exception("No chapters found")
23:04:03 [INFO] (lncrawl.core.app)
App destroyed
--------------------------------------------------------------------------------
- https://github.com/dipu-bd/lightnovel-crawler/issues
================================================================================
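The "--- Logging error ---" block in the log above is a separate, smaller bug from the 403. According to the call stack, scraper.py line 133 calls `logger.debug(f"{type(e).__qualname__}: {e} | Retrying...", e)`. The f-string is already fully formatted and contains no `%s` placeholder, yet `e` is passed as an extra argument. When the record is emitted, `logging` evaluates `msg % args`, which raises the `TypeError` shown, and the handler reports that error instead of printing the debug line. Execution continues, which is why the retry request still goes out. Below is a minimal standard-library sketch that reproduces the failure and shows one possible fix using lazy %-style arguments. `broken_message` and `fixed_message` are hypothetical helpers, not lncrawl code.

```python
import logging


def broken_message(exc):
    """Mimic the failing call: an f-string message plus a stray positional arg."""
    record = logging.LogRecord(
        name="lncrawl.core.scraper", level=logging.DEBUG, pathname="scraper.py",
        lineno=133, msg=f"{type(exc).__qualname__}: {exc} | Retrying...",
        args=(exc,), exc_info=None,
    )
    # getMessage() runs `msg % args`; because msg has no %-placeholder this
    # raises TypeError: not all arguments converted during string formatting.
    return record.getMessage()


def fixed_message(exc):
    """Lazy %-style formatting: the values are supplied through args."""
    record = logging.LogRecord(
        name="lncrawl.core.scraper", level=logging.DEBUG, pathname="scraper.py",
        lineno=133, msg="%s: %s | Retrying...",
        args=(type(exc).__qualname__, exc), exc_info=None,
    )
    # Placeholders and args now match, so formatting succeeds.
    return record.getMessage()
```

The equivalent change inside lncrawl would be either dropping the trailing `e` or writing `logger.debug("%s: %s | Retrying...", type(e).__qualname__, e)`. This is a suggestion, not a confirmed upstream patch, and it would only clean up the log; it does not address the 403 itself.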
Adding myself here since the issue still persists, and the website seems to have many mirrors linking to the chapters. Here's my log: lncrawl.log
Let us know
Novel URL: https://novelbin.com/b/the-legitimate-daughter-doesnt-care
App Location: PIP
App Version: 3.7.2
Describe this issue
HTTPError: 403 Client Error: Forbidden for url: https://novelbin.com/novelbin/the-legitimate-daughter-doesnt-care
! Error: No chapters found
The site works fine in the Chrome browser.
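To check whether the 403 comes from the site itself rather than from lncrawl's novelbin crawler, the request can be re-sent outside lncrawl with the same headers, timeout and redirect setting that the [GET] debug lines in the log show. This is a minimal sketch, assuming the `requests` package is installed (the traceback shows lncrawl already uses it). `probe`, `LOGGED_HEADERS` and `LOGGED_TIMEOUT` are hypothetical names, not part of lncrawl.

```python
# Headers copied from the [GET] debug lines in the log (bytes keys converted to str).
LOGGED_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9",
    "Origin": "https://novelbin.com",
    "Referer": "https://novelbin.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53"
    ),
}
LOGGED_TIMEOUT = (7, 301)  # (connect, read) timeout in seconds, as in the log


def probe(url, session=None):
    """Send one GET like the logged one; return (status_code, Server header)."""
    if session is None:
        # Imported here so the helper can also be exercised with a stand-in session.
        import requests
        session = requests.Session()
    response = session.get(
        url,
        headers=LOGGED_HEADERS,
        timeout=LOGGED_TIMEOUT,
        allow_redirects=True,
    )
    # The Server header is None if the site does not send one.
    return response.status_code, response.headers.get("Server")
```

For example, `print(probe("https://novelbin.com/b/the-legitimate-daughter-doesnt-care"))`. If this also returns 403 while the same URL loads in Chrome, that would suggest the site is rejecting non-browser clients (for instance through bot protection) rather than a problem in the text format. That reading is an assumption; the log alone does not confirm it.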