eight04 / ComicCrawler

An image crawler written in Python.

Analysis error! 403 Client Error: Forbidden for url: https://tw.manhuagui.com/comic/xxxxx #381

Open · pcwt opened this issue 3 months ago

pcwt commented 3 months ago

After updating Deno and pip and then upgrading ComicCrawler to the new version, I can no longer check for comic updates or download anything. Here is the error message shown when downloading:

  File "C:\Python312\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://tw.manhuagui.com/comic/xxxxx
wait 20 seconds...

eight04 commented 1 month ago

Everything works fine in my tests. Could you post the full message?

pcwt commented 3 weeks ago

Tested. It is caused by Deno, as Deno does not support VPN connections.

pcwt commented 3 weeks ago

Is there any way to make ComicCrawler work through a VPN connection? Some websites keep blocking certain IPs, but Deno does not support VPN connections, which makes ComicCrawler useless.

eight04 commented 3 weeks ago

Deno has no access to the internet, so I don't think that is the issue: https://github.com/eight04/deno_vm/blob/00077b1a934667c7c3b3042b7c30eaf71b30deba/deno_vm/__init__.py#L243-L249

pcwt commented 3 weeks ago

Here is the full output from ComicCrawler when connected to a VPN:

Start analyzing https://tw.manhuagui.com/comic/33232/
Thread crashed: <function DownloadManager.start_analyze.<locals>.analyze_thread at 0x0000018998DA0860>
Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\worker\__init__.py", line 483, in wrap_worker
    self.ret = self.worker(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\comiccrawler\download_manager.py", line 198, in analyze_thread
    Analyzer(mission).analyze()
  File "C:\Python312\Lib\site-packages\comiccrawler\analyzer.py", line 58, in analyze
    self.do_analyze()
  File "C:\Python312\Lib\site-packages\comiccrawler\analyzer.py", line 81, in do_analyze
    self.html = self.grabber.html(self.mission.url, retry=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\comiccrawler\module_grabber.py", line 15, in html
    return self.grab(grabhtml, url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\comiccrawler\module_grabber.py", line 34, in grab
    return grab_method(url, **new_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\comiccrawler\grabber.py", line 154, in grabhtml
    r = grabber(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\comiccrawler\grabber.py", line 106, in grabber
    r = await_(do_request, s, url, proxies, retry, headers=header, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\worker\__init__.py", line 943, in wrapped
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\worker\__init__.py", line 962, in await_
    return async_(callback, *args, **kwargs).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\worker\__init__.py", line 691, in get
    raise err
  File "C:\Python312\Lib\site-packages\worker\__init__.py", line 483, in wrap_worker
    self.ret = self.worker(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\comiccrawler\grabber.py", line 133, in do_request
    r.raise_for_status()
  File "C:\Python312\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://tw.manhuagui.com/comic/33232/
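The traceback ends in requests' `raise_for_status()`, which only raises after the server has already sent back a response. That suggests the connection through the VPN itself succeeded and the site answered with 403; the call chain shown does not pass through Deno at all. A minimal sketch that reproduces the same message without any network access (the helper `reproduce_http_error` is hypothetical, written only for illustration):

```python
import requests


def reproduce_http_error(status_code: int, reason: str, url: str) -> requests.exceptions.HTTPError:
    """Build a Response locally (no network) and return the error raise_for_status() raises.

    Hypothetical helper for illustration; not part of ComicCrawler.
    """
    resp = requests.Response()
    resp.status_code = status_code
    resp.reason = reason
    resp.url = url
    try:
        # grabber.do_request calls raise_for_status() in the traceback above;
        # any 4xx or 5xx status is turned into an HTTPError.
        resp.raise_for_status()
    except requests.exceptions.HTTPError as err:
        return err
    raise ValueError(f"status {status_code} does not raise HTTPError")


err = reproduce_http_error(403, "Forbidden", "https://tw.manhuagui.com/comic/33232/")
print(err)                       # 403 Client Error: Forbidden for url: https://tw.manhuagui.com/comic/33232/
print(err.response.status_code)  # 403
```

Because the failing `Response` is attached to the exception, the status code can be inspected when deciding whether a retry is worthwhile.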

eight04 commented 3 weeks ago

I wonder if it is caused by the language tag.

Try locating session_manager.py and uncommenting this line: https://github.com/eight04/ComicCrawler/blob/1021b2a53b2a1f36d611840f5cd31cbcd63677da/comiccrawler/session_manager.py#L11
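The thread does not quote what that line contains. Assuming the "language tag" refers to an `Accept-Language` request header, the sketch below shows how a default header set on a `requests.Session` is merged into every request the session prepares. The header value `zh-TW,zh;q=0.9` and the name `LANGUAGE_TAG` are illustrative assumptions, not copied from ComicCrawler.

```python
import requests

# Hypothetical value: ComicCrawler's actual header (if any) may differ.
LANGUAGE_TAG = "zh-TW,zh;q=0.9"

session = requests.Session()
# Session-level headers are merged into every request made through this session.
session.headers.update({"Accept-Language": LANGUAGE_TAG})

# prepare_request() performs that merge without sending anything over the network.
prepared = session.prepare_request(
    requests.Request("GET", "https://tw.manhuagui.com/comic/33232/")
)
print(prepared.headers["Accept-Language"])  # zh-TW,zh;q=0.9
```

Setting the header on the session rather than per request keeps page and image fetches consistent; removing it again with `session.headers.pop("Accept-Language", None)` makes it easy to compare both behaviours over a VPN.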

pcwt commented 3 weeks ago

Tested. It now works like a charm over a VPN connection. Thanks a lot, I really appreciate it.