eight04 / ComicCrawler

An image crawler written in Python.
267 stars, 47 forks

pixiv 403 #289

Closed. bluelovers closed this issue 3 years ago

bluelovers commented 4 years ago

Start analyzing https://www.pixiv.net/users/58849886/artworks
Thread crashed: <bound method BatchAnalyzer.analyze of <comiccrawler.batch_analyzer.BatchAnalyzer object at 0x08CE9430>>
Traceback (most recent call last):
  File "c:\python37-32\lib\site-packages\worker\__init__.py", line 474, in wrap_worker
    self.ret = self.worker(*args, **kwargs)
  File "c:\python37-32\lib\site-packages\comiccrawler\batch_analyzer.py", line 43, in analyze
    self.do_analyze()
  File "c:\python37-32\lib\site-packages\comiccrawler\batch_analyzer.py", line 59, in do_analyze
    Analyzer(mission).analyze()
  File "c:\python37-32\lib\site-packages\comiccrawler\analyzer.py", line 57, in analyze
    self.do_analyze()
  File "c:\python37-32\lib\site-packages\comiccrawler\analyzer.py", line 80, in do_analyze
    self.html = self.grabber.html(self.mission.url, retry=True)
  File "c:\python37-32\lib\site-packages\comiccrawler\module_grabber.py", line 17, in html
    **kwargs
  File "c:\python37-32\lib\site-packages\comiccrawler\grabber.py", line 151, in grabhtml
    r = grabber(*args, **kwargs)
  File "c:\python37-32\lib\site-packages\comiccrawler\grabber.py", line 105, in grabber
    r = await_(do_request, s, url, proxies, retry, **kwargs)
  File "c:\python37-32\lib\site-packages\worker\__init__.py", line 905, in wrapped
    return f(*args, **kwargs)
  File "c:\python37-32\lib\site-packages\worker\__init__.py", line 927, in await_
    return async_(callback, *args, **kwargs).get()
  File "c:\python37-32\lib\site-packages\worker\__init__.py", line 682, in get
    raise err
  File "c:\python37-32\lib\site-packages\worker\__init__.py", line 474, in wrap_worker
    self.ret = self.worker(*args, **kwargs)
  File "c:\python37-32\lib\site-packages\comiccrawler\grabber.py", line 131, in do_request
    r.raise_for_status()
  File "c:\python37-32\lib\site-packages\requests\models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.pixiv.net/users/58849886/artworks

bluelovers commented 4 years ago

@eight04

I'm guessing this post is talking about pixiv:

https://www.facebook.com/faryne/posts/10158987895085719

eight04 commented 4 years ago

It looks like pixiv is now using Cloudflare's anti-bot protection. There is some discussion at https://github.com/Nandaka/PixivUtil2/issues/814
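
(Illustrative aside, not part of the original comment.) A minimal sketch of one way to check whether a 403 like the one in the traceback above probably came through Cloudflare rather than from pixiv's own access rules. It is not taken from the ComicCrawler source: the helper names `looks_like_cloudflare_block` and `describe_failure` are hypothetical, and the check is only a heuristic. It relies on the `Server: cloudflare` and `CF-RAY` headers that Cloudflare adds to responses it serves, so it will also match a 403 generated by the origin server and merely proxied through Cloudflare.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: True if an error response appears to have passed through Cloudflare.

    `headers` is any mapping of header names to values, for example the
    `headers` attribute of a `requests.Response`.
    """
    # Only look at status codes commonly used for blocks, rate limits and challenges.
    if status_code not in (403, 429, 503):
        return False
    # Header names are case-insensitive, so normalize before comparing.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    return lowered.get("server") == "cloudflare" or "cf-ray" in lowered


def describe_failure(response):
    """Return a short message for a failed response-like object.

    `response` only needs `status_code` and `headers` attributes, which is
    what `requests.exceptions.HTTPError.response` provides.
    """
    if looks_like_cloudflare_block(response.status_code, response.headers):
        return "HTTP %d, possibly a Cloudflare block" % response.status_code
    return "HTTP %d" % response.status_code
```

With `requests`, this could be called from an `except requests.exceptions.HTTPError as err:` handler as `describe_failure(err.response)`, so the log distinguishes a likely anti-bot block from an ordinary permission error.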

eight04 commented 4 years ago

It worked normally when I tested it today. I don't really understand why.