NanmiCoder / MediaCrawler

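Xiaohongshu note | comment crawler, Douyin video | comment crawler, Kuaishou video | comment crawler, Bilibili video | comment crawler, Weibo post | comment crawler, Baidu Tieba post | Baidu Tieba comment-reply crawler, Zhihu Q&A article | comment crawler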
https://nanmicoder.github.io/MediaCrawler/
Other
18.07k stars 5.59k forks

When using a proxy, is a retry mechanism missing for when the proxy expires? #456

Open yinzhou-jc opened 1 month ago

yinzhou-jc commented 1 month ago

I'm not familiar with this part. After a proxy expires, should a new proxy IP be fetched? In my current testing I seem to be hitting this error:

Traceback (most recent call last):
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_transports\default.py", line 60, in map_httpcore_exceptions
    yield
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_transports\default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpcore\_async\connection_pool.py", line 262, in handle_async_request
    raise exc
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpcore\_async\connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpcore\_async\http_proxy.py", line 280, in handle_async_request
    raise ProxyError(msg)
httpcore.ProxyError: 454 Proxy Authentication Expired

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\tenacity\_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\github\MediaCrawler\media_platform\xhs\client.py", line 97, in request
    response = await client.request(
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_client.py", line 1617, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_transports\default.py", line 352, in handle_async_request
    with map_httpcore_exceptions():
  File "C:\Users\yin\anaconda3\envs\py311\Lib\contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\httpx\_transports\default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ProxyError: 454 Proxy Authentication Expired

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\github\MediaCrawler\main.py", line 67, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "C:\Users\yin\anaconda3\envs\py311\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\github\MediaCrawler\main.py", line 57, in main
    await crawler.start()
  File "D:\github\MediaCrawler\media_platform\xhs\core.py", line 93, in start
    await self.search()
  File "D:\github\MediaCrawler\media_platform\xhs\core.py", line 219, in search
    note_details = await asyncio.gather(*task_list)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\github\MediaCrawler\media_platform\xhs\core.py", line 305, in get_note_detail_async_task
    creator_info_tmp = await self.xhs_client.get_creator_info(user_id=note_detail.get("user", {}).get("user_id"))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\github\MediaCrawler\media_platform\xhs\client.py", line 375, in get_creator_info
    html_content = await self.request("GET", self._domain + uri, return_response=True, headers=self.headers)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\tenacity\_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\tenacity\_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\yin\anaconda3\envs\py311\Lib\site-packages\tenacity\__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x289517ca590 state=finished raised ProxyError>]

NanmiCoder commented 1 month ago

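This happens because there is currently no logic for swapping in a new proxy IP. When I first open-sourced the project, I was using a fixed proxy that I hosted myself, which generally doesn't expire. This is indeed a problem, and it will be fixed in a later update.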
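
For illustration only, here is a minimal sketch of what "swap the proxy and retry" could look like. It is not the project's code: `ProxyExpiredError`, `request_with_proxy_rotation`, `send`, and `fetch_new_proxy` are hypothetical names, and the sketch uses only the standard library. In MediaCrawler the exception to catch would presumably be the `httpx.ProxyError` shown in the traceback, and `fetch_new_proxy` would call whatever proxy provider is in use.

```python
import asyncio
from typing import Awaitable, Callable


class ProxyExpiredError(Exception):
    """Hypothetical stand-in for httpx.ProxyError (e.g. '454 Proxy Authentication Expired')."""


async def request_with_proxy_rotation(
    send: Callable[[str], Awaitable[str]],
    fetch_new_proxy: Callable[[], Awaitable[str]],
    max_attempts: int = 3,
) -> str:
    """Call send(proxy); if the proxy has expired, fetch a fresh one and try again."""
    proxy = await fetch_new_proxy()
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return await send(proxy)
        except ProxyExpiredError as exc:
            # Retrying with the same expired proxy would fail again,
            # so replace it before the next attempt.
            last_error = exc
            proxy = await fetch_new_proxy()
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


if __name__ == "__main__":
    # Demo with fake callables: the first proxy is "expired", the second works.
    pool = iter(["proxy-1", "proxy-2", "proxy-3"])

    async def demo_fetch() -> str:
        return next(pool)

    async def demo_send(proxy: str) -> str:
        if proxy == "proxy-1":
            raise ProxyExpiredError("454 Proxy Authentication Expired")
        return f"ok via {proxy}"

    print(asyncio.run(request_with_proxy_rotation(demo_send, demo_fetch)))
```

The point of the sketch is that the retry loop replaces the proxy between attempts. A retry that keeps reusing the same expired proxy will exhaust its attempts and surface a `RetryError`, as in the traceback above.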