cv-cat / Spider_XHS

Xiaohongshu crawler for scraping notes, user profile pages, and search results
1.03k stars, 189 forks

Bug in Spider_XHS/xhs_utils/xhs_util.py #17

Open j1nse opened 10 months ago

j1nse commented 10 months ago

  File "/home/xxxx/tmp/Spider_XHS/xhs_utils/xhs_util.py", line 74, in handle_profile_info
    info = re.findall(r'<script>window.__INITIAL_STATE__=(.*?)</script>', html_text)[0]
IndexError: list index out of range

cv-cat commented 10 months ago

This is the first time I have seen this error. You can add me on WeChat, or describe in detail how you ran it.

j1nse commented 10 months ago

> This is the first time I have seen this error. You can add me on WeChat, or describe in detail how you ran it.

The first run hangs. After I interrupt it manually, I get this:

^CTraceback (most recent call last):
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 95, in <module>
    home.main(url_list)
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 83, in main
    self.save_all_note_info(url)
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 49, in save_all_note_info
    profile = self.profile.save_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 22, in save_profile_info
    profile = self.get_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 15, in get_profile_info
    response = requests.get(url, headers=headers, cookies=self.cookies)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/miniconda3/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
KeyboardInterrupt

The second run fails with the error reported above. The script prints cookie有效 ("cookie is valid"), then the traceback, then a final line saying the query for the user URL failed (查询失败) with None:

cookie有效
Traceback (most recent call last):
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 83, in main
    self.save_all_note_info(url)
  File "/home/xxxx/tmp/Spider_XHS/home.py", line 49, in save_all_note_info
    profile = self.profile.save_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 22, in save_profile_info
    profile = self.get_profile_info(url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/profile.py", line 18, in get_profile_info
    profile = handle_profile_info(userId, html_text)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/tmp/Spider_XHS/xhs_utils/xhs_util.py", line 74, in handle_profile_info
    info = re.findall(r'<script>window.__INITIAL_STATE__=(.*?)</script>', html_text)[0]
IndexError: list index out of range
用户 https://www.xiaohongshu.com/user/profile/5d024990000000001602b9b8 查询失败None

Environment: WSL2, Ubuntu 22.
I suspect I was banned during the first run, so the second connection returns nothing useful.
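
On the hang in the first run: the traceback shows `requests.get` being called in `get_profile_info` without a `timeout`, and the interrupt landing in `sock.connect`. Requests does not apply a timeout unless one is passed, so a connection that never completes can block for a long time. A minimal sketch of passing one follows. `fetch_profile_html`, its default timeout values, and the injectable `get` parameter are hypothetical names for illustration, not part of Spider_XHS.

```python
def fetch_profile_html(url, headers=None, cookies=None, timeout=(5, 15), get=None):
    """Fetch a page and return its text, with a timeout on the request.

    `timeout` is (connect seconds, read seconds), a form requests accepts.
    `get` is a hypothetical hook so the function can be exercised offline;
    when omitted it falls back to requests.get.
    """
    if get is None:
        import requests  # third-party; imported lazily so the sketch loads without it
        get = requests.get
    # With a timeout set, requests raises an exception on a stalled connection
    # instead of waiting for a manual Ctrl-C.
    response = get(url, headers=headers, cookies=cookies, timeout=timeout)
    return response.text
```

This only makes the failure visible sooner. It does not address why the server stops answering.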
xuyao91 commented 10 months ago

I am hitting this problem too. The script prints cookie有效 ("cookie is valid") and then fails:

cookie有效
Traceback (most recent call last):
  File "home.py", line 93, in <module>
    home.main(url_list)
  File "home.py", line 83, in main
    self.save_all_note_info(url)
  File "home.py", line 49, in save_all_note_info
    profile = self.profile.save_profile_info(url)
  File "/Users/xuyao/Workspaces/learns/Spider_XHS/profile.py", line 22, in save_profile_info
    profile = self.get_profile_info(url)
  File "/Users/xuyao/Workspaces/learns/Spider_XHS/profile.py", line 18, in get_profile_info
    profile = handle_profile_info(userId, html_text)
  File "/Users/xuyao/Workspaces/learns/Spider_XHS/xhs_utils/xhs_util.py", line 74, in handle_profile_info
    info = re.findall(r'<script>window.__INITIAL_STATE__=(.*?)</script>', html_text)[0]
IndexError: list index out of range

Environment: macOS, Python 3.8.

xuyao91 commented 10 months ago

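I printed the response it returned. The key part is:

<body><div id="app"></div><script>function vue3Check(){void 0===window.Proxy&&alert("您当前系统版本过低,请升级后再试")}vue3Check()</script></body>

The page contains nothing except a check that alerts "您当前系统版本过低,请升级后再试" ("Your current system version is too low; please upgrade and try again"). Have I been banned?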
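
Whatever the reason the server returns this stub page (the thread does not settle whether it is a ban), the `IndexError` itself comes from indexing `[0]` into an empty `re.findall` result. A minimal sketch of a guarded extraction, reusing the pattern from the traceback; `extract_initial_state` and `is_stub_page` are hypothetical helpers, not functions in Spider_XHS:

```python
import re

# Same pattern as line 74 of xhs_util.py in the traceback above.
INITIAL_STATE_RE = re.compile(r'<script>window.__INITIAL_STATE__=(.*?)</script>')


def extract_initial_state(html_text):
    """Return the raw __INITIAL_STATE__ payload, or None if the page lacks it."""
    match = INITIAL_STATE_RE.search(html_text)
    return match.group(1) if match else None


def is_stub_page(html_text):
    """Heuristic (an assumption): an empty #app div and no embedded state."""
    return ('<div id="app"></div>' in html_text
            and extract_initial_state(html_text) is None)
```

With the response body quoted above, `extract_initial_state` returns `None` rather than raising, so the caller can report that the page had no profile data and decide whether to retry or stop.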