ZouJiu1 / zhihu_spider_selenium

Crawls the thoughts, articles, and answers on a Zhihu user's profile page
MIT License

Error when crawling answers (articles and thoughts crawl fine) #5

Open 66my opened 4 months ago

66my commented 4 months ago

Error output:

DevTools listening on ws://127.0.0.1:9922/devtools/browser/8b5cd6db-98dc-4859-a19b-586646e5eccd
[25540:10460:0430/152431.589:ERROR:fallback_task_provider.cc(127)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[25540:10460:0430/152438.872:ERROR:fallback_task_provider.cc(127)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
Traceback (most recent call last):
  File "D:\24Python\zhihu_spider_selenium-master\crawler.py", line 1117, in <module>
    zhihu()
  File "D:\24Python\zhihu_spider_selenium-master\crawler.py", line 1053, in zhihu
    crawl_answers_links(driver, username)
  File "D:\24Python\zhihu_spider_selenium-master\crawler.py", line 177, in crawl_answers_links
    WebDriverWait(driver, timeout=10).until(lambda d: d.find_element(By.CLASS_NAME, "Pagination"))
  File "S:\condaenv\getdata310new\lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
        GetHandleVerifier [0x00007FF6D98FD8E2+35890]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98BACC2+1330002]
        Microsoft::Applications::Events::ILogManager::operator= [0x00007FF6D96AE137+5095]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96F4E7E+159950]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96F4F66+160182]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D972FEF7+401735]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D971474F+289183]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96EA6C7+117015]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D972DAF1+392513]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D9714373+288195]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96E9BEE+114238]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96E8DAC+110588]
        Microsoft::Applications::Events::GUID_t::GUID_t [0x00007FF6D96E97A1+113137]
        GetHandleVerifier [0x00007FF6D99939F4+650564]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D97899BC+79948]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D9862D4C+969692]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D985B485+938773]
        GetHandleVerifier [0x00007FF6D99929B5+646405]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98C2E81+1363217]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98BE4F4+1344388]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98BE62B+1344699]
        Microsoft::Applications::Events::FromJSON [0x00007FF6D98B5B21+1309105]
        BaseThreadInitThunk [0x00007FF970E5257D+29]
        RtlUserThreadStart [0x00007FF971E6AA58+40]
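The failing call in the traceback is the explicit wait at line 177 of `crawler.py`, which gives up after 10 seconds if no element with class `Pagination` shows up. If the page is loading slowly (for example because the browser is being throttled), one possible mitigation is to retry the wait a few times before failing. This is a minimal sketch, not the project's code: `retry_on_timeout` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_timeout(
    action: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 2.0,
) -> T:
    """Call `action` up to `attempts` times, retrying only on `exceptions`.

    Hypothetical helper: if every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the original error unchanged
            time.sleep(delay)  # pause before waiting on the page again
    raise AssertionError("unreachable: the loop always returns or raises")
```

With Selenium it could wrap the original call, e.g. `retry_on_timeout(lambda: WebDriverWait(driver, timeout=30).until(lambda d: d.find_element(By.CLASS_NAME, "Pagination")), exceptions=(TimeoutException,))`, where `TimeoutException` is imported from `selenium.common.exceptions`. Retrying only hides slowness; it does not fix whatever is slowing the page down.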
ZouJiu1 commented 4 months ago

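This has already been fixed; on my end both the website and the answers crawl normally. If the problem persists, please post the URL you are crawling along with the corresponding error.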

66my commented 4 months ago

I tested it again and it still fails. It is not a network problem.

python crawler.py --answer --links_scratch

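answers.txt was generated and the answer links were collected correctly, but the program crashed while generating the first answer. In other words, it errored out before producing the .md.pdf file for the first answer.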

66my commented 4 months ago

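Problem solved. I believe the cause was that on Windows 11, Edge automatically turned on efficiency mode, so while the window was in the background it could not get enough traffic even though the network itself was fine. With the new version of the code, downloading works normally as long as the browser stays in the foreground.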
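For anyone hitting the same symptom, here is a minimal sketch of launching Edge through Selenium 4 with Chromium's background-throttling switches disabled. It rests on two assumptions: that `crawler.py` builds its driver in a way that can accept an `Options` object (not checked here), and that these generic Chromium switches help against Edge's efficiency mode at all, which this thread does not confirm. The fix actually reported above is keeping the browser in the foreground; `edge_arguments` and `make_edge_driver` are hypothetical helpers.

```python
from typing import List, Optional

# Chromium command-line switches that stop timers and renderers in background
# or hidden windows from being deprioritized. Whether they counteract Edge's
# efficiency mode on Windows 11 is an assumption, not something verified here.
ANTI_THROTTLING_SWITCHES = [
    "--disable-background-timer-throttling",
    "--disable-renderer-backgrounding",
    "--disable-backgrounding-occluded-windows",
]


def edge_arguments(extra: Optional[List[str]] = None) -> List[str]:
    """Return the anti-throttling switches plus any extra ones, without duplicates."""
    args = list(ANTI_THROTTLING_SWITCHES)  # copy so the module constant is never mutated
    for arg in extra or []:
        if arg not in args:
            args.append(arg)
    return args


def make_edge_driver(extra: Optional[List[str]] = None):
    """Start Edge via Selenium 4 with the switches above (hypothetical helper)."""
    # Imported here so edge_arguments() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.edge.options import Options

    options = Options()
    for arg in edge_arguments(extra):
        options.add_argument(arg)
    return webdriver.Edge(options=options)
```

Calling `make_edge_driver()` in place of a plain `webdriver.Edge()` would be the drop-in use. If it makes no difference, keeping the window in the foreground, as described above, remains the known workaround.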