selenium.py fails at runtime

soen0905 commented 2 years ago

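I get an error when I try to run the script. My guess is that something goes wrong in the generator, though it could also be a version issue (Windows 10, Python 3.8).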

def parse_index():
    elements = browser.find_elements_by_css_selector('#index .item .name')
    for element in elements:
        href = element.get_attribute('href')
        yield urljoin(INDEX_URL, href)

The error output is as follows:

2022-07-19 20:52:44,940 - INFO:scraping https://spa2.scrape.center/page/1
2022-07-19 20:52:48,127 - INFO:detail url https://spa2.scrape.center/detail/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx
2022-07-19 20:52:48,127 - INFO:scraping https://spa2.scrape.center/detail/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx
abcd
2022-07-19 20:52:49,963 - INFO:detail data {'url': 'https://spa2.scrape.center/detail/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx', 'name': '霸王别姬 - Farewell My Concubine', 'categories': ['剧情', '爱情'], 'cover': 'https://p0.meituan.net/movie/ce4da3e03e655b5b88ed31b5cd7896cf62472.jpg@464w_644h_1e_1c', 'score': '9.5', 'drama': '影片借一出《霸王别姬》的京戏，牵扯出三个人之间一段随时代风云变幻的爱恨情仇。段小楼（张丰毅 饰）与程蝶衣（张国荣 饰）是一对打小一起长大的师兄弟，两人一个演生，一个饰旦，一向配合天衣无缝，尤其一出《霸王别姬》，更是誉满京城，为此，两人约定合演一辈子《霸王别姬》。但两人对戏剧与人生关系的理解有本质不同，段小楼深知戏非人生，程蝶衣则是人戏不分。段小楼在认为该成家立业之时迎娶了名妓菊仙（巩俐 饰），致使程蝶衣认定菊仙是可耻的第三者，使段小楼做了叛徒，自此，三人围绕一出《霸王别姬》生出的爱恨情仇战开始随着时代风云的变迁不断升级，终酿成悲剧。'}
Traceback (most recent call last):
  File "D:/kinds_work/python_work/spider/第七章/selenium_spider/scrape_Spa2.py", line 93, in <module>
    main()
  File "D:/kinds_work/python_work/spider/第七章/selenium_spider/scrape_Spa2.py", line 81, in main
    for detail_url in detail_urls:
  File "D:/kinds_work/python_work/spider/第七章/selenium_spider/scrape_Spa2.py", line 45, in parse_index
    href = element.get_attribute('href')
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\webelement.py", line 139, in get_attribute
    attributeValue = self.parent.execute_script(
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 634, in execute_script
    return self.execute(command, {
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=103.0.5060.114)

I then tried changing the original code to:

def parse_index():
    temp = []
    elements = browser.find_elements_by_css_selector('#index .item .name')
    for element in elements:
        href = element.get_attribute('href')
        temp.append(href)
    return temp
        # yield urljoin(INDEX_URL, href)

After this change, the program runs normally. I really cannot understand why the original version fails.

I tried stepping through this code in a debugger: on the second iteration of the for loop, the element object passed to element.get_attribute('href') looked fine.

I hope someone more experienced can take the time to answer my question.

hefeng61 commented 1 year ago

This part converts the generator to a list, but I don't know why that is needed; the earlier examples don't do this.

soen0905 commented 1 year ago

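I haven't looked at this in ages. The answer Google gave me was: maybe after the list conversion, detail_urls is fully loaded into memory, so the program no longer gets stuck at this point? Or maybe it makes debugging easier? It feels like reading values out of the generator with in might be what goes wrong? whatever,,,,just guess. XD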
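
A likely explanation, consistent with the traceback above: a generator is lazy, so each href is read only when the loop in main() asks for the next URL. By then the browser has presumably already navigated to a detail page (main() is not shown in this thread, so this is an assumption), and the elements found on the index page are no longer attached to the current document. Collecting everything into a list reads all hrefs before any navigation happens. The sketch below simulates this without Selenium; FakeBrowser, FakeElement, StaleElementError, and crawl are hypothetical stand-ins, not Selenium APIs.

```python
class StaleElementError(Exception):
    """Hypothetical stand-in for Selenium's StaleElementReferenceException."""


class FakeElement:
    """Hypothetical element that remembers which page load it came from."""

    def __init__(self, browser, href):
        self._browser = browser
        self._href = href
        self._page_id = browser.page_id

    def get_attribute(self, name):
        # Once the browser has loaded another page, this element is stale.
        if self._browser.page_id != self._page_id:
            raise StaleElementError("element is not attached to the page document")
        return self._href if name == "href" else None


class FakeBrowser:
    """Hypothetical browser: every get() replaces the current page."""

    def __init__(self):
        self.page_id = 0

    def get(self, url):
        self.page_id += 1

    def find_elements(self):
        return [FakeElement(self, f"/detail/{i}") for i in range(3)]


def parse_index_lazy(browser):
    # Generator: each href is read only when the caller requests the next item.
    for element in browser.find_elements():
        yield element.get_attribute("href")


def parse_index_eager(browser):
    # List: every href is read immediately, while the index page is still loaded.
    return [element.get_attribute("href") for element in browser.find_elements()]


def crawl(browser, detail_urls):
    # Assumed shape of main(): visit each detail page while iterating the URLs.
    visited = []
    for url in detail_urls:
        browser.get(url)  # navigating away invalidates the index-page elements
        visited.append(url)
    return visited


def run_lazy():
    browser = FakeBrowser()
    browser.get("/page/1")
    try:
        crawl(browser, parse_index_lazy(browser))
    except StaleElementError:
        return "stale"
    return "ok"


def run_eager():
    browser = FakeBrowser()
    browser.get("/page/1")
    return crawl(browser, parse_index_eager(browser))
```

In this simulation the lazy version fails on the second item, matching the log above where the first detail page is scraped before the exception is raised. If the same reasoning holds for the real script, wrapping the call as list(parse_index()) before the loop should behave like the temp-list version.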

Python3WebSpider / ScrapeSpa2

selenium.py fails at runtime #2