SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0

[🐛 Bug]: Driver works on the 2nd or later run, but not the first. #11501

Closed · jackrodgers3 closed 1 year ago

jackrodgers3 commented 1 year ago

What happened?

I have a list of 30 URLs that I want to scrape with Selenium. The problem is that the first time chromedriver runs against any one of these URLs, it throws the error shown below; when the same URL is run a second time, the code works fine. Since I have 30 URLs to process in a loop, that would mean roughly 60 runs to get every URL to succeed, which is not workable. I suspected a timing issue, so I added implicit waits, but the problem kept happening.

How can we reproduce the issue?

## main processes
from bs4 import BeautifulSoup
from selenium import webdriver

def BrowseWeb(urlad):
    driver = webdriver.Chrome()
    # Implicit wait, added on the theory that this was a timing issue;
    # it did not help.
    driver.implicitly_wait(20)
    driver.get(urlad)
    rndrhtml = driver.page_source
    driver.quit()  # quit() also closes the window, so close() is redundant
    soup3 = BeautifulSoup(rndrhtml, 'html.parser')
    # Fails here on the first run: find() returns None, so find_all() raises.
    ovrall = str(soup3.find("tbody", "Crom_body__UYOcU").find_all("td"))
    # Scan forward to the second <td> and return its contents as a float.
    for d in range(0, 2):
        di = ovrall.find("<td>")
        ovrall = ovrall[(di + 4):]
        if d == 1:
            di2 = ovrall.find("</td>")
            return float(ovrall[:di2])

print(BrowseWeb('https://www.nba.com/stats/team/1610612766/advanced'))
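
For reference, the string-scanning loop in the snippet above can be expressed directly with BeautifulSoup indexing. A minimal sketch, assuming (as that loop does) that the second <td> holds the target value; the helper name is hypothetical:

from bs4 import BeautifulSoup

def extract_second_cell(html):
    # Hypothetical helper, equivalent to the find("<td>") loop above:
    # return the second cell of the stats table as a float.
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", "Crom_body__UYOcU")
    if tbody is None:
        # This is the state behind the AttributeError in the log below:
        # page_source was captured before the table had rendered.
        raise ValueError("stats table not present in page_source")
    cells = tbody.find_all("td")
    return float(cells[1].get_text(strip=True))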

Relevant log output

Traceback (most recent call last):

  File C:\Program Files\Spyder\pkgs\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\users\jackm\spyder\nbayuck.py:86
    print(BrowseWeb('https://www.nba.com/stats/team/1610612766/advanced'))

  File c:\users\jackm\spyder\nbayuck.py:77 in BrowseWeb
    ovrall = str(soup3.find("tbody", "Crom_body__UYOcU").find_all("td"))

AttributeError: 'NoneType' object has no attribute 'find_all'

Operating System

Windows 10

Selenium version

4.7.2 (Python 3.7)

What are the browser(s) and version(s) where you see this issue?

Chrome 108

What are the browser driver(s) and version(s) where you see this issue?

Chromedriver 108

Are you using Selenium Grid?

No response

github-actions[bot] commented 1 year ago

@jackrodgers3, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then add the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

titusfortner commented 1 year ago

Implicit waits only apply to finding elements, not to ensuring that the page has sufficiently loaded for page_source to contain the tbody value you want. Consider waiting to locate the tbody, then getting the page source and parsing it with BeautifulSoup.
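
A minimal sketch of that approach, with the selector and the 20-second timeout carried over from the report above (neither is prescribed by the maintainer's comment):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def browse_web(urlad):
    driver = webdriver.Chrome()
    try:
        driver.get(urlad)
        # Explicitly wait until the stats table body is present in the DOM,
        # so page_source is read only after the table has rendered.
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "tbody.Crom_body__UYOcU")
            )
        )
        html = driver.page_source
    finally:
        driver.quit()
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find("tbody", "Crom_body__UYOcU").find_all("td")
    # As in the original loop, assume the second cell holds the stat.
    return float(cells[1].get_text(strip=True))

print(browse_web('https://www.nba.com/stats/team/1610612766/advanced'))

WebDriverWait polls until the condition is met or the timeout expires, so the first run no longer races the JavaScript render that populates the table.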