G-EE-Group / g-ee

Git Repo for scripts created by members of the Grafana - Experts Exchange Facebook group.
MIT License

"AttributeError: 'list' object has no attribute 'rfind'" when running scraper #9

Open: id-ic opened this issue 4 years ago

id-ic commented 4 years ago

I'm trying to use ATT_Web_Scraper.py on Linux as the basis for my own scraper, but when I run it I get an error before things really get going.

First off: I'm not a coder, so this is obviously my issue, not yours. Apologies for the questions.

I believe the error occurs when the options for the Chrome driver are being set.

$ python ATT_Web_Scraper.py
Chromedriver location: "C:\chromedriver.exe"
Chrome state headless?: true
This server REQUIRES authentication.
Chromedriver configured to run in headless mode
Traceback (most recent call last):
  File "ATT_Web_Scraper.py", line 99, in <module>
    driver = webdriver.Chrome(options=options, executable_path=[Chromedriver_Location])
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 95, in start
    (os.path.basename(self.path), self.start_error_message, str(e)))
  File "/usr/lib/python2.7/posixpath.py", line 114, in basename
    i = p.rfind('/') + 1
AttributeError: 'list' object has no attribute 'rfind'
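The last frames of the traceback point at the likely cause: `executable_path=[Chromedriver_Location]` wraps the path in a list, and while Selenium appears to be formatting its "could not start chromedriver" error message, it hands that list to `os.path.basename`, which only accepts a string. The standalone sketch below reproduces this without Selenium; `describe_basename` is a hypothetical helper written only for illustration.

```python
import os.path


def describe_basename(path):
    """Return os.path.basename(path), or the exception it raises, as text."""
    try:
        return os.path.basename(path)
    except (AttributeError, TypeError) as exc:
        # Python 2.7: AttributeError ('list' object has no attribute 'rfind').
        # Python 3: TypeError raised by os.fspath() for a non-path argument.
        return "%s: %s" % (type(exc).__name__, exc)


location = "/usr/bin/chromedriver"
print(describe_basename(location))    # chromedriver
print(describe_basename([location]))  # an error, as in the traceback above
```

On Python 2.7 the second call reports the same AttributeError as the traceback; on Python 3 the wording differs, but the fix is identical: pass the path as a plain string.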
alexandzors commented 4 years ago

@SolarSistim is the creator of the script, so I'm going to tag him.

SolarSistim commented 4 years ago

Humph. I can't say I've seen this one before. The script fails right after "Chromedriver configured to run in headless mode", so you may be onto something about the web driver.

Switch the web driver to run NOT in headless mode by changing the following setting in the ini file:

Switch: Run Chromedriver in headless mode?: true

To: Run Chromedriver in headless mode?: false

Now the web driver will open on your screen, and you'll be able to see exactly what it's doing if it fails again. That should at least give us more information and push the ball forward a little.

If you're running a headless Linux install, move the script to a PC with a GUI, preferably Windows, and test it there. That way we can at least isolate the issue further.
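On a VM with no display, a non-headless Chrome generally has nowhere to draw, so headless mode is usually the one to keep there. For reference, here is a sketch of how that ini switch could be turned into Chrome flags using Python 3's `configparser`. The `[Chrome]` section name and the `chrome_arguments` function are hypothetical; the thread only shows the option line itself.

```python
import configparser

# Hypothetical ini fragment: only the option line is taken from the thread.
SAMPLE_INI = """
[Chrome]
Run Chromedriver in headless mode?: true
"""


def chrome_arguments(ini_text, section="Chrome"):
    """Return Chrome command-line flags derived from the headless setting."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # getboolean() accepts true/false, yes/no, on/off and 1/0.
    headless = parser.getboolean(section, "Run Chromedriver in headless mode?")
    args = []
    if headless:
        args.append("--headless")
    return args


print(chrome_arguments(SAMPLE_INI))  # ['--headless']
```

Each returned flag would then be passed to `options.add_argument(...)`, the same Selenium call the script already uses.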

id-ic commented 4 years ago

Crap. I'm running this on a non-GUI VM.

I did update the config: in the run above I hadn't filled in its entries yet, but in the run below I have. It still gives me the same error.

Chromedriver location: "/usr/bin/chromedriver"
Chrome state headless?: false
This server DOES NOT require authentication.
Traceback (most recent call last):
  File "Telus_web_scraper.py", line 100, in <module>
    driver = webdriver.Chrome(options=options, executable_path=[Chromedriver_Location])
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 95, in start
    (os.path.basename(self.path), self.start_error_message, str(e)))
  File "/usr/lib/python2.7/posixpath.py", line 114, in basename
    i = p.rfind('/') + 1
AttributeError: 'list' object has no attribute 'rfind'

It errors on the same lines.
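Both runs stop before Chrome is launched, which would explain why the headless setting makes no difference here. Note also that the first run was pointed at the Windows path `C:\chromedriver.exe`. A pre-flight check along these lines could catch a wrong or mistyped path before Selenium is involved. `chromedriver_path_problem` is a hypothetical helper, and stripping surrounding quotes assumes the ini value might carry them literally.

```python
import os
import sys


def chromedriver_path_problem(value):
    """Hypothetical pre-flight check: return None if the path looks usable,
    otherwise a short description of the problem."""
    if not isinstance(value, str):
        # e.g. [Chromedriver_Location], which is what triggered the traceback
        return "expected a string path, got %s" % type(value).__name__
    # Assumption: the ini value may include literal quotes around the path.
    path = value.strip().strip("\"'")
    if not os.path.isfile(path):
        return "no file at %s" % path
    if not os.access(path, os.X_OK):
        return "%s is not executable" % path
    return None


if __name__ == "__main__":
    # Usage: python check_chromedriver.py /usr/bin/chromedriver
    candidate = sys.argv[1] if len(sys.argv) > 1 else "/usr/bin/chromedriver"
    print(chromedriver_path_problem(candidate) or "chromedriver path looks usable")
```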

SolarSistim commented 4 years ago

The only difference I can see between your setup and mine is that mine doesn't have authentication enabled for Influx. As you can see in this screenshot, taken after running the script a few minutes ago, the script appears to be working correctly for me:

https://imgur.com/nEMDVAa

The script works on my Windows system, and since this is Python it should behave the same on Linux, but clearly it doesn't. Could you temporarily disable Influx authentication? It's possible I didn't configure that part of the script correctly, since I don't use authentication myself.

id-ic commented 4 years ago

Sorry for the slow reply on this. Busy life.

Anyhow, I got things to the point where Chrome actually fires up and I can get a site to load.

Here is what I changed to get it working in a non-GUI Linux (Debian) terminal session in a VM. Basically, adding --no-sandbox and changing how webdriver.Chrome receives the chromedriver location were all that was needed. I couldn't find any explanation for why this worked... but it did.

-options.add_argument("--log-level=3")
+options.add_argument("--log-level=ALL")  ### not a necessity 

+options.add_argument('--no-sandbox')
-driver = webdriver.Chrome(options=options, executable_path=[Chromedriver_Location])
+driver = webdriver.Chrome('/usr/bin/chromedriver', options=options,  service_args=['--verbose', '--log-path=./chromedriver.log'])
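Of these changes, the one that addresses the traceback is how the chromedriver location reaches `webdriver.Chrome`: the new call passes it as a plain string (positionally, where Selenium 3 expects `executable_path`) instead of wrapped in a list. `--no-sandbox` is a separate matter; Chrome refuses to start as root without it, which may be what this VM needed. Below is a sketch that keeps the path configurable rather than hard-coding `/usr/bin/chromedriver`, assuming Selenium 3.x as in the tracebacks. `chrome_driver_kwargs` and `make_driver` are hypothetical names.

```python
def chrome_driver_kwargs(chromedriver_location, log_path="./chromedriver.log"):
    """Keyword arguments for Selenium 3's webdriver.Chrome (sketch)."""
    if not isinstance(chromedriver_location, str):
        # The original bug: executable_path=[Chromedriver_Location] passed a list.
        raise TypeError("chromedriver location must be a string")
    return {
        "executable_path": chromedriver_location,
        "service_args": ["--verbose", "--log-path=" + log_path],
    }


def make_driver(chromedriver_location, headless=True):
    """Start Chrome via chromedriver; requires Selenium 3.x to be installed."""
    # Imported here so chrome_driver_kwargs() can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    # Disables Chrome's sandbox; Chrome will not start as root without it.
    options.add_argument("--no-sandbox")
    return webdriver.Chrome(options=options, **chrome_driver_kwargs(chromedriver_location))
```

With Selenium 4, `executable_path` is deprecated (and later removed) in favour of passing a `selenium.webdriver.chrome.service.Service` object, which also takes the service arguments, so this sketch would need adjusting there.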
SolarSistim commented 4 years ago

I will look at this more closely later and integrate it. Thank you!

In reply to id-ic's comment sent Tuesday, November 26, 2019, 8:54:45 PM (alexandzors/g-ee, issue #9).
