abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course abjer.github.io/isds2020

Selenium and scraping_class #46

Closed emilblicher closed 4 years ago

emilblicher commented 4 years ago

Hi

In issue #44 it was stated that the scraping_class should be used. We are using Selenium for the project and I am not entirely sure how to connect the two packages. My code for the scraping_class is as follows:

import scraping_class

# keep log with scraping details
logfile = 'log.csv'
connector = scraping_class.Connector(logfile,
                                     connector_type='selenium',
                                     path2selenium = '/Users/emilblicher/.wdm/drivers/chromedriver/mac64/84.0.4147.30/chromedriver')

However, I get the following error:

WebDriverException: Message: Service /Users/emilblicher/.wdm/drivers/chromedriver/mac64/84.0.4147.30/chromedriver unexpectedly exited. Status code was: 1

First, why do I get this error? I checked the folder and the chromedriver is there. Second, I don't know if this is even the right way to do it; the documentation for scraping_class is very sparse, to say the least.

emilblicher commented 4 years ago

Okay, so I think I figured it out. I noticed in the scraping_class code that it is only built for Firefox, so the options are either to change scraping_class.py or to use Firefox. :)

jsr-p commented 4 years ago

@emilblicher, exactly :)

emilblicher commented 4 years ago

Regarding logging, I noticed that Selenium has its own log function. It seems to log the same thing and is much easier to incorporate than the scraping_class. Is this fine to use?

jsr-p commented 4 years ago

@emilblicher, the only requirement is to make a log file when you are scraping. How you choose to log your data collection is up to you :)
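
For illustration, here is a minimal sketch of one way to keep such a log by hand, using only the Python standard library. The names LOG_FIELDS, log_request and logged_call, as well as the column layout, are hypothetical and are not part of scraping_class or Selenium; the course does not prescribe this format.

```python
import csv
import os
import time
from datetime import datetime, timezone

# Hypothetical column layout for the log; adjust to whatever you need to report.
LOG_FIELDS = ["timestamp", "url", "success", "duration_s", "note"]


def log_request(logfile, url, success, duration_s, note=""):
    """Append one row describing a single page visit to a CSV log file."""
    # Write the header only if the file is missing or empty.
    is_new = not os.path.exists(logfile) or os.path.getsize(logfile) == 0
    with open(logfile, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "url": url,
            "success": success,
            "duration_s": round(duration_s, 3),
            "note": note,
        })


def logged_call(logfile, url, fetch):
    """Call fetch(url), log the outcome, and return whatever fetch returned."""
    start = time.time()
    try:
        result = fetch(url)
    except Exception as exc:
        # Log the failure, then re-raise so the caller still sees the error.
        log_request(logfile, url, False, time.time() - start, repr(exc))
        raise
    log_request(logfile, url, True, time.time() - start)
    return result
```

With Selenium, one could pass the driver's get method as the fetch argument, for example logged_call('log.csv', url, driver.get), so that every page visit leaves a row in log.csv whether it succeeds or fails.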

emilblicher commented 4 years ago

Great! :)