Open peterlravn opened 4 years ago
hi @peterlravn, yes, you are supposed to log your data collection when using Selenium. The Connector class from the lecture will log each request that you make with Selenium automatically. A rule of thumb is to log each time you request a page and get some new HTML that you want to parse. How come it takes a couple of hours to scrape 1 static page that you request once? :D
We've scraped 5 hours' worth of data, but the requests have not been logged. The log has picked up lots of other requests where we didn't use Selenium. Can we 'write our way out of it' in our paper or should we collect the data once again?
We have the same problem with logging, it just does not log what we are doing. Maybe we are doing something wrong? We have tried with both:
import scraping_class
logfile = 'log_exam.txt'
connector = scraping_class.Connector(logfile)
and
driver = webdriver.Chrome(executable_path="/Users/ninibertelsen/Downloads/chromedriver", service_args=["--verbose", "--log-path=exam.log"])
Can you see any mistakes, or is there something we have to do manually as well?
All the best, Nini
PS sorry to hijack this issue, but I thought it was silly to make another one about the exact same thing.
hi everyone,
it is important that you use the get method of the Connector class and not the get method of the webdriver.Chrome object.
Consider the Connector class from the lectures. When using Selenium and then connector.get(), the following method is used:
The method also uses the get method of the webdriver.Chrome object. This is done in the line self.browser.get(url) # use selenium get method. But the difference here is that the following lines log the information to the log file.
If you only use connector.browser.get(), nothing will be written to the log file.
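To illustrate the difference, here is a minimal sketch of a logging wrapper. It is not the Connector class from the lectures: LoggingConnector, FakeBrowser, and the log columns are hypothetical names made up for this example. It only assumes that the browser object has a get(url) method and a page_source attribute, which a Selenium webdriver.Chrome object does. Inside the class, self is simply the wrapper instance, so self.browser is the driver that the wrapper holds.

```python
import csv
import os
import time


class FakeBrowser:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        self.page_source = "<html>" + url + "</html>"


class LoggingConnector:
    """Hypothetical wrapper: every call to get() appends one row to the log."""

    def __init__(self, logfile, browser):
        self.logfile = logfile
        self.browser = browser  # e.g. webdriver.Chrome() in real use
        # Write the header row only if the log file does not exist yet.
        if not os.path.isfile(logfile):
            with open(logfile, "w", newline="") as f:
                csv.writer(f, delimiter=";").writerow(
                    ["timestamp", "url", "seconds", "response_size", "error"]
                )

    def get(self, url):
        start = time.time()
        size, error = 0, ""
        try:
            self.browser.get(url)  # the actual page request
            size = len(self.browser.page_source)
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # This is the part that is skipped when calling browser.get directly.
            with open(self.logfile, "a", newline="") as f:
                csv.writer(f, delimiter=";").writerow(
                    [start, url, round(time.time() - start, 3), size, error]
                )
        return self.browser
```

With a real driver you would pass webdriver.Chrome() instead of FakeBrowser(). Calling connector.get(url) writes a row to the log; calling connector.browser.get(url) goes straight to the driver and writes nothing.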
@jesperhauch I would scrape the data again just to practice using the Connector class in the correct way. But you could probably also just incorporate it into the limitations of your study :)
Thanks, I think the whole connector thing was very confusing, but I think I've got it now :-)
Could you show an example for Selenium? What is 'self' supposed to be?
We are using Selenium to live-scrape a static website over a couple of hours. In exercise 6, it said that we are supposed to log our data collection process in our final exam. Are we supposed to log our data collection when using Selenium? We don't repeatedly request a website, so I'm not sure how to log our data.