homework 1 scraping URL question

Collecting data from websites that require a login can be complicated. A common approach is to use the Selenium package. For instance, the following code automatically logs in to GitHub using Selenium, which lets you access and collect content on GitHub that requires a login. (Note that Selenium may not work well on Google Colab.)

[1] Install Selenium and chromedriver

pip install selenium
brew install chromedriver

These commands are for Mac users. Windows users should run pip install selenium and download chromedriver manually (see https://chromedriver.chromium.org/home).
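
To confirm that chromedriver was installed and to find the path to pass to Selenium, something like the following sketch may help. It only uses the standard library; the helper name find_driver is made up for illustration, and it assumes the install step put chromedriver on your PATH.

```python
import shutil
from typing import Optional


def find_driver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of the named executable on PATH, or None if it is absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    # Print the location so it can be copied into the Selenium script
    print(path if path else "chromedriver not found on PATH")
```

If this prints a path that differs from /opt/homebrew/bin/chromedriver, use the printed path in the Python code instead.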

[2] Run the following Python code

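from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Selenium 4 API: pass the driver path via Service and locate elements with By
driver = webdriver.Chrome(service=Service('/opt/homebrew/bin/chromedriver'))
driver.get('https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2FUChicago-Computational-Content-Analysis%2FFrequently-Asked-Questions')
driver.find_element(By.ID, 'login_field').send_keys('Your GitHub ID')
driver.find_element(By.ID, 'password').send_keys('Your GitHub PW')
driver.find_element(By.ID, 'password').send_keys(Keys.ENTER)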
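
The login URL passed to driver.get above carries the page to open after sign-in as a percent-encoded return_to parameter. If you want to land on a different page, a minimal sketch for building such a URL is shown here. The helper name github_login_url is hypothetical, and the assumption that GitHub redirects to return_to after login comes only from the URL used above.

```python
from urllib.parse import urlencode


def github_login_url(target_url: str) -> str:
    """Build a GitHub login URL with target_url percent-encoded as return_to."""
    # urlencode escapes ':' and '/' so the target survives as one query value
    return "https://github.com/login?" + urlencode({"return_to": target_url})


if __name__ == "__main__":
    print(github_login_url(
        "https://github.com/UChicago-Computational-Content-Analysis/Frequently-Asked-Questions"
    ))
```

The returned string can be passed to driver.get in place of the hard-coded URL.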
