D-Lab's 3-hour introduction to web scraping in Python. Learn how to use APIs and scrape data from websites using the New York Times API and BeautifulSoup.
Consider recommending pushshift.io as an alternative to scraping Reddit #2
Instead of scraping Reddit, I would use pushshift.io's tools: https://arxiv.org/abs/2001.08435 (or the psaw package that Cheng referenced earlier). Maybe this could be added to the web scraping materials?
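For illustration, here is a rough sketch of what querying Pushshift instead of scraping might look like. It only builds the request URL and makes no network call. The endpoint and the `subreddit`, `size`, `after`, and `before` parameters are assumptions based on how the public Pushshift API has worked; availability and access rules may have changed, so check the current documentation first. The helper name `build_pushshift_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for submission search; verify against current Pushshift docs.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def build_pushshift_url(subreddit, size=25, after=None, before=None):
    """Build a Pushshift submission-search URL (hypothetical helper).

    Only parameters that are actually set are included in the query string.
    """
    params = {"subreddit": subreddit, "size": size}
    if after is not None:
        params["after"] = after
    if before is not None:
        params["before"] = before
    return PUSHSHIFT_SUBMISSIONS + "?" + urlencode(params)


# Example: the 10 most recent submissions to r/askscience from the last 7 days.
url = build_pushshift_url("askscience", size=10, after="7d")
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()`, much like the New York Times API calls in the workshop, assuming the service is reachable and permits the request.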
@georgeberkeley From @ck37 on Slack:
and for some use cases, the pushshift reddit data already loaded into Google BigQuery is a good option to explore, as well.