dlab-berkeley / Python-Web-Scraping-Legacy

D-Lab's 3-hour introduction to web scraping in Python. Learn how to use APIs and scrape data from websites using the New York Times API and BeautifulSoup in Python.

Consider recommending pushshift.io as an alternative to scraping Reddit #2

Open · aculich opened this issue 2 years ago

aculich commented 2 years ago

@georgeberkeley From @ck37 on Slack:

Instead of scraping Reddit, I would use pushshift.io's tools: https://arxiv.org/abs/2001.08435 (or the psaw package that Cheng referenced earlier). Maybe this could be added to the web scraping materials?

And for some use cases, the Pushshift Reddit data already loaded into Google BigQuery is also a good option to explore.
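For the workshop materials, a small example could show what "using Pushshift instead of scraping" looks like: you send a query to a search endpoint and get structured JSON back, with no HTML parsing. The sketch below is a minimal illustration under stated assumptions. The base URL `https://api.pushshift.io/reddit/search/submission/` and the parameter names `subreddit`, `q`, `size`, and `fields` are taken from Pushshift's historical public API and may no longer be available or may now require authentication. The helper names `build_pushshift_query` and `fetch_submissions` are hypothetical. It uses only the standard library, so it makes no assumptions about psaw's own interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL of Pushshift's historical submission-search endpoint.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def build_pushshift_query(subreddit, query=None, size=25, fields=None):
    """Build a search URL for submissions in one subreddit (hypothetical helper).

    ASSUMPTION: parameter names follow the historical Pushshift API.
    """
    params = {"subreddit": subreddit, "size": size}
    if query:
        # Free-text search term; urlencode turns spaces into '+'.
        params["q"] = query
    if fields:
        # Restrict the returned JSON to a comma-separated list of fields.
        params["fields"] = ",".join(fields)
    return PUSHSHIFT_SUBMISSIONS + "?" + urlencode(params)


def fetch_submissions(url, timeout=30):
    """Download one page of results (hypothetical helper; needs network access).

    ASSUMPTION: results are wrapped in a top-level "data" list.
    """
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("data", [])
```

A typical call would be `build_pushshift_query("datascience", query="web scraping", size=10, fields=["title", "author", "created_utc"])`, with the resulting URL passed to `fetch_submissions`. Keeping URL construction separate from the network request makes the query logic easy to inspect in a workshop setting, even if the live endpoint is unavailable.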