For every site in public.sites, use start_year and end_year to get all combinations of site ID, URL and year to scrape.
Only "Olympic" years should be pulled. If the timeframe is too short to contain one, we should pull the closest year.
"Olympic" years (years where year % 4 == 0) are used to cut down on the amount of scraping required while still tracking changes due to election cycles.
This script should work something like
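A minimal sketch of the year selection, not a definitive implementation. It assumes each row from public.sites arrives as a (site_id, url, start_year, end_year) tuple; the function names olympic_years and scrape_targets are hypothetical. It also assumes "closest year" means the Olympic year nearest to the site's timeframe, with ties going to the earlier year. That reading is a guess and may need adjusting.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape: (site_id, url, start_year, end_year)
SiteRow = Tuple[int, str, int, int]


def olympic_years(start_year: int, end_year: int) -> List[int]:
    """Return the years in [start_year, end_year] where year % 4 == 0.

    If the range contains no such year, return the single nearest one
    (assumption: ties go to the earlier year).
    """
    if start_year > end_year:
        raise ValueError("start_year must not be after end_year")

    # First multiple of 4 at or after start_year.
    first = start_year + (-start_year) % 4
    years = list(range(first, end_year + 1, 4))
    if years:
        return years

    # Range too short: compare the Olympic year just before the range
    # with the one just after it.
    previous = start_year - start_year % 4
    following = first
    if start_year - previous <= following - end_year:
        return [previous]
    return [following]


def scrape_targets(sites: Iterable[SiteRow]) -> Iterator[Tuple[int, str, int]]:
    """Yield every (site_id, url, year) combination to scrape."""
    for site_id, url, start_year, end_year in sites:
        for year in olympic_years(start_year, end_year):
            yield site_id, url, year


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    rows = [
        (1, "https://example.com", 2010, 2017),
        (2, "https://example.org", 2005, 2006),
    ]
    for target in scrape_targets(rows):
        print(target)
```

In the real script the rows would be read from public.sites rather than hard-coded. That query is left out here because the connection helpers in the repository are not described in this text.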
See src/commands/upload-organizations.py and src/utils/psql.py for examples of how to connect to the DB.