covidatlas / li

Next-generation serverless crawler for COVID-19 data

Scraper for Argentina #404

Closed jzohrab closed 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/633, transferred here on Friday Apr 03, 2020 at 20:02 GMT


https://www.argentina.gob.ar/coronavirus/informe-diario

This is from the federal government. They are publishing two PDFs per day. "Vespertino" = evening, "Matutino" = morning. They're probably meeting minutes.

Pros:

Cons:

jzohrab commented 4 years ago

(Transferred comment)

I've made an initial attempt at this in my argentina branch. I need help, because what I'm doing breaks caching.

From my Slack message:

In short: there's a webpage with links to PDFs; lately there are two PDFs per day. So the strategy is to 1) parse the main page to get the links, and 2) work out which PDFs are for the desired scrape date.
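
For illustration, here is a minimal sketch of that two-step strategy in Node. The `getPdfLinksForDate` helper, the cheerio parsing, and the `dd-mm-yy` filename pattern are all assumptions made for the sketch, not the project's actual fetch/cache API.

```js
const cheerio = require('cheerio')

// Sketch only: pull every PDF link off the index page, then keep the ones
// whose filename contains the requested date. The 'dd-mm-yy' pattern in the
// filenames is an assumption about how the site names its reports.
function getPdfLinksForDate (html, isoDate) {
  const $ = cheerio.load(html)
  const [y, m, d] = isoDate.split('-')           // e.g. '2020-04-01'
  const datePattern = `${d}-${m}-${y.slice(2)}`  // -> '01-04-20' (assumed format)
  return $('a[href$=".pdf"]')
    .map((i, el) => $(el).attr('href'))
    .get()
    .filter(href => href.includes(datePattern))
}
```

Called with the HTML of the informe-diario page and the scrape date, this would return the matutino and vespertino links for that day, assuming the filename pattern holds.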

Their page keeps the old PDFs around, but our cache doesn't. So I did something that's probably inappropriate: I force-fetch the files even if they're not in the cache. That way, if I try to scrape April 1st, it will get the main page, find the two PDFs for April 1st, and cache them. But the way it works now, it will replace files in the April 1st cache with whatever is retrieved today.
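
One way to avoid that clobbering, sketched under the assumption of a plain filesystem cache (the `cacheReport` name, `cacheDir` layout, and `slot` argument are made up for illustration), is to key each cached PDF by the report date it belongs to rather than by the date it was fetched:

```js
const fs = require('fs')
const path = require('path')

// Sketch: store each report under its own report date and slot, so a fetch
// made today for the April 1st PDFs never lands in (or overwrites) today's
// cache entry. The directory layout here is illustrative only.
function cacheReport (cacheDir, isoDate, slot, pdfBuffer) {
  const dir = path.join(cacheDir, isoDate)
  fs.mkdirSync(dir, { recursive: true })
  const file = path.join(dir, `${slot}.pdf`)  // e.g. 2020-04-01/vespertino.pdf
  if (!fs.existsSync(file)) {
    fs.writeFileSync(file, pdfBuffer)
  }
  return file
}
```

With a layout like that, re-running the April 1st scrape today only fills in files that are still missing under `2020-04-01/`.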

Parsing the PDFs will eventually be another story, but for now I'd like someone to take a look and let me know a better way to at least retrieve all the PDFs and store them in our cache. I assumed this strategy was better than caching every PDF for every day, but maybe that's the better approach?
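
For comparison, the "cache every PDF the index page lists" alternative might look roughly like the sketch below. `fetchBuffer` is a hypothetical download helper, and the file layout is again illustrative only.

```js
const fs = require('fs')
const path = require('path')
const cheerio = require('cheerio')

// Sketch: one crawl pass that downloads every PDF linked from the index page
// and stores each under its own file, skipping anything already cached.
// 'fetchBuffer' is a hypothetical helper that returns a PDF as a Buffer.
async function cacheAllListedPdfs (html, cacheDir, fetchBuffer) {
  const $ = cheerio.load(html)
  const links = $('a[href$=".pdf"]').map((i, el) => $(el).attr('href')).get()
  for (const href of links) {
    const file = path.join(cacheDir, path.basename(href))
    if (fs.existsSync(file)) continue  // already cached; never re-fetch
    fs.writeFileSync(file, await fetchBuffer(href))
  }
}
```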