Open chrismytton opened 8 years ago
most of the html is returned from one of 3 urls
I should be more specific: the HTML with lists of people is all served from one identical URL, the homepage from another, and the empty results page from another. In the case of Uganda I think there is actually a different URL for each person, but in other places there may not be.
Problem
In the uganda-parliament-scraper, most of the HTML is returned from one of 3 URLs. I think this is because the site uses the session to store the search and results parameters, so the URL doesn't change but the HTML being returned does.
Proposed solution
Not entirely sure. This specific case might be solvable by looking at the cookies for the request, but it would be good to solve this more generally. Perhaps we need a way for users to supply a custom response class, which could return a unique identifier for the request so that each response can be written to its own file on the filesystem.
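To make the custom response class idea a bit more concrete, here is a rough sketch in Python. Everything in it is hypothetical: `SessionAwareResponse`, `unique_id` and `archive_path` are illustrative names, not part of the scraper or of any existing library, and the example URL is a placeholder. It hashes the URL together with the response body rather than the cookies, on the assumption that the session cookie may stay the same between searches while the HTML changes.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlsplit


@dataclass
class SessionAwareResponse:
    """Hypothetical response wrapper used only to illustrate the idea."""

    url: str
    body: bytes

    def unique_id(self) -> str:
        # Hash the URL together with the body, so two different pages
        # served from the same URL get different identifiers.
        digest = hashlib.sha1()
        digest.update(self.url.encode("utf-8"))
        digest.update(b"\0")  # separator between URL and body
        digest.update(self.body)
        return digest.hexdigest()

    def archive_path(self, root: Path) -> Path:
        # Group files by host, and name each file after the identifier.
        host = urlsplit(self.url).netloc
        return root / host / f"{self.unique_id()}.html"


# Placeholder URL: two people pages served from the same address.
url = "http://example.org/search"
first = SessionAwareResponse(url, b"<html>Person A</html>")
second = SessionAwareResponse(url, b"<html>Person B</html>")

# Same URL, different bodies, so the two pages land in different files.
print(first.archive_path(Path("archive")))
print(second.archive_path(Path("archive")))
```

One trade-off of keying on the body: a page whose content changes between runs would get a new file each time, so a real implementation might prefer an identifier derived from the request parameters where those are available.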
Acceptance criteria
The uganda-parliament-scraper should be able to save a separate page on disk for each person page that we scrape.