Code4HR / open-health-inspection-scraper

Scraper for the open-health-inspector app.
Apache License 2.0
7 stars, 9 forks

Change in Healthspace URL format #32

Closed: ttavenner closed this issue 8 years ago

ttavenner commented 9 years ago

They have implemented a new ASP-based site with splash pages, and all the URLs have changed. The scraper will need a significant overhaul to work again.

kmcurry commented 9 years ago

Major bummer. :frowning:

ttavenner commented 9 years ago

It looks like only some of the health districts have changed over to the new format. So a probable course of action is to identify those districts and shunt them off to a separate process that correctly scrapes the new format. This would keep the scraper sustainable if and when more districts make the switch, without throwing out the districts still on the old format.
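A rough sketch of the "identify and shunt off" idea, assuming each district's landing page HTML has already been fetched. The detection heuristic (looking for an `.asp` reference in the page) and every name here (`detect_format`, `scrape_old`, `scrape_new`, `route_districts`, `run`) are hypothetical placeholders, not part of the existing scraper:

```python
from typing import Callable, Dict, List

OLD_FORMAT = "old"
NEW_FORMAT = "new"


def detect_format(page_html: str) -> str:
    """Hypothetical heuristic: treat any page that references an .asp/.aspx
    resource as the new ASP-based site; everything else is the old format."""
    return NEW_FORMAT if ".asp" in page_html.lower() else OLD_FORMAT


def scrape_old(district: str) -> str:
    # Placeholder standing in for the existing old-format scraper.
    return f"old-scraper:{district}"


def scrape_new(district: str) -> str:
    # Placeholder standing in for a separate new-format scraper.
    return f"new-scraper:{district}"


# Map each detected format to the scraper that understands it.
SCRAPERS: Dict[str, Callable[[str], str]] = {
    OLD_FORMAT: scrape_old,
    NEW_FORMAT: scrape_new,
}


def route_districts(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Group district names by detected format (old-format group first)."""
    groups: Dict[str, List[str]] = {OLD_FORMAT: [], NEW_FORMAT: []}
    for district, html in pages.items():
        groups[detect_format(html)].append(district)
    return groups


def run(pages: Dict[str, str]) -> List[str]:
    """Send each group of districts to its matching scraper."""
    results: List[str] = []
    for fmt, districts in route_districts(pages).items():
        scraper = SCRAPERS[fmt]
        results.extend(scraper(d) for d in districts)
    return results


if __name__ == "__main__":
    pages = {
        "Alpha": "<a href='Default.aspx'>Establishments</a>",
        "Beta": "<a href='/list'>Establishments</a>",
    }
    print(route_districts(pages))  # {'old': ['Beta'], 'new': ['Alpha']}
    print(run(pages))              # ['old-scraper:Beta', 'new-scraper:Alpha']
```

Keeping the old-format path untouched and adding a second scraper behind a dispatch table means a district that switches later only changes which group it lands in.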

ttavenner commented 8 years ago

I had a bit of time to look at this last night, and it isn't just an issue with the URL formats changing. The lists of establishments now use JavaScript to build the URL to each establishment's detail page. I tested it to be sure, and there is no fallback: if you turn off JavaScript, the page no longer functions. This makes it difficult, if not impossible, to extract the list of establishments from the page without finding and replicating the function that builds the URL. The difficulty level on this task just went up quite a bit.

ttavenner commented 8 years ago

Closing this. We have started work on a new code base to handle this issue and I will be creating new issues for specific tasks.