equivalentideas / westconnex_M4_East_Air_Quality_Monitoring

WestConnex M4 East - Air Quality Monitoring data

Scrape JSON files instead of HTML pages #29

Closed: henare closed this issue 6 years ago

henare commented 6 years ago

We're currently using PhantomJS to scrape the HTML pages with JS enabled. Those pages are actually populated by JSON files, e.g. http://airodis.ecotech.com.au/AirodisReport/JobSubmit.ashx?report=West+Connex%5cWebsite%5cChandos+St+summary.xml&format=json, which should be easier and more reliable to fetch than the rendered pages.
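For illustration, a minimal Ruby sketch of fetching one of these JSON files directly, without a headless browser. The endpoint and the `report`/`format` parameters come from the example URL above; the helper names `report_url` and `fetch_report` are hypothetical and not taken from the scraper. The shape of the returned JSON is not assumed here.

```ruby
require "json"
require "net/http"
require "uri"

# Endpoint taken from the example URL in this issue.
ENDPOINT = "http://airodis.ecotech.com.au/AirodisReport/JobSubmit.ashx"

# Build the request URL for a report path (hypothetical helper).
# encode_www_form turns spaces into "+" and backslashes into "%5C".
def report_url(report_path)
  query = URI.encode_www_form(report: report_path, format: "json")
  URI("#{ENDPOINT}?#{query}")
end

# Fetch and parse the JSON for a report (hypothetical helper).
# Raises if the server does not answer with a 2xx response.
def fetch_report(report_path)
  response = Net::HTTP.get_response(report_url(report_path))
  raise "Unexpected response: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

# Example (performs a real HTTP request):
#   data = fetch_report("West Connex\\Website\\Chandos St summary.xml")
```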

equivalentideas commented 6 years ago

@henare for notes, this is what I came up with on the plane https://github.com/equivalentideas/westconnex_M4_East_Air_Quality_Monitoring/pull/57

The tests don't actually work, but you can see where I was going with it, for interest's sake.

henare commented 6 years ago

@equivalentideas thanks! Interesting that we both came up with similar ideas: removing the auto-discovery of feed locations and replacing it with a static lookup :smile:
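A static lookup of this kind could look roughly like the sketch below. This is an assumption about the general shape, not the code from #56 or #57: the constant `FEED_REPORTS` and the helper `report_path_for` are hypothetical names, and only the Chandos St report path (decoded from the URL in this issue) is known; any other sites would need to be added by hand.

```ruby
# Hypothetical static table mapping a monitoring site to its report path.
# Only the Chandos St entry is known from this issue.
FEED_REPORTS = {
  "Chandos St" => "West Connex\\Website\\Chandos St summary.xml"
}.freeze

# Look up the report path for a site (hypothetical helper).
# Hash#fetch raises KeyError for an unknown site, so a typo fails loudly
# instead of silently scraping nothing.
def report_path_for(site_name)
  FEED_REPORTS.fetch(site_name)
end
```

The trade-off is that a new monitoring site has to be added to the table manually, in exchange for not depending on the markup of the HTML pages to discover feeds.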

Switching what I've done in #56 to use VCR instead of a manually downloaded fixture file might be an improvement to do in the future?
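As a rough idea of what that switch might involve, here is a sketch of a VCR setup. It assumes the `vcr` and `webmock` gems are installed; the cassette directory and cassette name are hypothetical. The first run records the real HTTP response to a cassette file, and later runs replay it instead of hitting the network.

```ruby
require "json"
require "net/http"
require "uri"
require "vcr"
require "webmock"

VCR.configure do |config|
  # Hypothetical location for recorded responses.
  config.cassette_library_dir = "spec/cassettes"
  # Intercept HTTP requests through WebMock.
  config.hook_into :webmock
end

url = URI("http://airodis.ecotech.com.au/AirodisReport/JobSubmit.ashx" \
          "?report=West+Connex%5cWebsite%5cChandos+St+summary.xml&format=json")

# "chandos_st_summary" is a hypothetical cassette name.
VCR.use_cassette("chandos_st_summary") do
  data = JSON.parse(Net::HTTP.get(url))
  # Assertions against `data` would go here in a real test.
end
```

Compared with a manually downloaded fixture, the cassette stores the request alongside the response, so it can be refreshed by deleting the file and re-running the tests.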

equivalentideas commented 6 years ago

> Switching what I've done in #56 to use VCR instead of a manually downloaded fixture file might be an improvement to do in the future?

Yep that sounds good 👍

henare commented 6 years ago

Closed by merging #56.