gnomon- / ontario-coronavirus-counter

Generates diffs of https://www.ontario.ca/page/2019-novel-coronavirus , mostly for wolever

Question! What do you use to get the HTML? #1

Open jeromegv opened 4 years ago

jeromegv commented 4 years ago

Hello!

What do you use to get the HTML of https://www.ontario.ca/page/2019-novel-coronavirus ? I can't fetch it with wget because the page content is rendered with JavaScript.

Is there another tool you would suggest?

Thank you!

gnomon- commented 4 years ago

I use curl, but wget should work too. The trick is to fetch the JSON the page loads rather than the page itself. I used a browser instrumentation tool to watch for requests to JSON resources during page load and found this endpoint, which is configured at https://github.com/gnomon-/ontario-coronavirus-counter/blob/master/ontario-coronavirus-counter#L19 :

https://api.ontario.ca/api/drupal/page%2F2019-novel-coronavirus?fields=nid,field_body_beta,body

(It's a little bit unfortunate that the content of that JSON payload is an HTML fragment encoded as a string, but, well... ¯\_(ツ)_/¯)
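Here is a minimal Python sketch of the same approach. It fetches the endpoint quoted above using the standard library instead of curl. The response schema isn't documented in this thread, so the sketch doesn't guess where `nid`, `field_body_beta` and `body` sit. Instead, the hypothetical helper `html_fragments` walks the decoded JSON and returns every string that looks like markup, keyed by its path.

```python
import json
import urllib.request
from typing import Dict

# Endpoint quoted in the comment above; its response shape is an assumption left open here.
API_URL = (
    "https://api.ontario.ca/api/drupal/page%2F2019-novel-coronavirus"
    "?fields=nid,field_body_beta,body"
)


def fetch_payload(url: str = API_URL, timeout: float = 30.0) -> object:
    """Download the endpoint and decode its JSON body (json.load accepts the bytes stream)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def html_fragments(node: object, path: str = "$") -> Dict[str, str]:
    """Recursively collect {json_path: string} for string values that look like HTML.

    Searching the whole tree avoids assuming how the Drupal fields are nested.
    """
    found: Dict[str, str] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            found.update(html_fragments(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            found.update(html_fragments(value, f"{path}[{i}]"))
    elif isinstance(node, str) and "<" in node and ">" in node:
        # Crude heuristic: any string containing angle brackets is treated as markup.
        found[path] = node
    return found
```

Calling `html_fragments(fetch_payload())` should show which key holds the HTML string. Once that's known, indexing it directly is simpler than the tree walk.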

Does that help you out?

jeromegv commented 4 years ago

It did, thank you! I actually found it in your code after opening the issue. Not sure why I didn't think of checking the page's JSON calls.

I'm keeping a daily snapshot of this HTML output over here: https://github.com/jeromegv/covid_data/tree/master/ontario/status_of_cases_in_ontario
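For daily snapshots like these, here is a sketch that stores one file per day and diffs the two most recent, in the spirit of this repo's diffs. The one-file-per-day layout and the `save_snapshot`/`diff_latest` names are hypothetical and are not what either repository actually does. The sketch also assumes the directory holds only these date-named `.html` files, so that sorting by name gives chronological order.

```python
import datetime as dt
import difflib
from pathlib import Path
from typing import List, Optional


def save_snapshot(html: str, directory: Path, day: Optional[dt.date] = None) -> Path:
    """Write html to <directory>/<YYYY-MM-DD>.html (hypothetical layout) and return the path."""
    day = day or dt.date.today()
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{day.isoformat()}.html"
    path.write_text(html, encoding="utf-8")
    return path


def diff_latest(directory: Path) -> List[str]:
    """Unified diff of the two newest snapshots; empty if fewer than two exist or they match."""
    snaps = sorted(directory.glob("*.html"))  # ISO date names sort chronologically
    if len(snaps) < 2:
        return []
    old, new = snaps[-2], snaps[-1]
    return list(
        difflib.unified_diff(
            old.read_text(encoding="utf-8").splitlines(keepends=True),
            new.read_text(encoding="utf-8").splitlines(keepends=True),
            fromfile=old.name,
            tofile=new.name,
        )
    )
```

Re-saving on the same day overwrites that day's file, so only the last fetch of each day is kept.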

Do you know if this API is documented somewhere?