Code4PuertoRico / covid19-response

The central repository for all COVID-19 efforts
MIT License

Document PR covid 19 dashboards #2

Closed froi closed 4 years ago

froi commented 4 years ago

There appear to be two separate COVID-19 dashboards for PR.

So far, the differences seem to be visual and in the data structure. The data seem to be the same.

We have no way of knowing whether both are "official". Both domains are under the bioseguridad umbrella.

froi commented 4 years ago

The dashboards seem to have different data sources:

froi commented 4 years ago

It is possible that they are A/B testing. The data seem to be the same, with some tweaks to the UI of the dashboards.

Sadly, there is no way of knowing. The official URL still gives a 404.


Without this URL working, we have no idea which dashboard we are supposed to keep watching. If the health department has changed the domain for the dashboard, we have no knowledge of the new one.

sanchobarriga commented 4 years ago

Each Dashboard link is coupled with a REST content link, identified by the Widget ID.

For example, for dashboard 2, the ID is 3bfb64c9a91944bc8c41edd8ff27e6df, taken from the end of the URL. Its REST content link would be: https://www.arcgis.com/sharing/rest/content/items/3bfb64c9a91944bc8c41edd8ff27e6df/data
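The mapping from a dashboard link to its REST content link can be sketched as a small helper. This is only a sketch: it assumes the ID is a 32-character lowercase hexadecimal string at the very end of the dashboard URL, and the function name `content_data_url` is hypothetical.

```python
import re

# Base of the REST content endpoint, taken from the link above.
CONTENT_BASE = "https://www.arcgis.com/sharing/rest/content/items"

# Assumption: the ID is 32 hex characters at the very end of the URL.
_ID_AT_END = re.compile(r"([0-9a-f]{32})/?$")


def content_data_url(dashboard_url: str) -> str:
    """Build the REST content link for a dashboard URL that ends in its ID."""
    match = _ID_AT_END.search(dashboard_url.strip())
    if match is None:
        raise ValueError(f"no item ID found at the end of {dashboard_url!r}")
    return f"{CONTENT_BASE}/{match.group(1)}/data"
```

Calling it with any URL that ends in `3bfb64c9a91944bc8c41edd8ff27e6df` should give back the REST content link quoted above.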

In Python, this REST link can easily be called using the Beautiful Soup and json modules, as follows:

import json
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

url = 'https://www.arcgis.com/sharing/rest/content/items/3bfb64c9a91944bc8c41edd8ff27e6df/data'

# Send a browser-like User-Agent header with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(req)

# Name the parser explicitly so Beautiful Soup does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
output = json.loads(soup.text)

My limited understanding of the ArcGIS data structure made my extraction pipeline a very tailored one, built by going very slowly through what was made available. Eventually I found that the "widgets" key in the output JSON contains information of great interest.

In the case of Dashboard 2, there is a MapWidget that contains most of the tables of interest. This is not the case for Dashboard 1, at least as far as I could tell.
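One way to speed up this kind of exploration is to list what each widget contains before digging in. The sketch below assumes that "widgets" holds a list of dictionaries and that each one has a "type" field; neither is confirmed in this thread, and the function name `describe_widgets` and the sample data are hypothetical.

```python
def describe_widgets(output: dict) -> list:
    """Return (type, sorted keys) for each entry under the "widgets" key."""
    summary = []
    for widget in output.get("widgets", []):
        if not isinstance(widget, dict):
            continue  # skip anything that is not a JSON object
        # Assumption: widgets carry a "type" field naming the kind of widget.
        summary.append((widget.get("type", "unknown"), sorted(widget)))
    return summary


# Made-up sample, only to show the shape of the result.
sample = {"widgets": [{"type": "mapWidget", "id": "a"}, {"name": "n"}]}
for widget_type, keys in describe_widgets(sample):
    print(widget_type, keys)
```

Running it on the real `output` dictionary should show which widget carries the extra keys worth inspecting.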

froi commented 4 years ago

Thanks @sanchobarriga (nice username BTW 😆 )

I have a couple of suggestions for you.

Since the endpoint returns JSON to begin with, you might want to use the Requests library. It'll give you a cleaner way to work with JSON payloads.

Example:

import requests

url = 'https://services5.arcgis.com/klquQoHA0q9zjblu/arcgis/rest/services/Datos_Totales/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=50&cacheHint=true'
response = requests.get(url)
data = response.json() # Will return a Python dictionary from the JSON payload
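
The same request can also be written with the query string split into a dictionary, which makes the individual parameters easier to read and change. This is a rough sketch: the parameter values are copied from the URL above, but the helper names are hypothetical, and the "features" / "attributes" / "error" keys are assumed from the usual shape of ArcGIS feature query responses rather than checked against this service.

```python
FEATURE_QUERY_URL = (
    "https://services5.arcgis.com/klquQoHA0q9zjblu/arcgis/rest/services/"
    "Datos_Totales/FeatureServer/0/query"
)

# Same parameters as the query string above, decoded (where=1%3D1 is "1=1").
QUERY_PARAMS = {
    "f": "json",
    "where": "1=1",
    "returnGeometry": "false",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "outSR": "102100",
    "resultOffset": "0",
    "resultRecordCount": "50",
    "cacheHint": "true",
}


def extract_attributes(payload: dict) -> list:
    """Pull the attribute dictionaries out of a feature query payload."""
    # Assumption: failures are reported in the JSON body under an "error" key.
    if "error" in payload:
        raise RuntimeError(f"query failed: {payload['error']}")
    return [feature["attributes"] for feature in payload.get("features", [])]


def fetch_rows() -> list:
    """Run the query and return one dictionary per record."""
    import requests  # imported here so extract_attributes works without it

    response = requests.get(FEATURE_QUERY_URL, params=QUERY_PARAMS, timeout=30)
    response.raise_for_status()
    return extract_attributes(response.json())
```

Calling `fetch_rows()` should then return a list of plain dictionaries, assuming the service is still reachable.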

github-actions[bot] commented 4 years ago

This Issue is being marked as Stale because it has 30 days without any interaction. CC: @code4puertorico/covid19