PDF scraping will be a more intensive task, so we're tabling that until we see if WHO AFRO has the information available from the PDFs in another format.
In the meantime, the scope of this issue is to pull the data available from the GIS dashboard and incorporate the code as a reusable function in this repository. The data are not so large that they need to be cached and updated, so pulling everything at once should be fine.
The WHO AFRO GIS dashboard is an app built on the ArcGIS web platform. Such apps generally call ArcGIS server endpoints, which share a common API, so data can be extracted in formats beyond those the app itself requests. Judging by the app's network calls, the server is at https://services.arcgis.com/5T5nSi527N4F7luB/, and the layers called are:
I believe the first layer contains the data we are interested in, but it's worth checking for discrepancies with the other layers and with the most recent PDF reports.
The page at each of these endpoints lets you construct a query to fetch data. To fetch all data in a layer:
Set the query to 1=1
Set "fields" to * to get all values
Set "return geometry" to False if you don't need the data-intensive GIS shapes
Select JSON as the output format (HTML is handy for browsing the output first)
Use the "GET" query type (rather than "POST")
This will generate a query URL you can reuse.
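As a rough sketch of how that reusable URL could be built in code: the form settings above correspond to the ArcGIS REST query parameters `where`, `outFields`, `returnGeometry`, and `f`. Python is used only for illustration, `build_query_url` is a hypothetical helper name, and the layer URL in the usage line is a placeholder, not one of the actual layers.

```python
from urllib.parse import urlencode


def build_query_url(layer_url, where="1=1", out_fields="*",
                    return_geometry=False, fmt="json"):
    """Build a GET query URL for one ArcGIS layer endpoint.

    layer_url is the layer's own URL (placeholder in the example below);
    the defaults mirror the settings listed above for fetching everything.
    """
    params = {
        "where": where,                # 1=1 matches every feature
        "outFields": out_fields,       # * returns every field
        "returnGeometry": "true" if return_geometry else "false",
        "f": fmt,                      # "json" for code, "html" for browsing
    }
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder layer URL: substitute one of the real layer endpoints.
    print(build_query_url("https://example.invalid/FeatureServer/0"))
```

Passing `fmt="html"` gives the browsable version of the same query.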
Only 2000 features are returned per request, so fetching all the records will probably require repeated requests with different "Result Offset" values.
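A minimal sketch of that paging loop, assuming the layer supports pagination through the `resultOffset` and `resultRecordCount` query parameters and that the JSON response carries a `features` list (and possibly an `exceededTransferLimit` flag). The function names are hypothetical, and the `fetch` argument exists only so the loop can be tried without hitting the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PAGE_SIZE = 2000  # per-request limit mentioned above


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all_features(layer_url, page_size=PAGE_SIZE, fetch=fetch_json):
    """Return the attribute dicts of every feature in a layer, page by page."""
    records = []
    offset = 0
    while True:
        params = {
            "where": "1=1",
            "outFields": "*",
            "returnGeometry": "false",
            "f": "json",
            "resultOffset": offset,
            "resultRecordCount": page_size,
        }
        page = fetch(f"{layer_url.rstrip('/')}/query?{urlencode(params)}")
        features = page.get("features", [])
        records.extend(feature["attributes"] for feature in features)
        # Keep going only if the server signals more data or the page was full.
        more = page.get("exceededTransferLimit", False) or len(features) >= page_size
        if not features or not more:
            break
        offset += len(features)
    return records
```

If the record order turns out to be unstable between requests, adding an explicit sort field to the query may be needed; I have not checked this against the actual layers.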
For each event, we want to extract:
Country
Event
Date notified to WCO
Start of reporting period
End of reporting period
Total cases
Confirmed cases
Deaths
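One possible shape for a single extracted record, as a sketch only. The keys on the right-hand side of `FIELD_MAP` are hypothetical: I have not checked the layer's real field names, so they would need to be replaced. The date conversion assumes dates arrive as epoch milliseconds, which is how ArcGIS JSON output usually encodes date fields.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Any, Mapping, Optional


@dataclass
class EventRecord:
    country: Optional[str]
    event: Optional[str]
    date_notified_to_wco: Optional[date]
    reporting_period_start: Optional[date]
    reporting_period_end: Optional[date]
    total_cases: Optional[int]
    confirmed_cases: Optional[int]
    deaths: Optional[int]


# HYPOTHETICAL layer field names (right-hand side); replace with the real ones.
FIELD_MAP = {
    "country": "Country",
    "event": "Event",
    "date_notified_to_wco": "Date_notified",
    "reporting_period_start": "Start_date",
    "reporting_period_end": "End_date",
    "total_cases": "Total_cases",
    "confirmed_cases": "Confirmed",
    "deaths": "Deaths",
}

DATE_FIELDS = {"date_notified_to_wco", "reporting_period_start", "reporting_period_end"}


def epoch_ms_to_date(value: Optional[float]) -> Optional[date]:
    """Convert epoch milliseconds to a UTC calendar date; pass None through."""
    if value is None:
        return None
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc).date()


def to_event_record(attrs: Mapping[str, Any],
                    field_map: Mapping[str, str] = FIELD_MAP) -> EventRecord:
    """Map one feature's attribute dict onto the fields listed above."""
    values = {}
    for name, source in field_map.items():
        raw = attrs.get(source)  # missing fields become None
        values[name] = epoch_ms_to_date(raw) if name in DATE_FIELDS else raw
    return EventRecord(**values)
```

This is a flat, one-row-per-report structure; it does not yet address how repeat reports for the same event should be linked.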
I don't know how repeat reports related to the same event are handled in the database - we may need a slightly more complex schema for this, particularly if we ultimately scrape the weekly PDFs. Consult @sebaum about any particulars of the most useful output structure.
@sebaum requested that we support extracting data from WHO AFRO's health emergency reports. These reports are found in two places: