covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Add scraper for Colombia #410

Closed jzohrab closed 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/682, transferred here on Monday Apr 06, 2020 at 01:30 GMT


Location name

Colombia (COL)

Source URL

http://www.ins.gov.co/Noticias/Paginas/Coronavirus.aspx (National Institute of Health). But see below.

Notes/comments

The source URL above has a bunch of Infograms embedded. Each one can be opened in a tab, and then you can snoop the data sources using Chrome's network inspector.

Summary data

https://infogram.com/api/live/flex/5eb73bf0-6714-4bac-87cc-9ef0613bf697/c9a25571-e7c5-43c6-a7ac-d834a3b5e872?

The data is in an array of HTML chunks, e.g.:

[
"<font face=\"Montserrat, sans-serif\" color=\"#ed1e79\" style=\"font-size: 22px;\"><b>1.485</b></font>",
"<font face=\"Montserrat, sans-serif\" color=\"\" style=\"font-size: 13px;\">Casos <b>Confirmados en Colombia</b></font>",
"boyPath"
],

This shows 1,485 confirmed cases (the source writes it as "1.485", using a period as the thousands separator).
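A minimal sketch of pulling the number and label out of those HTML chunks, using only the sample shown above. The helper names `strip_tags` and `parse_count` are made up for illustration, and the assumption that a period is always a thousands separator (never a decimal point) is mine, based on this one sample.

```python
import re
from html import unescape

# Matches any HTML tag; good enough for the simple <font>/<b> markup in the sample.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(chunk: str) -> str:
    """Remove HTML tags and decode entities, leaving only the visible text."""
    return unescape(TAG_RE.sub("", chunk)).strip()


def parse_count(chunk: str) -> int:
    """Parse a count such as '1.485'.

    ASSUMPTION: '.' is a thousands separator, as in the sample above.
    """
    text = strip_tags(chunk)
    digits = text.replace(".", "")
    if not digits.isdigit():
        raise ValueError(f"not a count: {text!r}")
    return int(digits)


# The sample chunk from the summary endpoint, copied from above.
sample = [
    "<font face=\"Montserrat, sans-serif\" color=\"#ed1e79\" style=\"font-size: 22px;\"><b>1.485</b></font>",
    "<font face=\"Montserrat, sans-serif\" color=\"\" style=\"font-size: 13px;\">Casos <b>Confirmados en Colombia</b></font>",
    "boyPath",
]

count = parse_count(sample[0])
label = strip_tags(sample[1])
print(count, label)
```

A regex is used here only because the markup in the sample is trivial; a real scraper would probably want a proper HTML parser.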

Number of cases by "departamento" (state)

https://infogram.com/api/live/flex/5e0d85ae-48a4-4899-a679-5ee9aab66d4b/266e0a29-b843-4891-9da4-12325531507b?

Status of positive cases (e.g. hospitalized, deceased, etc.)

https://infogram.com/api/live/flex/de2e4d7c-f649-409e-a874-a7f3f6033ef1/f9098f49-e26a-4843-8291-e78cb0d9aef0?

Breakdown by gender and age

https://infogram.com/api/live/flex/de2e4d7c-f649-409e-a874-a7f3f6033ef1/406f17bb-9a08-4b76-9984-63941d87a790?

List of cases

https://infogram.com/api/live/flex/bc384047-e71c-47d9-b606-1eb6a29962e3/664bc407-2569-4ab8-b7fb-9deb668ddb7a?

This is a table structured as an array of rows. The header row is:

"ID de caso" - case ID
"Fecha de diagnóstico" - date of diagnosis
"Ciudad de ubicación" - city
"Departamento o Distrito" - state or district (assuming that's a county)
"Atención*" - status. They note that "recuperado" (recovered) requires two negative tests.
"Edad" - age
"Sexo" - gender
"Tipo" - type of case: "Importado" (which they define as having come from a country with confirmed COVID-19 cases) or "relacionado" (confirmed to have had contact with someone who has COVID-19)
"País de procedencia" - country considered the source of the infection for this patient

Status can be:

"casa" - self-quarantining at home (I'm assuming here, based on what I've seen in other Latin American countries)
"fallecido" - deceased
"recuperado" - recovered; requires two negative tests to confirm
"hospital" - hospitalized
"hospital UCI" - intensive care
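As a rough sketch, the header and status translations above could be applied to the array of rows like this. The English field names, the `rows_to_cases` helper, and the sample data row are all made up for illustration (the row is not a real case), and I am assuming the cells arrive as plain strings rather than HTML chunks.

```python
# Spanish header -> English field name (field names are hypothetical).
HEADER_TO_FIELD = {
    "ID de caso": "case_id",
    "Fecha de diagnóstico": "diagnosis_date",
    "Ciudad de ubicación": "city",
    "Departamento o Distrito": "state_or_district",
    "Atención*": "status",
    "Edad": "age",
    "Sexo": "gender",
    "Tipo": "case_type",
    "País de procedencia": "origin_country",
}

# Status values listed in this issue, keyed in lowercase.
STATUS_TO_ENGLISH = {
    "casa": "home",
    "fallecido": "deceased",
    "recuperado": "recovered",
    "hospital": "hospitalized",
    "hospital uci": "intensive_care",
}


def rows_to_cases(rows):
    """Turn [header, row, row, ...] into a list of dicts with English keys."""
    header, *body = rows
    fields = [HEADER_TO_FIELD.get(h.strip(), h.strip()) for h in header]
    cases = []
    for row in body:
        case = dict(zip(fields, (cell.strip() for cell in row)))
        raw_status = case.get("status", "").lower()
        # Unknown statuses are passed through unchanged instead of being dropped.
        case["status"] = STATUS_TO_ENGLISH.get(raw_status, raw_status)
        cases.append(case)
    return cases


# Made-up row for illustration only; the date format is a guess.
sample_rows = [
    ["ID de caso", "Fecha de diagnóstico", "Ciudad de ubicación",
     "Departamento o Distrito", "Atención*", "Edad", "Sexo", "Tipo",
     "País de procedencia"],
    ["1", "01/01/2020", "Ciudad A", "Departamento A", "Hospital UCI",
     "40", "F", "Relacionado", "Colombia"],
]

cases = rows_to_cases(sample_rows)
print(cases[0])
```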

Time series and test data

https://infogram.com/api/live/flex/bc384047-e71c-47d9-b606-1eb6a29962e3/523ca417-2781-47f0-87e8-1ccc2d5c2839?

One series has total cases, deaths, and recoveries; the other is a weekly count of tests processed and the test backlog.

Additional sources

I also found some open sources in the ArcGIS Hub - https://hub.arcgis.com/search?categories=covid-19&collection=Dataset

You can get JSONs out of all of these.

The license on each of these implies that they are from the same government entity as the Infograms above.

The dataset hashes differ from one URL to the next, but evidently which data you get depends only on the number after the underscore in the file name.
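To make the hash-plus-suffix structure concrete, here is a small sketch that splits one of the opendata.arcgis.com CSV URLs listed in this issue into its two parts. The names `DatasetRef` and `parse_dataset_url` are hypothetical, and the regex only reflects the URL shape seen in this issue, not any documented ArcGIS contract.

```python
import re
from typing import NamedTuple


class DatasetRef(NamedTuple):
    dataset_hash: str  # the hex string before the underscore
    suffix: int        # the number after the underscore


# Shape observed in this issue: .../datasets/<hex hash>_<number>.csv
CSV_URL_RE = re.compile(r"/datasets/([0-9a-f]+)_(\d+)\.csv$")


def parse_dataset_url(url: str) -> DatasetRef:
    """Split an opendata.arcgis.com CSV URL into its hash and numeric suffix."""
    match = CSV_URL_RE.search(url)
    if match is None:
        raise ValueError(f"unrecognized dataset URL: {url}")
    return DatasetRef(match.group(1), int(match.group(2)))


# The municipality CSV URL from this issue.
ref = parse_dataset_url(
    "https://opendata.arcgis.com/datasets/53beb24d21f146c38a42db63c92e3460_0.csv"
)
print(ref.dataset_hash, ref.suffix)
```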

Source of cases

https://hub.arcgis.com/datasets/esri-colombia::colombia-covid19-coronavirus-procedencia-de-los-casos/data?selectedAttribute=CASOS
CSV: https://opendata.arcgis.com/datasets/3a505d6969c149f98b122fb0a6fd1e7e_4.csv

Number of confirmed cases by state

https://hub.arcgis.com/datasets/esri-colombia::colombia-covid19-coronavirus-departamento/data
CSV: https://opendata.arcgis.com/datasets/ed48c4ce9ca94d5499f1c327f8f532f1_1.csv

Cases by municipality

https://hub.arcgis.com/datasets/esri-colombia::colombia-covid19-coronavirus-municipio/data
CSV: https://opendata.arcgis.com/datasets/53beb24d21f146c38a42db63c92e3460_0.csv

This is the one we want; includes population, population density, total cases, total active cases, total deaths, and total recovered.
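A rough sketch of how the municipality CSV might be read once downloaded. I have not checked the real column names, so every column name below is a HYPOTHETICAL placeholder, and the sample rows are made up for illustration; only the general approach (`csv.DictReader` plus converting the numeric columns) is the point.

```python
import csv
import io

# HYPOTHETICAL column names; the real CSV headers need to be checked.
NUMERIC_COLUMNS = ("population", "cases", "active", "deaths", "recovered")


def parse_municipalities(csv_text: str):
    """Parse CSV text into dicts, converting the numeric columns to int.

    Empty or missing cells become None rather than 0, so missing data
    is not mistaken for a real zero.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = dict(row)
        for column in NUMERIC_COLUMNS:
            value = row.get(column)
            record[column] = int(value) if value else None
        records.append(record)
    return records


# Made-up sample data, not taken from the real dataset.
sample_csv = (
    "municipality,population,cases,active,deaths,recovered\n"
    "Municipio A,1000,5,3,1,1\n"
    "Municipio B,2000,,0,0,0\n"
)

records = parse_municipalities(sample_csv)
print(records)
```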

Case details

https://hub.arcgis.com/datasets/esri-colombia::colombia-covid19-coronavirus-detalle-de-los-casos/data
CSV: https://opendata.arcgis.com/datasets/0e14099fac45422896d50bd52292faea_3.csv

Time series

For the country as a whole; includes new/total cases, deaths, and recoveries.

https://hub.arcgis.com/datasets/esri-colombia::colombia-covid19-coronavirus-casos-diarios/data
CSV: https://opendata.arcgis.com/datasets/782122624f364fbdbd7e287b96c4a358_6.csv