raftgs opened 4 years ago
This would be a great feature, but it might be challenging to get this data from one public source or from multiple sources.
Thanks @remondevries. Do you know how the data are collected right now: from a single source or from multiple sources? IMHO this is information that the WHO should officially request from all countries. Even with the little data available, it could help avoid panic and help people realize that precautions are useful, so its social impact would not be negligible. However, it is not at all clear whether those tests were performed on different people or whether the same person was tested multiple times.
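One way to keep these two counts apart in a dataset is to store them as separate, optional fields. That way, a source that reports only tests is never silently mixed with one that reports distinct people. This is a minimal sketch only. The record and field names (`TestingRecord`, `tests_performed`, `people_tested`) are hypothetical and not part of any existing dataset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TestingRecord:
    """Hypothetical record for one country's testing figures on one day."""
    country: str
    day: date
    source_url: str
    # Total tests run; the same person may be tested more than once.
    tests_performed: Optional[int] = None
    # Distinct people tested; many sources do not report this at all.
    people_tested: Optional[int] = None

    def tests_per_person(self) -> Optional[float]:
        """Average tests per person, or None if it cannot be computed."""
        if self.tests_performed is None or not self.people_tested:
            return None  # missing data, or zero people (avoid dividing by zero)
        return self.tests_performed / self.people_tested
```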
Yes, the number of tests also puts discovered cases into perspective. Culture varies greatly within Europe. Italy, for example, is a very health-conscious country and will test much more rigorously in the first few days than, say, Belgium or Germany. Hence, a low number of people tested could hint at a large number of undiscovered cases in a country.
I found this information regarding the tests, but it also says:
- as of March 1. On March 2, the "Total tested" figure was removed from CDC's website.
But it does list interesting links to sources that do keep track of testing.
I have not looked too deeply into this, but I still think it's a great idea because it puts the data into perspective for people. It makes your dataset more complete.
Maybe it helps someone!
Thanks a lot for the links. Concerning Italy, the Protezione Civile has made this dashboard available:
http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1
At the bottom right, the data are available in CSV format and include the total number of people tested.
Data for France (up to 5 March), page 3 of the PDF
https://www.santepubliquefrance.fr/content/download/234789/2523105
The total number of tests is 6,700: 6,087 negative and 613 positive.
That is a positivity rate of about 9% (613 positives out of 6,700 tests), similar to Italy and higher than in other countries. I'm trying to check other countries as well.
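The positivity rate is the number of positives divided by all tests, meaning positives plus negatives. Here is a minimal sketch using the French figures quoted above. Dividing the positives by the negatives alone would overstate the rate slightly, giving about 10.1% instead of 9.1%.

```python
def positivity_rate(positives: int, negatives: int) -> float:
    """Share of tests that came back positive: positives / (positives + negatives)."""
    total = positives + negatives
    if total == 0:
        raise ValueError("no tests reported")
    return positives / total


# France, up to 5 March (Santé publique France figures quoted above).
rate = positivity_rate(positives=613, negatives=6087)
print(f"{rate:.1%}")  # prints 9.1%
```

This is a per-test rate. If some people were tested more than once, the per-person rate could differ.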
Switzerland, 2020/03/08:
"Suspect cases tested negative (all laboratories combined): more than 4000 persons"
@raftgs Might it be a good idea to open a new ticket to collect all the URLs from different government sources? I might be able to create a simple webpage listing all of them.
@remondevries @raftgs What do you two think about pulling this effort out of this repo and making a small GitHub org where we can host
Thank you for the suggestion. We will keep that idea in mind when building out future capabilities.
+1
The biggest problem I have in understanding the data is the varying level of testing. The CFR (case fatality rate) obviously changes a lot depending on whether a country tests broadly (as Korea does) or only tests patients with moderate or worse symptoms. Adding test counts to the dataset would make it easier to compare countries, or even US states.
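To see why, recall that the naive CFR is deaths divided by confirmed cases, and confirmed cases depend directly on how many people get tested. Below is a minimal sketch with purely hypothetical numbers, not real data for any country. The same number of deaths yields very different CFRs depending on how many mild cases testing picks up.

```python
def naive_cfr(deaths: int, confirmed_cases: int) -> float:
    """Naive case fatality rate: deaths / confirmed cases (ignores reporting delays)."""
    if confirmed_cases == 0:
        raise ValueError("no confirmed cases")
    return deaths / confirmed_cases


# Hypothetical country: 50 deaths in both scenarios.
deaths = 50
# Broad testing also confirms mild cases, so the denominator is large.
broad = naive_cfr(deaths, confirmed_cases=5000)
# Testing only moderate or severe patients confirms far fewer cases.
narrow = naive_cfr(deaths, confirmed_cases=1000)
print(f"broad testing: {broad:.1%}, symptomatic-only testing: {narrow:.1%}")
```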
I've scraped historical test counts for five countries.
Sources are country-specific:
I originally found these sources by following OurWorldInData's links.
Tangentially, this is the plot I was looking to make:
```python
"""Scrapes historical UK coronavirus test counts"""
import pandas as pd
import requests

URL = 'https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public'


def index(target):
    """List all Wayback Machine captures of `target` via the CDX API."""
    r = requests.get('https://web.archive.org/cdx/search/cdx',
                     params={'url': target, 'output': 'json'}, timeout=30)
    r.raise_for_status()
    raw = r.json()  # first row is the header, the rest are captures
    return (pd.DataFrame(raw[1:], columns=raw[0])
            .query("statuscode == '200'")  # skip redirects and errors
            .assign(timestamp=lambda df: pd.to_datetime(df.timestamp, format='%Y%m%d%H%M%S'))
            .assign(date=lambda df: df.timestamp.dt.normalize()))


def snapshot(target, timestamp):
    """Fetch the archived HTML of `target` as captured at `timestamp`."""
    url = f'https://web.archive.org/web/{timestamp:%Y%m%d%H%M%S}/{target}'
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    return r.text  # str, so the .str accessor below can search it


def page_contents(url):
    """Print the 'As of ...' sentence from the last snapshot of each day."""
    idx = index(url)
    snaps = {}
    for date, row in idx.groupby('date').last().iterrows():
        snaps[date] = snapshot(url, row.timestamp)
    # Capture from "As of" up to the next HTML tag.
    for _, row in pd.Series(snaps).str.extract(r'(As of[^<]*)')[0].items():
        print(row)


if __name__ == '__main__':
    page_contents(URL)
```
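The script prints only the raw "As of ..." sentences. To plot them, the counts first have to be turned into numbers. Here is a minimal sketch that assumes the sentences contain a phrase of the form "a total of N people have been tested". The exact wording on the archived pages may differ and should be checked. The name `parse_tested` is hypothetical.

```python
import re
from typing import Optional

# Assumed wording of the archived sentences; adjust the pattern if it differs.
TESTED = re.compile(r'total of ([\d,]+) people have been tested')


def parse_tested(sentence: object) -> Optional[int]:
    """Return the number of people tested, or None if it cannot be found."""
    # str.extract yields NaN (a float) for pages without a match, so skip non-strings.
    if not isinstance(sentence, str):
        return None
    match = TESTED.search(sentence)
    if match is None:
        return None
    # Drop thousands separators such as "26,261" before converting.
    return int(match.group(1).replace(',', ''))
```

You could apply this to the extracted column, e.g. with `.map(parse_tested)`. The result is one count per day, indexed by date, ready to plot.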
Italy's official data are in the last column of the files at: https://github.com/pcm-dpc/COVID-19/tree/master/dati-andamento-nazionale
Hi,
it would be interesting to keep track of the total number of tests performed, or of the number of people tested.
Best,
Raffaele