CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Test number #277

Open raftgs opened 4 years ago

raftgs commented 4 years ago

Hi,

It would be useful to keep track of the total number of tests performed, or of the number of people tested.

Best,

Raffaele

remondevries commented 4 years ago

This would be a great feature, but it might be challenging to get this data from one or more public sources.

raftgs commented 4 years ago

Thanks @remondevries. Do you know how the data are collected right now: from a single source or from multiple sources? IMHO this is information the WHO should officially request from all countries. Even the few data available could help avoid panic and help people realize that precautions are useful and that the social impact is not negligible. That said, it is not at all clear whether those tests were performed on different people or whether multiple tests were run on the same person.

pascalwhoop commented 4 years ago

Yes, the number of tests also puts the discovered cases into perspective. Culture varies greatly within Europe. Italy, for example, is a very health-conscious country and tested much more rigorously in the first few days than, say, Belgium or Germany. Hence, a low number of people tested in a country could hint at a large number of undiscovered cases there.

remondevries commented 4 years ago

I found this information about the tests, but it also says:

  • as of March 1. On March 2, the "Total tested" figure was removed from CDC's website.

But it does list useful links to sources that do keep track of testing.

Link here

I have not looked into this too deeply, but I still think it's a great idea because it puts the data into perspective for people. It would make your dataset more complete.

Maybe it helps someone!

raftgs commented 4 years ago

Thanks a lot for the links. Concerning Italy, the Protezione Civile has made this dashboard available:

http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1

At the bottom right, the data can be downloaded in CSV format, and they include the total number of tests.

raftgs commented 4 years ago

Data for France (up to 5 March), on page 3 of the PDF:

https://www.santepubliquefrance.fr/content/download/234789/2523105

The total number of tests is 6,700: 6,087 negative and 613 positive.

That is a positivity rate of roughly 9% (613 out of 6,700 tests), close to the 10-15% seen in Italy and higher than in other countries. I'm trying to check other countries as well.
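The positivity rate is the number of positive tests divided by all tests performed (positives plus negatives). A minimal Python sketch, assuming every reported test is counted once (which ignores the repeat-testing caveat raised earlier in the thread); the function name `positivity_rate` is hypothetical:

```python
def positivity_rate(positives: int, negatives: int) -> float:
    """Share of tests that came back positive: positives / (positives + negatives)."""
    total = positives + negatives
    if total == 0:
        raise ValueError("no tests reported")
    return positives / total


# Figures from the Santé publique France report quoted above (up to 5 March).
france = positivity_rate(positives=613, negatives=6087)
print(f"France: {france:.1%}")  # prints "France: 9.1%"
```

Dividing positives by negatives instead (613 / 6,087) gives about 10.1%, which slightly overstates the share of tests that were positive.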

raftgs commented 4 years ago

Switzerland, 2020/03/08:

"Suspect cases tested negative (all laboratories combined): more than 4000 persons"

https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pandemien/aktuelle-ausbrueche-epidemien/novel-cov.html

raftgs commented 4 years ago

UK (updated daily)

https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public#number-of-cases

remondevries commented 4 years ago

@raftgs Might it be a good idea to open a new ticket to collect all the URLs from the different government sources? I might be able to create a simple webpage listing all of them.
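A simple webpage like that could be generated from a single mapping of country to source URL. The sketch below is only one possible approach: the `SOURCES` dictionary and the `render_source_list` function are hypothetical, and the URLs are the ones posted earlier in this thread.

```python
from html import escape

# Hypothetical registry of the testing-data sources posted in this thread.
SOURCES = {
    "Italy": "http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1",
    "France": "https://www.santepubliquefrance.fr/content/download/234789/2523105",
    "Switzerland": "https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pandemien/aktuelle-ausbrueche-epidemien/novel-cov.html",
    "UK": "https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public#number-of-cases",
}


def render_source_list(sources: dict[str, str]) -> str:
    """Render a minimal standalone HTML page with one link per country, sorted by name."""
    items = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(country)}</a></li>'
        for country, url in sorted(sources.items())
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        "<head><title>COVID-19 testing data sources</title></head>\n"
        f"<body>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(render_source_list(SOURCES))
```

Keeping the URLs in one data structure means a new country only needs one new entry; the HTML is regenerated from it.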

pascalwhoop commented 4 years ago

@remondevries @raftgs What do you two think about pulling this effort out of this repo and creating a small GitHub org where we can host

CSSEGISandData commented 4 years ago

Thank you for the suggestion. We will keep that idea in mind when building out future capabilities.

abhishekamit commented 4 years ago

+1

The biggest problem I have in understanding the data is the varying level of testing: the case fatality rate (CFR) obviously changes a lot depending on whether a country tests broadly (as Korea does) or only tests patients with moderate or worse symptoms. Adding test counts to the dataset would make it easier to compare countries, or even US states.
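The effect of testing on the naive case fatality rate (CFR) can be written out explicitly. In this sketch the symbols are introduced here for illustration: D is reported deaths, C confirmed cases, T people tested, and I the true number of infections. It assumes no false positives, each person counted once, and it ignores the delay between confirmation and death.

```latex
\mathrm{CFR}_{\mathrm{naive}} = \frac{D}{C},
\qquad C \le \min(T,\, I)
\;\Longrightarrow\;
\mathrm{CFR}_{\mathrm{naive}} \ge \frac{D}{I}
```

With D fixed, broader testing raises C toward I and pulls the naive CFR down toward D/I, while a country that only tests symptomatic patients keeps C small and reports a higher CFR.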

andyljones commented 4 years ago

I've scraped historical test counts for five countries.

Sources are country-specific:

I originally found these sources by following OurWorldInData's links.

Tangentially, this is the plot I was looking to make:

[Image: plot]

"""Scrapes historical UK coronavirus test counts from Wayback Machine snapshots"""

import json

import pandas as pd
import requests

URL = 'https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public'

def index(target):
    """List the Wayback Machine captures of `target` via the CDX API."""
    url = 'https://web.archive.org/cdx/search/cdx'
    r = requests.get(url, params={'url': target, 'output': 'json'})
    r.raise_for_status()

    raw = json.loads(r.content)
    # The first row of the JSON output holds the column names.
    return (pd.DataFrame(raw[1:], columns=raw[0])
                .assign(timestamp=lambda df: pd.to_datetime(df.timestamp, format='%Y%m%d%H%M%S'))
                .assign(date=lambda df: df.timestamp.dt.normalize()))

def snapshot(target, timestamp):
    """Fetch the archived HTML of `target` as captured at `timestamp`."""
    url = f'https://web.archive.org/web/{timestamp:%Y%m%d%H%M%S}/{target}'
    r = requests.get(url)
    r.raise_for_status()
    # Return text rather than bytes so the pandas .str methods below work.
    return r.text

def page_contents(url):
    idx = index(url)
    snaps = {}
    # Keep only the last capture of each day.
    for date, row in idx.groupby('date').last().iterrows():
        snaps[date] = snapshot(url, row.timestamp)

    # Pull out the "As of ..." sentence, stopping at the next HTML tag.
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
    for _, row in pd.Series(snaps).str.extract('(As of[^<]*)<')[0].items():
        print(row)

if __name__ == '__main__':
    page_contents(URL)

bhack commented 4 years ago

For Italy, the official test counts are in the last column of the national data at: https://github.com/pcm-dpc/COVID-19/tree/master/dati-andamento-nazionale
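Official series like this one are usually cumulative, so daily testing volume has to be recovered by differencing. A minimal pandas sketch, assuming the national CSV has a date column named `data` and a cumulative test column named `tamponi` (Italian for "swabs"); both column names are assumptions to check against the file's header, and the function names are hypothetical.

```python
import pandas as pd

# Assumed column names; verify them against the CSV header before use.
DATE_COL = "data"
TESTS_COL = "tamponi"


def add_daily_tests(df: pd.DataFrame) -> pd.DataFrame:
    """Turn a cumulative test count into per-day new tests."""
    out = (
        df.sort_values(DATE_COL)[[DATE_COL, TESTS_COL]]
        .rename(columns={DATE_COL: "date", TESTS_COL: "total_tests"})
    )
    # The first row has no previous day to subtract from, so it stays NaN.
    out["new_tests"] = out["total_tests"].diff()
    return out.reset_index(drop=True)


def load_daily_tests(csv_source) -> pd.DataFrame:
    """Read a CSV (path, URL or file object) and compute daily new tests."""
    return add_daily_tests(pd.read_csv(csv_source, parse_dates=[DATE_COL]))
```

For example, `load_daily_tests(path)` with `path` pointing at a national CSV downloaded from that folder; the exact file name isn't given in this thread.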