covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

articles/doc/data #72

Closed utterances-bot closed 4 years ago

utterances-bot commented 4 years ago

Dataset Documentation • COVID-19 Data Hub

https://covid19datahub.io/articles/doc/data.html

eholmdahl1 commented 4 years ago

Hi, what is the source of the prevalence data (marked here as COVID-19 Variables) at the administrative_area_level_3 level, i.e. for U.S. counties? Are the policy variables generalized from the state level to the county level, or are those policies updated for each county? Thank you in advance!

eguidotti commented 4 years ago

Hello, the data sources are available here. If you don't see USA level 3 listed, the rows with empty iso and level are used as a fallback. In particular, U.S. county data are from JHU CSSE. Policies for each county are inherited from the country-level policies from OxCGRT, which may be inaccurate at the county level. Please feel free to suggest other options/databases for fine-grained policies in the US. Thanks!
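The fallback rule described above can be sketched roughly as follows. This is only an illustration: the rows, the column names (`iso`, `level`, `variable`, `source`) and the function `find_sources` are hypothetical, not the actual layout of the sources file or the package's code.

```python
from typing import Dict, List

# Hypothetical rows imitating a sources table; the column names are assumptions.
SOURCES: List[Dict[str, str]] = [
    {"iso": "", "level": "", "variable": "confirmed", "source": "JHU CSSE"},
    {"iso": "", "level": "", "variable": "policies", "source": "OxCGRT"},
    {"iso": "USA", "level": "2", "variable": "confirmed", "source": "covidtracking.com"},
]


def find_sources(rows: List[Dict[str, str]], iso: str, level: int) -> List[Dict[str, str]]:
    """Return the rows listed for (iso, level), or the default rows if none exist."""
    exact = [r for r in rows if r["iso"] == iso and r["level"] == str(level)]
    if exact:
        return exact
    # Fallback: rows with empty iso and level act as the defaults.
    return [r for r in rows if r["iso"] == "" and r["level"] == ""]


# USA level 3 is not listed, so the default rows are returned.
print([r["source"] for r in find_sources(SOURCES, "USA", 3)])
```

With these made-up rows, a lookup for USA level 2 returns only the explicitly listed row, while USA level 3 falls back to the defaults.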

kenshermock commented 4 years ago

Hello, when administrative_area_level_3 is equal to "Out of X", where X = state two character initials, what exactly does that mean? Thank you and thanks for your efforts with this wonderful site!

eguidotti commented 4 years ago

Hello, thanks for your feedback :) The "Out of X" names are from JHU CSSE. The only information I was able to get is

Out of [State], US: UID = 840 (country code3) + 800XX (state FIPS code). Ranging from 84080001 to 84080056.

Maybe you could try to open an issue and ask them directly?
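For anyone who needs to recognise these rows programmatically, here is a minimal sketch of the quoted UID rule, assuming the UID is the digits 840 followed by 800 and a two-digit state FIPS code. The function names are hypothetical, and the range check only mirrors the 01 to 56 span mentioned above; not every number in that span is an assigned FIPS code.

```python
US_CODE3 = 840  # country code3 for the US, as in the quoted rule


def out_of_state_uid(state_fips: int) -> int:
    """Build the 'Out of [State]' UID: 840 followed by 800XX."""
    if not 1 <= state_fips <= 56:
        raise ValueError(f"state FIPS code out of range: {state_fips}")
    return int(f"{US_CODE3}800{state_fips:02d}")


def state_fips_from_uid(uid: int) -> int:
    """Recover the two-digit state FIPS code from an 'Out of [State]' UID."""
    digits = str(uid)
    if len(digits) != 8 or not digits.startswith("840800"):
        raise ValueError(f"not an 'Out of [State]' UID: {uid}")
    return int(digits[-2:])


print(out_of_state_uid(1))            # FIPS 01
print(state_fips_from_uid(84080056))  # FIPS 56
```

Under this reading of the rule, FIPS 01 maps to 84080001 and FIPS 56 to 84080056.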

kenshermock commented 4 years ago

Thanks for your reply, Emanuele. My trainee and I have noticed an idiosyncrasy with the data. When we sum the total U.S. cases and deaths using today's file, we get different results for each of the three sources: CSSE, covid19datahub data-2, and covid19datahub data-3. Here are the results from the three data sets:

csse_covid_19_daily_reports:

stats |   cases   deaths
------+-----------------
  sum | 1955421   110832

https://storage.covid19datahub.io/data-2.csv :

stats |   cases   deaths
------+-----------------
  sum | 1947077   104885

https://storage.covid19datahub.io/data-3.csv :

stats |   cases   deaths
------+-----------------
  sum | 1936243   109291

Any insight you could provide would be helpful.

Thanks,

Ken



eguidotti commented 4 years ago

Thanks for the check, Ken. You can find the data sources we use at https://storage.covid19datahub.io/src.csv

In particular, USA level 2 data are from https://covidtracking.com and level 3 data are from JHU CSSE (time series).

The mismatch is likely to be due to:

Please note that in any case it is not recommended to aggregate lower-level data to obtain top-level data. There are usually missing values and missing cities/states that make the total count incorrect. That's why we rely on different sources for different levels.

I would not double-check the total counts; compare individual observations directly instead. Level 2 may be slightly different due to the different data sources. Level 3 should be exactly the same for the single cities/counties.

Hope this helps, Emanuele
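A check on individual observations, as suggested above, could look like the following sketch. The keys (area id, date), the counts, and the function name `compare_observations` are made up for illustration; they are not taken from either dataset.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (area id, date)


def compare_observations(a: Dict[Key, int], b: Dict[Key, int]) -> Dict[str, List[Key]]:
    """Compare two sources observation by observation instead of by totals."""
    shared = a.keys() & b.keys()
    return {
        # Same area and date present in both sources, but different counts.
        "mismatched": sorted(k for k in shared if a[k] != b[k]),
        # Observations that one source has and the other lacks.
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
    }


# Made-up counts: the totals differ (15 vs 10) only because one area is missing in b.
a = {("county_1", "2020-06-05"): 10, ("county_2", "2020-06-05"): 5}
b = {("county_1", "2020-06-05"): 10}
print(compare_observations(a, b))
```

In this toy case the totals disagree even though every shared observation is identical, which is the situation described above: a missing area, not a wrong value.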

aoschwartz7 commented 3 years ago

Hi, what sources are you using for deaths, confirmed, tests, and recovered at the country level?

aoschwartz7 commented 3 years ago

Sorry, I found the sources you cite at https://github.com/covid19datahub/COVID19/blob/master/inst/extdata/src.csv

eguidotti commented 3 years ago

Exactly that file, yes. Countries that are not specified use the data sources with an empty ISO code, i.e. Johns Hopkins University for confirmed, deaths, and recovered; Our World in Data for tests.

You may also want to use the following to get the data sources easily

ktbaek commented 3 years ago

It seems data for Brazil is only for one state (Espirito Santo), is that correctly understood?

eguidotti commented 3 years ago

At state level yes, only for Espirito Santo. But you can also find national-level data for Brazil. Can you suggest some data providers for fine-grained Brazilian data? Thanks!

ktbaek commented 3 years ago

Ok, thank you. No, I don't know any good data source unfortunately.

ktbaek commented 3 years ago

Hi again, Johns Hopkins's daily reports (but not the timeseries) have state-level data for Brazil. https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

eguidotti commented 3 years ago

Maybe we can collect the daily reports to construct the time series? This looks similar to #135
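As a rough sketch of that idea, one could stack the per-day files into one series per state. The column names (`state`, `confirmed`), the sample values, and the function `build_timeseries` below are hypothetical and do not reflect the real layout of the daily reports.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def build_timeseries(daily_reports: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Turn {ISO date: CSV text} into {state: [(date, confirmed), ...]} ordered by date."""
    series: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for date in sorted(daily_reports):
        for row in csv.DictReader(io.StringIO(daily_reports[date])):
            series[row["state"]].append((date, int(row["confirmed"])))
    return dict(series)


# Made-up daily files; real reports would be read from disk or downloaded.
reports = {
    "2020-06-02": "state,confirmed\nEspirito Santo,120\nSao Paulo,300\n",
    "2020-06-01": "state,confirmed\nEspirito Santo,100\n",
}
print(build_timeseries(reports))
```

A state that is missing from a given day's file simply has no entry for that date, so gaps in the daily reports carry over into the resulting series.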