datadista / datasets

Data source for DATADISTA's investigative and data journalism reports and projects
GNU Affero General Public License v3.0

Error in dataset nacional_covid19_rango_edad.csv #37

Closed mvanreek closed 4 years ago

mvanreek commented 4 years ago

Hi Datadista,

First of all, thanks for the great work.

I have made a visualization (Power BI report) of the data in dataset nacional_covid19_rango_edad.csv, see:

https://worktimesheet2014.blogspot.com/2020/03/coronavirus-covid-19-in-spain-power-bi.html

and I think this dataset has an error: the value '43739' in the age-group column should probably be 10-19 (years). For most other age groups I see 6 rows in the table, but for this one just 4. Is that correct?

BTW: if you want to add my blog post to the page where you list all websites that make use of your dataset, that would be great.

Regards,

Maarten van Reek (Dutchman living in Madrid)

4tikhonov commented 4 years ago

Hi @mvanreek, thanks, it looks very interesting!

Could you possibly connect data from the Netherlands https://github.com/J535D165/CoronaWatchNL and Italy https://github.com/pcm-dpc/COVID-19? I'm trying to get everything into a standardized format, ready for linkage.

adelgadob commented 4 years ago

Done! We are working on a metadata doc. Added your post to the readme.md. Thanks!

mvanreek commented 4 years ago

I still see the error in nacional_covid19_rango_edad.csv: see lines 55 and 88, which still have '43739' in column rango_edad:

2020-03-24,43739,hombres,96,7,0,0
2020-03-25,43739,hombres,94,6,0,0
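
A possible explanation for '43739', offered as an assumption rather than something confirmed in this thread: it is the spreadsheet date serial for 1 October 2019, which is what a spreadsheet tool may produce when it auto-converts the text "10-19" into a date. The minimal sketch below checks that and repairs the two rows quoted above. `repair_age_range` is a hypothetical helper, and the position of rango_edad as the second column is taken from those rows.

```python
from datetime import date, timedelta

# Common convention for converting Excel 1900-system serials in code:
# serial N is 1899-12-30 plus N days (valid for dates after February 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert a spreadsheet date serial to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def repair_age_range(value: str) -> str:
    """Hypothetical helper: map a date serial back to a 'month-yy' age range.

    Assumption: genuine age-range labels contain a non-digit character
    (such as '-'), so a purely numeric value is treated as a corrupted label.
    """
    if not value.isdigit():
        return value
    d = excel_serial_to_date(int(value))
    return f"{d.month}-{d.year % 100}"


# The two rows quoted in the comment above.
rows = [
    "2020-03-24,43739,hombres,96,7,0,0",
    "2020-03-25,43739,hombres,94,6,0,0",
]

fixed = []
for row in rows:
    fields = row.split(",")
    fields[1] = repair_age_range(fields[1])  # rango_edad is the second field
    fixed.append(",".join(fields))

print(excel_serial_to_date(43739))
print(fixed)
```

Importing the CSV as plain text, rather than letting a spreadsheet guess column types, would avoid this kind of conversion in the first place.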

adelgadob commented 4 years ago

Fixed!

mvanreek commented 4 years ago

thanks, I'll update my PowerBI-report

mvanreek commented 4 years ago

I see another error in this file: fallecidos (deaths) for 25/3 is 918 according to the file, but this should be 738, as you can read e.g. here: https://www.redaccionmedica.com/secciones/sanidad-hoy/coronavirus-directo-ultima-hora-del-miercoles-25-de-marzo-5511

You can also see this in your source (the PDF): in yesterday's update the total #fallecidos is 2696 and in today's update it is 3434, so the difference (the 'delta' between the deaths reported yesterday and today) is 738.
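
As a minimal sketch of that arithmetic, a daily 'delta' can be derived from consecutive cumulative totals. Only the two totals come from the comment above; the labels and the variable names are illustrative assumptions.

```python
# Cumulative death totals from two consecutive updates, as quoted above.
# The labels are placeholders, not values taken from the dataset.
cumulative_deaths = [
    ("previous update", 2696),
    ("latest update", 3434),
]

# Daily delta = current cumulative total minus the previous cumulative total.
deltas = [
    (label, total - prev_total)
    for (_, prev_total), (label, total) in zip(cumulative_deaths, cumulative_deaths[1:])
]

print(deltas)
```

The same pairwise difference applies to any cumulative column, as long as the rows are sorted by date first.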

adelgadob commented 4 years ago

This data is correct. nacional_covid19_rango_edad.csv only shows data obtained from the analysis of 21,872 reported cases with age information and 21,851 with age and sex information. All cases are in this other dataset: nacional_covid19.csv

Please check the original PDF, with notes and tables: https://www.mscbs.gob.es/profesionales/saludPublica/ccayes/alertasActual/nCov-China/documentos/Actualizacion_55_COVID-19.pdf

mvanreek commented 4 years ago

Ok, I do see 918 deaths in table 2. So this is not the number of deaths for the latest reporting date but a cumulative total (not a 'delta') for all of the approx. 20k cases with gender/age info, if I understand you correctly?

adelgadob commented 4 years ago

Yes, exactly

mvanreek commented 4 years ago

ok, thanks. To understand the numbers better, it would be really great if you could publish your metadata doc asap.

mvanreek commented 4 years ago

ok, I just saw that you explained this in the Readme. Just one more thing: the total of ALL infected people (gender/age group known or not) is 47k, with 3.4k deaths (see table 1 in the PDF). Among the people whose gender/age group is known (see table 2 of the PDF), #infected is 21.8k (a bit less than 50% of ALL) and #deaths is 918 (approx. 30% of ALL), right? I would have expected more deaths in the second group, also approx. 50% of ALL, which would be approx. 1.7k deaths (well above the 918 in table 2). But maybe the 'sample group' of table 2 is not so representative, meaning that the share of elderly people may be much higher in the ALL group of table 1 than in the sample group of table 2.
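
A minimal sketch of the comparison above, using the figures quoted in this thread. The total of 47,000 cases is the rounded '47k' from the comment, so the resulting shares are approximate, and the variable names are illustrative.

```python
# Figures quoted in the thread. total_cases is the rounded "47k" (assumption:
# the exact table 1 figure is not restated here, so results are approximate).
total_cases = 47_000
total_deaths = 3_434
known_cases = 21_872   # cases with age information (table 2)
known_deaths = 918     # deaths among those cases (table 2)

# Share of all cases and of all deaths covered by the age-known subset.
case_share = known_cases / total_cases
death_share = known_deaths / total_deaths

# Deaths the subset would contain if it were a proportional sample of all cases.
expected_deaths = total_deaths * case_share

print(f"share of cases:  {case_share:.1%}")
print(f"share of deaths: {death_share:.1%}")
print(f"expected deaths at proportional share: {expected_deaths:.0f}")
```

With these inputs the subset covers a little under half of the cases but only a bit over a quarter of the deaths, which is the gap the comment asks about.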

mvanreek commented 4 years ago

BTW: I also made a Power BI report, 'Corona-in-Netherlands', based on this open data set: https://github.com/J535D165/CoronaWatchNL

For the embedded report, see:

https://worktimesheet2014.blogspot.com/2020/03/coronavirus-in-netherlands-embedded.html

mvanreek commented 4 years ago

To 4tikhonov: where can I find more about the 'standardized format' of COVID19 data? (I suppose/hope it will have all fields in English, which is currently not the case for the Dutch/Spanish/Italian open data.)