globaldothealth / monkeypox

Mpox 2022 repository

Challenges in data curation #127

Closed aimeehan1 closed 1 year ago

aimeehan1 commented 2 years ago

Comments from discussion on 2022-07-13:

Errors.

Changes in reporting formats.

aimeehan1 commented 2 years ago

On 2022-07-26, G.h observed a discrepancy in U.S. confirmed case counts between two pages on the CDC website:

U.S. Map & Case Count page: 3,487

Global Map page: 3,846

Both pages report data as of 2022-07-25.

https://www.cdc.gov/poxvirus/monkeypox/response/2022/us-map.html

https://www.cdc.gov/poxvirus/monkeypox/response/2022/world-map.html

[Screenshots: U.S. Map, Global Map]
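To put the size of the gap in numbers, here is a minimal sketch that uses only the two counts quoted above. The variable and function names are hypothetical and chosen for illustration; nothing here queries the CDC pages.

```python
# Figures quoted above, both reported as of 2022-07-25.
us_map_count = 3487      # U.S. Map & Case Count page
global_map_count = 3846  # Global Map page


def discrepancy(a: int, b: int) -> tuple[int, float]:
    """Return the absolute gap and the gap relative to the smaller count."""
    gap = abs(a - b)
    return gap, gap / min(a, b)


gap, relative = discrepancy(us_map_count, global_map_count)
print(f"gap = {gap} cases ({relative:.1%} of the lower figure)")
```

The gap is 359 cases, roughly a tenth of the lower figure, which is large enough to matter when one of the two pages is used as the curation source.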

lisphilar commented 2 years ago

Thank you for providing the dataset!

Sorry for jumping in, but I tried to create a pandas.DataFrame with the cumulative numbers of confirmed/recovered/fatal cases from your line list data. https://gist.github.com/lisphilar/23d23f8692f70f2663a6c4890758a7ab

I assumed the following. Is my understanding correct?

Is it possible to provide recovered/fatal data as well as confirmed? Total population and the cumulative numbers of confirmed/recovered/fatal cases are very useful for data analysis. I developed a Python library for COVID-19 data, named CovsirPhy, and analysed the data with mathematical models.
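For readers following along, the aggregation described above (one row per case turned into cumulative counts per date) might look roughly like the sketch below. It is not the code from the linked gist. The field names `status` and `date_confirmation` and the sample rows are assumptions made for illustration, and the data dictionary remains the authority on the real columns. The sketch uses only the standard library; the same result could be reached in pandas with a group-by followed by a cumulative sum.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical line-list rows: one dict per case. The field names
# "status" and "date_confirmation" are assumptions for illustration.
rows = [
    {"status": "confirmed", "date_confirmation": "2022-07-01"},
    {"status": "confirmed", "date_confirmation": "2022-07-01"},
    {"status": "suspected", "date_confirmation": ""},
    {"status": "confirmed", "date_confirmation": "2022-07-03"},
]


def cumulative_confirmed(rows):
    """Return [(date, cumulative confirmed count)] sorted by date."""
    # Count confirmed cases per confirmation date, skipping rows
    # that are not confirmed or that have no date.
    daily = Counter(
        r["date_confirmation"]
        for r in rows
        if r["status"] == "confirmed" and r["date_confirmation"]
    )
    dates = sorted(daily)  # ISO dates sort chronologically as strings
    totals = accumulate(daily[d] for d in dates)
    return list(zip(dates, totals))


result = cumulative_confirmed(rows)
print(result)
```

Dates with no confirmed cases are simply absent from the output; a real analysis would likely need to fill those gaps before fitting a model.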

jim-sheldon commented 2 years ago

@lisphilar You're welcome!

Please do not apologize for jumping in; we made our work open source because we want your input!

Would you kindly open a new issue in this repository for the problem you described? This allows us to keep all "epics", features, and bugfixes discrete.

I would also refer you to our data dictionary, which might help answer some of your questions.

lisphilar commented 2 years ago

@jim-sheldon Thank you for your quick response!

I have just created four issues (#177, #178, #179, #180), and I'm looking forward to discussing them with you and your team there.

abhidg commented 1 year ago

Line list is discontinued as of 2022-09-22