BlankerL / DXY-COVID-19-Data

2019新型冠状病毒疫情时间序列数据仓库 | COVID-19/2019-nCoV Infection Time Series Data Warehouse
https://lab.isaaclin.cn/nCoV/
MIT License

Confirmed cases for 长春市 (Changchun) and 吉林市 (Jilin City) were flipped #110

Closed: beansrowning closed this issue 2 years ago

beansrowning commented 2 years ago

This doesn't seem like something you can fix on your end, but I wanted to bring to your and others' attention that the confirmed case values reported for Changchun and Jilin City were flipped on 3/15, which has thrown off the cumulative case counts ever since.

Per NHC on 3/15 (http://www.nhc.gov.cn/xcs/yqtb/202203/8d8d2035b3884fcfb734e0ab07bede79.shtml):

Changchun: +460, Jilin City: +2601

Per DXY scrape: [screenshot not preserved]
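To illustrate why a single flipped day keeps affecting cumulative counts, here is a minimal Python sketch. It assumes the flip acts like swapped daily increases feeding a running total. The two 3/15 increases come from the NHC figures above; the starting total of 0 and the `later` increases are hypothetical placeholders, not real data.

```python
from itertools import accumulate

# Daily increases reported by NHC for 3/15 (from the post above).
NHC_INCREASE = {"Changchun": 460, "Jilin City": 2601}


def cumulative(start, increases):
    """Return the running totals obtained by adding each increase to `start`."""
    return [start + total for total in accumulate(increases)]


# Hypothetical later increases, used only to show that the offset persists.
later = [100, 200]

# Changchun's running total with the correct 3/15 value and with the flipped one.
correct = cumulative(0, [NHC_INCREASE["Changchun"], *later])
flipped = cumulative(0, [NHC_INCREASE["Jilin City"], *later])

# The difference stays constant on every later day.
offsets = [f - c for f, c in zip(flipped, correct)]
print(offsets)
```

Under this assumption, every later cumulative value for Changchun is too high by 2601 - 460 = 2141, and every later value for Jilin City is too low by the same amount.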

Also, just a thank you for your continued support of this service. I started using these data early in 2020, and they've continued to be valuable some two years later. I've used them to provide situational awareness to US Government and colleagues internationally. I've found no better publicly available source of ADM2-level data to date.

Cheers!

qianxliu commented 2 years ago

Surely the Johns Hopkins data on COVID in China is definitely inaccurate. As a Chinese person, I am also looking for a reliable dataset that does not require manual compilation, but I'm afraid I may have to build it by hand.

qianxliu commented 2 years ago

I guess the fact that the Johns Hopkins data differs completely from the real China COVID data is a major reason for the wrong assessments of China's novel coronavirus pneumonia outbreak.

beansrowning commented 2 years ago

JHU works well at provincial level (ADM 1) but there is no Chinese prefecture/city level data (ADM 2) that I'm aware of.

qianxliu commented 2 years ago

I have already compared JHU with the official posts, and they are totally mismatched. I also used the JHU data for analysis and found it behaves very oddly when modeling COVID-19 in China. Data source: https://github.com/CSSEGISandData/COVID-19

It shows about 1200 in recent days, but as far as I know, the real figures are totally different. https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner

I still don't know how to use this data repository, but maybe you can try a comparison.

qianxliu commented 2 years ago

To get Chinese data, you must know Chinese and spend time organizing it. Chinese scholars generally do not do this kind of work.

qianxliu commented 2 years ago

@beansrowning I have found that the JHU data is basically correct. My erroneous data came from https://github.com/Kamaropoulos/COVID19Py, which takes its data from JHU but has a bad implementation. This package has a serious bug that needs to be fixed.

Finally solved the case

BlankerL commented 2 years ago

Hi @beansrowning,

Thanks a lot for your report! I have checked your NHC post on 3/15, the database and the dumped CSV/JSON file, and you are right. The values of Changchun and Jilin City on the timestamp 1647311180134 (Tue Mar 15, 2022, 02:26:20 GMT+0000 or Tue Mar 15, 2022, 10:26:20 GMT+0800) are flipped.
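For anyone who wants to check the timestamp conversion above, here is a small Python sketch using only the standard library. It assumes the value is a Unix epoch in milliseconds; the helper name `from_epoch_ms` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Timestamp quoted above, assumed to be milliseconds since the Unix epoch.
TIMESTAMP_MS = 1647311180134


def from_epoch_ms(ms, tz):
    """Convert an epoch value in milliseconds to a timezone-aware datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=tz) + timedelta(milliseconds=millis)


utc = from_epoch_ms(TIMESTAMP_MS, timezone.utc)
cst = utc.astimezone(timezone(timedelta(hours=8)))  # GMT+0800

print(utc.isoformat(timespec="seconds"))  # 2022-03-15T02:26:20+00:00
print(cst.isoformat(timespec="seconds"))  # 2022-03-15T10:26:20+08:00
```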

These data are scraped directly from DXY, which, as far as I know, collects and updates the figures on its site manually, so some values may be wrong, especially in the early days.

The purpose of the crawler and this data warehouse is to maintain a time-series database, so I prefer not to modify the data directly in the database. However, I will document this problem in the Noise Data section of the README so that others are aware of it. Thanks again for your contribution.
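Since the database itself stays unchanged, readers who need corrected values could apply the fix downstream, after loading the dump. The following is only a sketch of that idea: the function `swap_city_values` is hypothetical, the field names `cityEnglishName`, `city_confirmedCount` and `updateTime` are assumptions about the dumped file's layout, and the counts in `sample` are placeholders rather than real data.

```python
def swap_city_values(records, timestamp, city_a, city_b, field):
    """Return a copy of `records` with `field` exchanged between two cities
    for the rows whose updateTime equals `timestamp`."""
    fixed = [dict(r) for r in records]  # copy rows so the input is not mutated
    at_time = {
        r["cityEnglishName"]: r for r in fixed if r["updateTime"] == timestamp
    }
    # Only swap when both cities have a row at that timestamp.
    if city_a in at_time and city_b in at_time:
        a, b = at_time[city_a], at_time[city_b]
        a[field], b[field] = b[field], a[field]
    return fixed


# Placeholder rows (assumed field names, made-up counts).
sample = [
    {"cityEnglishName": "Changchun", "updateTime": 1647311180134, "city_confirmedCount": 20},
    {"cityEnglishName": "Jilin", "updateTime": 1647311180134, "city_confirmedCount": 10},
    {"cityEnglishName": "Changchun", "updateTime": 1647400000000, "city_confirmedCount": 30},
]

fixed = swap_city_values(
    sample, 1647311180134, "Changchun", "Jilin", "city_confirmedCount"
)
print([r["city_confirmedCount"] for r in fixed])
```

Working on a copy keeps the original scrape intact, which matches the goal of preserving the time series as collected.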

Also, thank you for letting me know the dataset really helped in your research. Being helpful in scientific research is my main purpose in building and maintaining this dataset.

Cheers!