Closed edend10 closed 1 year ago
There is something wrong with the dataset. The entire dataset has 117k records, but there are 550k cases. If you select on country = 'United States', it has only 11,364 records, yet there are 86,242 cases in the USA so far. I hope this can be fixed; I really need the USA stats for my company. We're using this data to determine which offices to close, as we are an essential business.
Thanks for flagging - we're looking into this. There is one processing step that might have led to some cases not being present when using those filters. I will caution, however, that this dataset is intended to be as comprehensive as is feasible for our team, with as much specific metadata as is available, rather than to be aligned one-to-one with the total global cases. I would suggest looking at the Johns Hopkins GitHub repository https://github.com/CSSEGISandData/COVID-19 for a system that is engineered to track cumulative counts, without the metadata of age, sex, outbreak time milestones, or more specific geography. We will retain as much as is feasible for us to do so.
@beoutbreakprepared I'm fine with the entire dataset, we have global offices as well, just concentrating on USA for now. But having said that, the entire dataset only has 117k records, it should have well over 550k records and should be 600k+ after today.
If it helps diagnose the issue, the state of NY in the United States seems to have no cases confirmed for certain dates, while on other sites their reporting of cases seems to be pretty consistent day-to-day.
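The date-gap check described above can be sketched roughly as follows. This is only an illustration under assumptions: the column names `province` and `date_confirmation`, the `DD.MM.YYYY` date format, and the helper `missing_dates` are hypothetical and should be adjusted to whatever latestdata.csv actually contains. The rows here are made-up sample data, not real counts.

```python
from datetime import datetime, timedelta


def missing_dates(rows, province, date_key="date_confirmation", fmt="%d.%m.%Y"):
    """Return the calendar days, between the first and last confirmation
    date seen for `province`, on which no row was recorded.

    `rows` is any iterable of dicts, e.g. what csv.DictReader yields.
    The column names and date format are assumptions, not confirmed.
    """
    seen = set()
    for row in rows:
        if row.get("province") != province:
            continue
        try:
            seen.add(datetime.strptime(row.get(date_key) or "", fmt).date())
        except ValueError:
            # Skip blank or unparseable dates instead of failing.
            continue
    if not seen:
        return []
    day, last = min(seen), max(seen)
    gaps = []
    while day <= last:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Made-up sample rows standing in for the real file.
sample_rows = [
    {"province": "New York", "date_confirmation": "20.03.2020"},
    {"province": "New York", "date_confirmation": "21.03.2020"},
    {"province": "New York", "date_confirmation": "24.03.2020"},
    {"province": "Washington", "date_confirmation": "22.03.2020"},
    {"province": "New York", "date_confirmation": ""},
]
gaps = missing_dates(sample_rows, "New York")
print([d.isoformat() for d in gaps])
```

With the real file, the rows could be read with `csv.DictReader(open("latestdata.csv", newline=""))`. A long run of missing days for a large state would point at a gap in the data rather than at a real absence of cases.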
Is it just my copy of 'latestdata.csv' that has only 436,549 records of the 2,479,498 cases? Is there another section? ... The latest copy has just a fifth of the cases.
Is there any more data beyond June 16?
@calremmel is looking into why the data isn't up to date.
Hi all, update on this: the line list data source that feeds latestdata.csv is not currently being updated, so the current file is as up to date as it is going to be until that changes.
My understanding is that we'll be migrating to some other sources soon. I don't know what that will mean for accessing this particular file where it is currently, but for the time being, the most recent lines are from mid-June.
Bummer - thanks for the update!
From: calremmel notifications@github.com Sent: Friday, July 31, 2020 5:22:30 PM To: beoutbreakprepared/nCoV2019 nCoV2019@noreply.github.com Cc: tnjcook timnjencook@live.com; Comment comment@noreply.github.com Subject: Re: [beoutbreakprepared/nCoV2019] Is latestdata.csv incomplete/outdated? (#44)
First of all, great work and thank you for providing this data!
I'm doing a simple aggregation on province and city (using Python and pandas) and noticing that the number isn't right for NYC.
The aggregated count for NYC comes to 2,469, whereas reported cases today are 20K+.
https://www.healthmap.org/covid-19/, which references your data, shows 17K, which is closer to the reported numbers (at the time I'm writing, latestdata.csv was last updated 16 hours ago, so their gap makes sense). I don't know, though, whether their website is augmented by another data source.
Also, the latestdata.csv has a total of 117K rows, whereas reported cases in the world as per healthmap.org is 500K+.
Is something wrong with the way I'm looking at the data or could it be incomplete?
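For reference, the kind of aggregation described above might look like the sketch below. It is not the poster's actual code: the column names `country`, `province`, and `city`, the helper `count_records`, and the sample rows are all assumptions for illustration. One detail worth ruling out is that pandas `groupby` drops rows whose key is missing by default, so the sketch fills missing keys with a placeholder before counting. That would only account for part of any difference, since each row in a line list is one case record and the file may simply contain fewer records than the reported totals.

```python
import pandas as pd


def count_records(df, country=None, by=("province", "city")):
    """Count line-list rows (one row = one case record) per group.

    Missing group keys are replaced with "(missing)" so those rows are
    counted instead of being dropped by groupby's default behaviour.
    """
    if country is not None:
        df = df[df["country"] == country]
    keys = list(by)
    filled = df[keys].fillna("(missing)")
    return (
        filled.groupby(keys)
        .size()
        .sort_values(ascending=False)
        .rename("records")
    )


# Made-up sample rows standing in for latestdata.csv.
sample = pd.DataFrame(
    {
        "country": ["United States"] * 4 + ["Italy"],
        "province": ["New York", "New York", "New York", "Washington", "Lombardy"],
        "city": ["New York City", "New York City", None, "Seattle", "Milan"],
    }
)
counts = count_records(sample, country="United States")
print(counts)
```

Comparing `counts.sum()` with the number of rows left after the country filter shows whether any rows were lost during grouping.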