Closed aatishb closed 3 years ago
I am just waiting for them to fix it, and my US visualizations are wrong right now.
I find that I am constantly fixing and updating my API because this dataset is so unpredictable.
My dashboard has been getting more traction than I expected. What started as a fun project for me is now slowly becoming a full-time job in order to maintain accuracy and functionality.
I implemented a small workaround for the issue with the US states/cities by filtering out all the values containing a comma (since I believe the city-level entries will no longer be maintained).
I spent 2 hours hacking through a solution last night, but I am not confident in the results. I am not excited about the prospect of hacking together a data-cleaning solution on a daily basis. I am hoping they simply fix it.
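The comma-filtering workaround described above could be sketched roughly like this (the sample CSV text and column names here are hypothetical, mirroring the repo's daily-report layout, not taken from the actual files):

```python
import csv
import io

# Hypothetical sample mimicking the daily-report layout, where county/city
# rows look like "King County, WA" and state rows have no comma.
SAMPLE = """Province/State,Country/Region,Confirmed
Washington,US,267
"King County, WA",US,270
Illinois,US,25
"Cook County, IL",US,25
"""

def state_level_rows(csv_text):
    """Keep only rows whose Province/State has no comma, i.e. drop
    the county/city-level "Place, ST" entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if "," not in row["Province/State"]]

rows = state_level_rows(SAMPLE)
total = sum(int(r["Confirmed"]) for r in rows)
```

Note that `csv.DictReader` handles the quoted `"County, ST"` fields correctly, so the comma test only sees the field's contents, not the raw line.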
For the USA data, I extract STATE_CDs from the Province/State field for data before 3/9 and look them up in a reference table (STATE_CD, STATE_NM) in the database. Data from 3/10 onward just uses state names, so I join on STATE_NM to get the time series back with complete history.
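A minimal in-memory sketch of that join, assuming a small STATE_CD-to-STATE_NM reference table and made-up sample counts (the table, dates, and figures below are illustrative, not real data):

```python
from collections import defaultdict

# Hypothetical reference table: STATE_CD -> STATE_NM (abridged).
STATE_REF = {"WA": "Washington", "IL": "Illinois"}

def to_state_name(province_state):
    """Normalize a pre-3/9 'Place, CD' entry to a full state name.
    Entries from 3/10 onward already use full names and pass through."""
    if "," in province_state:
        code = province_state.rsplit(",", 1)[1].strip()
        return STATE_REF.get(code, province_state)
    return province_state

# Counts keyed by (place, date); values are made up for illustration.
early = {("King County, WA", "3/8/20"): 1,
         ("Snohomish County, WA", "3/8/20"): 2}   # county rows before 3/9
late = {("Washington", "3/10/20"): 267}           # state rows from 3/10 on

# Join both eras on the normalized state name, summing counties per date.
merged = defaultdict(int)
for (place, date), n in list(early.items()) + list(late.items()):
    merged[(to_state_name(place), date)] += n
```

Summing per `(state, date)` key is what lets multiple county rows collapse into a single state-level series.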
@treerunner This is better than having only one or the other; at least it gives us options. I am mapping all of the data, since that gives you a visual of the outbreak locations and the state total, and then counting only whichever is greater: the state-level cases or the sum of the county-level cases within the state.
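The "count whichever is greater" idea could be sketched like this (the per-state figures below are invented for illustration):

```python
# Hypothetical figures for one day: each state's state-level row count
# and the counts from its individual county-level rows.
state_row = {"Illinois": 25, "Washington": 267}
county_rows = {"Illinois": [25, 3], "Washington": [120, 80]}

def best_estimate(state_count, county_counts):
    """Take whichever is greater: the state row or the county sum."""
    return max(state_count, sum(county_counts))

estimates = {state: best_estimate(count, county_rows.get(state, []))
             for state, count in state_row.items()}
```

This avoids double counting while still using the county rows when they add up to more than the state row reports.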
@aatishb - I am currently using method #1, but I see that the time series is inconsistent, so I guess this needs to be fixed by the maintainers. Kudos to them anyway for making this available!
Looks like #590 addresses the double counting issue.
Utilizing OSINT (open-source intelligence) techniques for COVID-19 research. An all-hands-on-deck guide.
When doing OSINT we usually focus on targets such as people or businesses, but we can also use these same techniques for data collection on the virus.
Part 1: Intro to OSINT https://www.reddit.com/r/OSINT/comments/e78he1/osint_for_beginners_part_1_introduction/
Part 2: Tooling https://www.reddit.com/r/OSINT/comments/e7a4ke/part_2_tooling/
Part 3: Case management and methodology https://www.reddit.com/r/OSINT/comments/e9276y/osint_guide_part_3_case_management_and_methodology/
TASKS IN PREPARATION FOR THE COVID19 STUDY-A-THON
https://docs.google.com/document/d/1wD4qMy3jyNPXBOCEivkOqnXMteuWbF5yKenaZR6g57s/edit#
If you’re new to Coronavirus research, start here…
SUMMARY OF SARS-CoV/SARS-CoV-2 AND COVID-19 FINDINGS
My GitHub, full of COVID-19 data: https://github.com/star-ops?tab=repositories
My Mendeley profile, https://www.mendeley.com/profiles/flynn-carsen/, with DOI research links; you might have to make an account.
I use some simple shell commands:
grep 'US' time_series_19-covid-Confirmed.csv | grep -v ", "
For now it works.
If someone is looking for a way to handle this in Golang (including the new county omission), here's how I'm doing it: https://gist.github.com/kamermans/397488317c75b23414100d7e1316e96f
It's not just double-counting issues. In at least some cases, the state count doesn't include the individual county counts. For example, on 1/24, Cook County, IL has 1 confirmed case but Illinois has 0.
As of March 10, the data contains US cases at both state level AND county level. This is leading to double counting problems where if you sum all the US cases, you get a number that is roughly twice as high as the true number of cases.
See also #382, #472, #496, #559, #501, #541 and many more
The way I see it, for the US cases, we can either:
1. Focus on state data and ignore county based data
2. Focus on county data and ignore state data
3. Do something else
How are people dealing with this? Are the state and county levels providing the same total numbers, or is one source more reliable than the other? I'm curious if anyone has a workaround for this.
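One way to see the scale of the problem is to compute all three totals side by side: the naive sum of every US row, the state-only sum (option 1), and the county-only sum (option 2). The sample rows below are hypothetical, made up to mirror the daily-report layout:

```python
import csv
import io

# Hypothetical US rows in the daily-report layout; real files have many more.
SAMPLE = """Province/State,Country/Region,Confirmed
Washington,US,267
"King County, WA",US,270
"""

rows = [r for r in csv.DictReader(io.StringIO(SAMPLE))
        if r["Country/Region"] == "US"]

# Summing everything counts the same cases at both levels.
naive = sum(int(r["Confirmed"]) for r in rows)
# Option 1: state rows only (no comma in Province/State).
state_only = sum(int(r["Confirmed"]) for r in rows
                 if "," not in r["Province/State"])
# Option 2: county rows only (comma present).
county_only = sum(int(r["Confirmed"]) for r in rows
                  if "," in r["Province/State"])
```

With both levels present, the naive total comes out roughly twice either single-level total, which is exactly the double counting described above.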