CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Changes in U.S. Reporting #382

Open CSSEGISandData opened 4 years ago

CSSEGISandData commented 4 years ago

In light of the increasing rate of cases being reported domestically in the U.S., and in order to retain timeliness and accuracy, we have switched from reporting at the county level to state level.

mw32 commented 4 years ago

Can't this be made dynamic? Zoom level would be one way to do this as discussed in https://github.com/CSSEGISandData/COVID-19/issues/363?

MuffleKerfuffle commented 4 years ago

Granularity might trump accuracy IMHO. If the bulk of the pings on this site are gen pop related, I doubt they are just wondering about their state. For instance, TX is larger than some countries, so total cases in TX isn’t really helpful when assessing risk levels. Maybe the prep changes for gen pop will change slightly if immediate local cases are filed. For instance, I might not travel to Austin if I know there are 30 cases in the vicinity. The detail also gives ammunition for reasoning as it relates to events and business meetings.

CSSEGISandData commented 4 years ago

Please stand by as the team is discussing this topic and the steps moving forward.

JHEASTON commented 4 years ago

County, city, or any finer level of granularity is very important for this issue. Perhaps use a category of "pending" while a case's location is being determined, and once it is determined, list it under the appropriate county/city.

sethdeckard commented 4 years ago

Thank you for reconsidering this, the granularity would be much appreciated for those of us in larger states.

BrandonRCopeland commented 4 years ago

Very much agreed that granularity is critical here for the US. As stated before, states like CA and TX are larger than many countries, and knowing the granular cases by county is of the utmost importance. It's worth a slight delay so that we could have the granular county data as well for the US.

jwa5426 commented 4 years ago

I’m here to voice my support for what others have said. Displaying cases by state is not helpful to the vast majority of the public who aren’t flying or otherwise traveling across state borders. By allowing finer granularity, members of the public can make informed decisions regarding their risk level during their day-to-day life - going to work, shopping, visiting friends, etc.

Korywon commented 4 years ago

I agree. Harris County alone has a massive population compared to the rest of Texas. As an example, it would take you literal days to drive across Texas. A lot of people that I know in Harris County rely on that specific information to make decisions. I personally don't see how that contributes to accuracy as it just becomes part of a bigger number. It makes it really difficult to discern where the cases actually lie.

myanaros commented 4 years ago

Dallas County Texas here, was disappointed to see the granularity disappear hours after seeing the first case in my area pop up.

Thank you to this whole team for the awesome work being done to help keep people in the know.

pnisita commented 4 years ago

There are some states that have counties that are larger than other entire states, and not having that local data makes the data virtually useless. For instance, I live on Long Island, and if our results are grouped with Albany or Buffalo, it does us no good at all.

xtronaltic commented 4 years ago

For the people of America, please change it back.

rdfedor commented 4 years ago

This change will hurt many individuals' efforts to stay informed about areas they should avoid. Instead of having the granularity to see which counties have reports, the map now shows which states to avoid, even though many parts of those states or countries are unaffected. I have a daughter with Cystic Fibrosis who will be visiting me this summer. Because of the loss of granularity, even though I live in Texas, which has only a few cases thus far, the map makes the case that the entire state is a hazard to be avoided, rather than giving me the information I need to avoid the areas that have active outbreaks.

As others have stated, there are ways to aggregate the data at particular zoom levels to avoid the performance degradation of showing every point, and then show the county level once you pass a particular zoom. This would be a better implementation, resolving the performance impact without losing visibility into the useful information that many people come to this map for.
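A minimal sketch of that zoom-based idea, for illustration only. The record fields (`county`, `state`, `cases`), the function name `points_for_zoom`, and the threshold value are all hypothetical; nothing here reflects how the JHU dashboard is actually built.

```python
from collections import defaultdict

# Hypothetical threshold: at or above this zoom show counties, below it show states.
COUNTY_ZOOM_THRESHOLD = 6

def points_for_zoom(records, zoom):
    """Return sorted (label, cases) pairs to draw at the given map zoom level."""
    if zoom >= COUNTY_ZOOM_THRESHOLD:
        # Zoomed in: one point per county.
        return sorted((f"{r['county']}, {r['state']}", r["cases"]) for r in records)
    # Zoomed out: roll counties up into one point per state.
    totals = defaultdict(int)
    for r in records:
        totals[r["state"]] += r["cases"]
    return sorted(totals.items())

# Made-up numbers, for illustration only.
records = [
    {"county": "Harris", "state": "TX", "cases": 5},
    {"county": "Dallas", "state": "TX", "cases": 3},
    {"county": "King", "state": "WA", "cases": 20},
]
```

With this shape the county rows stay in the data, and only the number of points drawn changes with the zoom level.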

aleksandar-jovicic commented 4 years ago

Does finer granularity increase the effort spent collecting data, or is the change just about visual representation? Let me rephrase the question: are you still collecting data at the county level and just aggregating it for presentation purposes, or does your source only provide data at the state level?

Is it possible to provide another feed containing the fine-grained data and do the dynamic aggregation for the presentation level in the existing API call?

CorwinOA commented 4 years ago

Wanted you all to know how much we all value and respect your work, especially given how this started and grew and the manual nature of some of your work.

I’ll admit that state-by-state data doesn’t have the same impact on me, both as someone with a public safety history and as an observer. I hope you’re able to bring back county level reporting or find a way to crowd source some of the labor that was causing you to shift towards state level reporting.

If it’s only going to be state from here on out, I suggest you change the map to state shading and color coding to represent volume. As an example, your change now makes it appear that my county accounts for my entire state’s caseload. Needless to say that was a shock when I flipped it open this morning.

Seriously, you all are awesome for doing this and sharing it!

NathanBick commented 4 years ago

Can we please include the county data available in the repo even if the main dashboard does not reflect county data anymore?

SchlittDataSci commented 4 years ago

Chiming in here, this has been an amazing resource, but in the long run people are going to need to treat this like the weather for granular, local, risk based decision making. For example, "if I'm in a high risk group, are there sufficient cases in such county that I should reconsider activities there"

The state-wide data is going to be a bit less useful to modelers and analysts, but vastly less useful to regular individuals trying to plan their days safely in the coming months as this saturates the map.

MickMickle commented 4 years ago

I fully agree with the need to display cases down to at least county level in the U.S. I was very disappointed yesterday when that information was consolidated into single dots in each state, seemingly randomly placed within the states, even though the size of the dots still did reflect the magnitude of cases. I was hoping that the loss of granularity was just a symptom of the site being overloaded. So I was instantly motivated to find another tracking site. Let's face it: yours is the best, but this state-level only display is a significant degradation.

Collecting, entering, and displaying all of the case location and status data must be an enormous and laborious undertaking -- I can't even fathom it, and I can't fully express how appreciative I am -- Thank you! But if you are actually still entering the county location, please continue to have the map display those locations by county. The people of America do need that information for all the reasons already given by others.

If you are concerned that the map display becomes too complex for the viewer if it has too many dots and circles, don't worry about that. Users of your dashboard will figure it out -- it's very intuitive. If it's just too labor intensive, tell us what we can do to help. Donations? Publicity?

jawz101 commented 4 years ago

All I know is I would like to at least have the wiki open to list any Official data sources for any geographically formatted data feeds people find for anywhere in the world. (i.e.- not copies of copies or user-maintained feeds.)

jocooper43016 commented 4 years ago

"Please stand by as the team is discussing this topic and the steps moving forward."

Please please please bring back at least county granularity

JeremyIglehart commented 4 years ago

Pennsylvania here - I appreciate all of the hard work you are doing and understand the decision you have made. I must admit, however, that I will no longer be using this to check what is important to me. I am willing to donate money to your cause if it means being able to go back to the county level.

Although Pennsylvania is not as big as Texas, it's still quite a large territory, taking several hours to drive from east to west or vice versa. It's much more important to me to know what is going on more directly around me. Much of my family lives in Pennsylvania, but everyone is in different counties. I now have no idea how to check on how things are going at a level of detail that actually helps me. As a result, I will most likely stop using this tool :(

My county has a detailed map also using Esri - you may be able to use links like this to help achieve state-level or even more granular than county level for the counties that have gone to the effort.

In summary, please consider:

jheasley322 commented 4 years ago

Particularly in light of the geographic distance between cases (Southern and Northern California being hundreds of miles apart, as one example), the ability to cluster based on county data is crucial. I am happy to provide some ETL resources if needed. If not in the main file, perhaps a supplemental consolidated file could give us the info we need. Thanks!

tghamm commented 4 years ago

We’re interested in helping at-risk populations (seniors) and the county level data is magnitudes more useful. We’d be happy to contribute data, etl work, and/or money to help benefit the public to keep the effort going.

chiester commented 4 years ago

Any consideration for backfilling the state data with the known number of cases for previous days? Most states did not really have zero confirmed cases before today but that's what the new state rows show.

philipncohen commented 4 years ago

+1 on willing to donate or crowdsource to get counties back.

mbeck94 commented 4 years ago

I echo other responses here, the work being done is tremendous, however, county level data is far more useful than state level

dawenx commented 4 years ago

Just FYI, our dashboard shows county-level statistics for USA (click on a State to view), and state/province-level statistics for Canada, Germany and Chile. See #7 for link.

peterdrier commented 4 years ago

If there were 3 columns (City/County, State, Country) then the data would work a bit better..

Then if the state #'s from wherever they're sourced could be decremented by the city values.. I.e. the NY State value wouldn't include NYC, Nassau, ... Leaving the sum for NY correct instead of the double counting now. And the states having no cities broken out could be reported as is. Could be something like:

Phoenix, AZ, US - ###
Flagstaff, AZ, US - ###
Other, AZ, US - ### (or this could be blank for city, as this is the pattern otherwise in the file)

This way the sum of AZ, US would be correct, yay pivot tables, and everyone gets to keep their granularity/city based time series history.
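The layout proposed above could be sketched roughly as follows. This is an illustration under assumptions: the function name `split_state_rows`, the input shapes, and the numbers are all made up, and the blank city stands in for the "Other" row.

```python
def split_state_rows(state_totals, city_counts):
    """Build (city, state, country, cases) rows whose per-state sum equals the state total.

    state_totals: {state: total cases reported for the whole state}
    city_counts:  {(city, state): cases attributed to that city/county}
    """
    rows = []
    attributed = {}
    for (city, state), cases in sorted(city_counts.items()):
        rows.append((city, state, "US", cases))
        attributed[state] = attributed.get(state, 0) + cases
    for state, total in sorted(state_totals.items()):
        remainder = total - attributed.get(state, 0)
        if remainder < 0:
            raise ValueError(f"city counts exceed state total for {state}")
        # Blank city = cases not attributed to any listed city/county.
        rows.append(("", state, "US", remainder))
    return rows

# Made-up numbers, for illustration only.
state_totals = {"AZ": 9, "NM": 4}
city_counts = {("Phoenix", "AZ"): 5, ("Flagstaff", "AZ"): 1}
rows = split_state_rows(state_totals, city_counts)
```

Summing the rows for a state then gives the state total exactly once, so a pivot table over the state column stays correct.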

ghost commented 4 years ago

Are the cruise ship cases being double counted?

aatishb commented 4 years ago

Hi. Thank you for all your hard work on this, this is really an invaluable resource. As a few others have pointed out, since the most recent data update reports US data at both state and county level, if you sum all entries with 'Country/Region' == 'US', you get a sum that is nearly double the total number of cases. This is confusing as only the US is reported in this way. Ideally summing all entries for a country should give the correct total without double counting.
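To make the double counting concrete, here is a small sketch on toy rows. It assumes that county-level rows are the ones with a comma in `Province/State` (for example "King County, WA"), which is a fragile heuristic and not a documented property of the files; the numbers are made up.

```python
# Toy rows shaped like (Province/State, Country/Region, confirmed).
rows = [
    ("King County, WA", "US", 10),
    ("Snohomish County, WA", "US", 5),
    ("Washington", "US", 15),   # state row that already includes the two counties
    ("Ontario", "Canada", 7),
]

def country_total(rows, country, skip_county_rows=False):
    """Sum confirmed cases for a country, optionally skipping "County, ST" style rows."""
    total = 0
    for province, region, confirmed in rows:
        if region != country:
            continue
        # Assumption: county/city rows are the ones with a comma in Province/State.
        if skip_county_rows and "," in province:
            continue
        total += confirmed
    return total
```

Here `country_total(rows, "US")` counts the Washington cases twice, while passing `skip_county_rows=True` counts them once. The filter only helps if the state rows are complete.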

big4-data-person commented 4 years ago

First, thank you for all the work y'all are putting into making this data available to us. I would respectfully ask you to disaggregate the data and present it at the most granular level you can again, even going back to county-level reporting would be enormously helpful. At the state level, there is really no GIS-powered analytic info we can glean from this. Thank you for re-considering rolling this up to the state level.

JHEASTON commented 4 years ago

@CSSEGISandData do you have an update available? Are you still considering the possibility of returning to more granular reporting? P.S. We are so appreciative of your work.

travisp commented 4 years ago

A lot of people wish this would remain county level, but it simply may be too much work (and the states may stop providing sufficient information).

Assuming this must go to state level reporting, I think that's better than nothing. However, it's difficult to deal with the data as it is currently presented because it looks like, for example, Washington got 262 new cases in one day, and it makes it look like there are more cases in the US and in each state than there are. In my opinion, if the repository is going to stop reporting county level data it should: 1) archive the old county level data, and 2) aggregate all of the old county level data into the state level data.
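Step 2 could look something like the sketch below. It assumes county labels end in a two-letter state code ("King County, WA") and that every series covers the same days; the `STATE_NAMES` mapping, the function name, and the numbers are hypothetical.

```python
from collections import defaultdict

# Only the codes needed for this toy example are mapped here.
STATE_NAMES = {"WA": "Washington", "NY": "New York"}

def backfill_states(county_series):
    """Sum per-county daily counts into one daily series per state.

    county_series: {"County, ST": [count_day0, count_day1, ...]}
    All series are assumed to have the same length.
    """
    by_state = defaultdict(list)
    for label, series in county_series.items():
        # Take the state code after the last comma in the label.
        code = label.rsplit(",", 1)[1].strip()
        current = by_state[STATE_NAMES[code]]
        if not current:
            current.extend([0] * len(series))
        for day, count in enumerate(series):
            current[day] += count
    return dict(by_state)

# Made-up numbers, for illustration only.
county_series = {
    "King County, WA": [1, 4, 9],
    "Snohomish County, WA": [0, 1, 2],
    "Westchester County, NY": [0, 2, 6],
}
```

The resulting state series no longer starts at zero on the day of the switch, so the apparent one-day jump goes away.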

cwacht commented 4 years ago

I'm attempting to understand the work required to include city/county level granularity. After looking through the data sources in the readme, I could not find anything useful. Can someone help me to determine where the data has been coming from?

mw32 commented 4 years ago

@CSSEGISandData - Understand you guys must be very busy and slammed with questions/requests. Do you have an update on the discussion you guys are having, and any ideas what's next? Also, the data stored here seems to be "after the fact", am I correct?

DerSticher commented 4 years ago

First off, thanks for all the work that you're doing! Also providing the data here is an invaluable source for developers like me.

I ended up putting up a small github page to visualize the cases per country over time. With your aforementioned changes, the csv files still contain the data per city, which leads to a higher amount of cases, when summing up all the rows from the US. Due to the sheer amount of work needed to maintain numbers per city during a large outbreak, I can understand your decision to only report the numbers of the states. Nonetheless, it would be really helpful to continue with only one of those approaches, and not keeping the old data in there. Otherwise it is not really possible to sum up the total US values automatically.

JHEASTON commented 4 years ago

I'm starting to wonder if this site is being actively updated. There were only 284 US cases added over the last 24 hours. That seems incredibly low.

peterdrier commented 4 years ago

"I'm starting to wonder if this site is being actively updated. There were only 284 US cases added over the last 24 hours. That seems incredibly low."

Or maybe the US is rationing their testing and is under detecting cases...

chiester commented 4 years ago

https://twitter.com/KagroX/status/1237926287768981504?s=20

matthewrj commented 4 years ago

Where does the county level data come from? I can't find it in any of the listed data sources.

rajrao commented 4 years ago

Please bring back county level info for the US. State level just is not granular enough.

cwacht commented 4 years ago

It seems like the county/city data has been coming from the individual state websites.

Click the states on the map on the CDC website https://www.cdc.gov/coronavirus/2019-ncov/cases-in-us.html#reporting-cases

Then navigate the state website to find where they are reporting county/city level data.

For example:
New York: https://health.ny.gov/diseases/communicable/coronavirus/
Washington: https://www.doh.wa.gov/Emergencies/Coronavirus

California seems to require you to get the data from the individual county/city websites:
San Francisco: https://www.sfdph.org/dph/alerts/coronavirus.asp
Alameda: http://www.acphd.org/2019-ncov.aspx

This could explain why collecting county level data on a daily basis is no longer viable. Maybe we can split up the work?

piccolbo commented 4 years ago

The way the county level was included led to incorrect summaries. It has to be crystal clear which entries are sums of other entries, if any. If you want to have state level and county level in the same file, you have to have a county field and a state field, or a name field and an entity type field. You can't expect people to apply regex trickery, look for commas or whatnot, and get it right all the time.
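A rough sketch of the explicit-field idea, assuming a hypothetical `level` field that marks each row as "county" or "state". The class, field names, and numbers are illustrative only and are not the repository's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseRow:
    level: str      # "county" or "state": says what kind of entity the row is
    county: str     # empty for state-level rows
    state: str
    country: str
    confirmed: int

def total_for(rows, country, level):
    """Sum confirmed cases for one country using rows of a single level only."""
    return sum(r.confirmed for r in rows if r.country == country and r.level == level)

# Made-up numbers, for illustration only.
rows = [
    CaseRow("county", "King", "WA", "US", 10),
    CaseRow("county", "Snohomish", "WA", "US", 5),
    CaseRow("state", "", "WA", "US", 15),
]
```

Because the caller has to pick one level, a sum can no longer mix a state row with the county rows it already contains.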

wldflwr commented 4 years ago

Thank you for all your hard work and dedication to keeping us informed! I am troubled by the most recent change... I’d like to see the USA grand total return to the left window pane; selecting the country in that pane was friendlier for keeping track of overall confirmed, deaths, recovered and active cases in the USA. I did also appreciate the county-wide mapping, but understandably, as the virus spreads, it would become challenging to select the right point on the map for county info. I think a single dot on each state is fine, but when the red dot is selected perhaps an additional window could appear in the right window pane detailing individual counties. At this time no data appears in the left window pane anymore; I'm hoping you return the USA total to the left window as before. Thank you again for all that you are doing.

becare-rocket commented 4 years ago

My hack is to use the larger of the state based data or the county/city based data so as to have a long yet continuous time series.
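That hack might look like the following sketch, assuming both inputs are cumulative daily series covering the same days. The function name and the numbers are made up.

```python
def merged_series(state_series, county_sum_series):
    """Element-wise maximum of two equally long daily cumulative series."""
    if len(state_series) != len(county_sum_series):
        raise ValueError("series must cover the same days")
    return [max(s, c) for s, c in zip(state_series, county_sum_series)]

# Made-up numbers: the state row starts at zero, the county rows stop being updated.
state_series      = [0, 0, 0, 30, 45]
county_sum_series = [5, 12, 20, 20, 20]
merged = merged_series(state_series, county_sum_series)
```

This keeps the early history from the county rows and the later history from the state row, at the cost of hiding any day where the two sources genuinely disagree.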

DavidGeeraerts commented 4 years ago

If the State websites (i.e. Washington State) would just use TABLE HTML tags, then it would be very easy to use HTML tools to automate getting County level data. I've put in a request for WDOH for them to fix their website, but I've not heard back. Before they changed the website layout, they were using TABLE tags, made it real easy to automate getting the data.

mpfriesen commented 4 years ago

"If the State websites (i.e. Washington State) would just use TABLE HTML tags, then it would be very easy to use HTML tools to automate getting County level data. I've put in a request for WDOH for them to fix their website, but I've not heard back. Before they changed the website layout, they were using TABLE tags, made it real easy to automate getting the data."

Yes, I was trying to figure out yesterday why my Python scraper stopped working there. Thanks for bugging them.
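For context, here is a minimal sketch of why TABLE markup is easy to automate, using only Python's standard `html.parser`. The sample markup is made up and merely stands in for a state health department page; it is not anyone's actual scraper.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Made-up markup, for illustration only.
SAMPLE = """
<table>
  <tr><th>County</th><th>Cases</th></tr>
  <tr><td>King</td><td>10</td></tr>
  <tr><td>Snohomish</td><td>5</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)
header, *body = parser.rows
counts = {county: int(cases) for county, cases in body}
```

When a page drops the table tags in favor of styled divs, a parser like this finds no rows at all, which matches the scraper breakage described above.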

longsyntax commented 4 years ago

@DavidGeeraerts @mpfriesen Are you guys already working on scraping county-level data? We're trying to consolidate efforts for a community-maintained repo that has county level data over at #558

DavidGeeraerts commented 4 years ago

@longsyntax The Washington Counties are centrally reporting to Washington Department of Health, so all the County data is available from WDoH. I'm maintaining my own dashboard COVID-19-Dashboard. If there's a distributed effort to maintain County level data for the States, I'm game for doing it for Washington State.

mpfriesen commented 4 years ago

@longsyntax I'm maintaining a page here for The Oregonian: https://projects.oregonlive.com/coronavirus/. I can do county-level for Oregon.

MickMickle commented 4 years ago

The FAQ dated March 13 for the map explains that they plan to go back to county level sometime:

Why does the map report only state-level data in the United States instead of county-level data?

In light of the increasing rate of cases being reported in the United States and worldwide, and in order to retain timeliness and accuracy, the map switched from reporting at the county level to the state level on March 10. The team expects to return to county-level reporting once it feels confident the platform can provide the most accurate, timely reports from local jurisdictions as the virus rapidly advances.

Why is a point on the map located on my city or neighborhood?

All points shown on the map are based on geographic centroids, and are not representative of a specific address, building or any location at a spatial scale finer than a city. Click on each point on the map to obtain information associated with each reported case. When the map is reporting state-specific data, the points are located in the center of each state. When the map is reporting county-specific data, the points are placed precisely at the geographic center for those jurisdictions.