ciscorucinski commented 4 years ago

📌 Ongoing Information 📌

Website: Corona Data Scraper (download data and view sources)

GitHub: Corona Data Scraper (help write scraping rules; see the README)

Google Doc: COVID-19 Community Data Collection (public + comment access). Comment with information and sources, and help us acquire valid, official data sources at all levels: county, state, and country.

Slack: COVID Atlas (first join, then go to the COVID Atlas Slack)

#mapping : Discussion of data mapping
#scraper-dev : Scraper development discussion
#scraper-issues : Discuss issues with the scraper’s output data
#us-state-county-data
#usa-dashboard : Discussion of how best to use data for metrics and visualizations on http://covid19tracker.us
#documentation : Discussion of information dealing with the Google Doc

Background

It is clear that the team at @CSSEGISandData cannot scale to accommodate the huge influx of new cases within the US, so it is perfectly reasonable that they abandoned county-level reporting of cases. I think that when people look at the decision with an unbiased, open mind, they will see that it was the right balance for being as helpful as possible. That said, it is sad to see county-level information abandoned completely. It is very helpful!

I remember seeing a +3 increase in Wisconsin and wondering exactly where those cases were located; I had to search for and read a few articles to verify. This chart could have given me that detail very quickly!

But again, the current processes cannot scale to the number of new cases. So we have to change the processes if we want to bring this back, and the sooner the better.

Suggestion

So I suggest some new ability that lets the community, who deeply care about this information, help @CSSEGISandData get information that is as accurate as possible. You know what information needs to be recorded for each new case, and that baton can be passed on to us to find, report, and verify (with verification probably being the biggest part of this effort).

I have seen a lot of people report new data as Issues; this new tool would be the preferred way to report those cases. Submitters might have to provide an article link, and a number of people could then verify that information along with its location. Verification could go through multiple steps if needed, but @CSSEGISandData would have the final say on including the data after their own review and verification process.

Ideas

The following are some ideas of how the processes could work.

Stack Overflow's Triage Queue: Quickly move information to other needed areas
Stack Overflow's Review Queue: Review and verification of information
Allow people to point out incorrect data and add it to the queues
Multiple people should review and verify each datapoint.
etc...
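The queue ideas above can be sketched as a small state machine: a datapoint needs several distinct verifications, any flag sends it back to triage, and maintainers approve last. This is a minimal illustration under assumed names (`Datapoint`, `verify`, `flag`, `maintainer_approve`, and the threshold of 3 are all hypothetical), not how Stack Overflow's queues are actually implemented.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: number of distinct community reviewers needed
# before a datapoint is handed to the maintainers for final review.
REQUIRED_VERIFICATIONS = 3


@dataclass
class Datapoint:
    county: str
    state: str
    cases: int
    source_url: str  # link to the article or official report backing the number
    verified_by: set = field(default_factory=set)
    flagged_by: set = field(default_factory=set)
    approved: bool = False

    def status(self) -> str:
        if self.approved:
            return "published"
        if self.flagged_by:
            return "needs-triage"  # someone pointed out a problem
        if len(self.verified_by) >= REQUIRED_VERIFICATIONS:
            return "awaiting-maintainer"
        return "in-review"


def verify(point: Datapoint, reviewer: str) -> None:
    # Stored in a set, so one reviewer verifying twice still counts once.
    point.verified_by.add(reviewer)


def flag(point: Datapoint, reviewer: str) -> None:
    # Reporting incorrect data moves the datapoint back to the triage queue.
    point.flagged_by.add(reviewer)


def maintainer_approve(point: Datapoint) -> bool:
    # Maintainers keep the final say, and only see fully verified, unflagged data.
    if point.status() == "awaiting-maintainer":
        point.approved = True
    return point.approved
```

Using a set of reviewer names, rather than a counter, is what makes "multiple people should review each datapoint" enforceable: the same person clicking "verify" repeatedly does not move the datapoint forward.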

Other Benefits

An extremely positive benefit of this approach is that other countries could start providing their own more localized data, and @CSSEGISandData could entrust a "country representative" (the CDC, or a respected university in that country) to review and verify that country's localized data.

lazd commented 4 years ago

@tmeacham timeseries data, merged with JHU's data (after summing cities within counties, normalizing county names, and normalizing state names), is now available in JSON, CSV, and tidy CSV: http://blog.lazd.net/coronadatascraper/

Current stats:

117 countries
135 states
473 counties
725 total regions
616 GeoJSON features
Population data for 630 regions
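The normalization described above (summing cities within counties and normalizing county and state names) might look roughly like the following. This is a minimal sketch, not coronadatascraper's code: the row layout (`state`, `county`, `cases`) and the small abbreviation table are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical lookup; a real one would cover every state.
STATE_NAMES = {"WA": "Washington", "NY": "New York", "WI": "Wisconsin"}


def normalize_state(name: str) -> str:
    # Map abbreviations like "WA" to full names; leave full names unchanged.
    name = name.strip()
    return STATE_NAMES.get(name.upper(), name)


def normalize_county(name: str) -> str:
    # Treat "King County" and "King" as the same county.
    name = " ".join(name.split())
    if name.lower().endswith(" county"):
        name = name[: -len(" county")]
    return name


def sum_cities_into_counties(rows):
    """rows: iterable of dicts with 'state', 'county', 'cases' (city rows included)."""
    totals = defaultdict(int)
    for row in rows:
        key = (normalize_state(row["state"]), normalize_county(row["county"]))
        totals[key] += row["cases"]
    return dict(totals)
```

Rows for two cities in the same county collapse onto one `(state, county)` key once both names are normalized, so their counts are added together.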

greg-minshall commented 4 years ago

@lazd it's great! i'd probably let people "tidy" (long format) their own data from the wide format you provide; the size difference between the files is huge (almost 10x).

(it's nice enough to put me in a quandary whether to stick with JHU ["i'd rather fight than switch!"], or, switch. and, it's hard to decide how to decide. good luck with it!)
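Converting the wide format (one column per date) into the long/tidy format (one row per location and date) is a short loop. That is why users could reasonably do it themselves, and it also shows where the size difference comes from: every date repeats the identifier columns. This is a sketch that assumes hypothetical `county`/`state` identifier columns and treats every other non-empty column as a date.

```python
import csv
import io


def wide_to_long(wide_csv: str, id_columns=("county", "state")):
    """Convert a wide timeseries CSV (one column per date) into long/tidy rows."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    long_rows = []
    for row in reader:
        ids = {column: row[column] for column in id_columns}
        for column, value in row.items():
            # Skip identifier columns and dates with no reported value.
            if column in id_columns or value == "":
                continue
            long_rows.append({**ids, "date": column, "cases": int(value)})
    return long_rows
```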

lazd commented 4 years ago

Thanks @greg-minshall! We're up to 141 countries, 131 states, and 673 counties, but still need contributions to many things. We've got 7 contributors to the scraper so far, wanna make it 8?

We've been meeting with a few other folks who are working on similar efforts, and we'll be making phone calls to counties that aren't presenting their data in a machine-readable form. Full steam ahead!

adanecito commented 4 years ago

Hi all, I still have to verify the lat/long for Washington state, but the rest of today's file looks good for that state. Is this the final file format? I ask because it is different from the current one and will break any system that was using the old one. Does the JHU map app ingest this file? FYI, I am a JHU student in the Computer Science program working on my Master's, but I work full time as an Enterprise Solution Architect, so I'm doing what I can. Thanks!

lazd commented 4 years ago

@adanecito are you referring to coronadatascraper's files? Our lat long is calculated as the dead center of the GeoJSON polygon associated with the location. How far off is it from what you had before?
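"Dead center" of a polygon can mean different things, and that choice alone can shift a point noticeably. The sketch below contrasts two common interpretations for a single GeoJSON ring of `[lon, lat]` pairs: the area-weighted centroid (shoelace formula) and the bounding-box center. Which one coronadatascraper actually uses is not stated here, so treat both as assumptions. The sketch also treats lon/lat as planar coordinates and ignores holes and MultiPolygons.

```python
def ring_centroid(ring):
    """Area-weighted centroid of one linear ring of [lon, lat] pairs."""
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]  # GeoJSON rings should be closed
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2); the sign cancels for clockwise rings.
    return cx / (3 * area2), cy / (3 * area2)


def bbox_center(ring):
    """Center of the ring's bounding box."""
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
```

For a right triangle with corners (0, 0), (6, 0), (0, 6), the centroid is (2, 2) while the bounding-box center is (3, 3). Irregular county shapes can diverge in the same way.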

I don't think the format is totally set in stone, but I believe you can rely on the timeseries.csv OR the combination of timeseries.json, locations.json, and features.json. See how to use timeseries data from JSON here and here.
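As a rough illustration of combining timeseries.json with locations.json, the sketch below assumes that timeseries.json maps a date string to an object keyed by location index, and that locations.json is an array of location objects. Those shapes are assumptions for illustration, not a confirmed description of the files, so check them against the actual output before relying on this.

```python
from typing import Dict, List, Tuple


def series_by_location(timeseries: dict, locations: list) -> Dict[str, List[Tuple[str, int]]]:
    """Assumed shapes (not confirmed):
    timeseries = {"2020-3-20": {"0": {"cases": 5}, ...}, ...}
    locations  = [{"county": "King County", "state": "WA", "country": "USA"}, ...]
    """
    result: Dict[str, List[Tuple[str, int]]] = {}
    for day, by_index in timeseries.items():
        for index, counts in by_index.items():
            location = locations[int(index)]  # keys are assumed to be array indices
            label = ", ".join(location[k] for k in ("county", "state", "country") if location.get(k))
            result.setdefault(label, []).append((day, counts.get("cases")))
    for series in result.values():
        # Dates like "2020-3-9" do not sort correctly as strings, so sort numerically.
        series.sort(key=lambda item: tuple(int(part) for part in item[0].split("-")))
    return result
```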

timeseries-byLocation.json will likely change, since it's keyed off a string generated from the city, county, state, and country. We are working with @hyperknot to key this off an ISO standard instead, and this may also affect the keys in timeseries.json, locations.json, and features.json (though they'll still reference one another properly).
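The reason for moving from name-based keys to an ISO standard can be shown in a few lines. A key built by joining names changes whenever a source spells the same place differently, whereas a standard code stays the same. The key format and the small lookup table below are hypothetical; real code would use the full ISO 3166-2 subdivision list (for example, US-WA is the ISO 3166-2 code for Washington).

```python
def name_key(city=None, county=None, state=None, country=None):
    # Hypothetical name-based key: join whichever parts are present.
    return ", ".join(part for part in (city, county, state, country) if part)


# Illustrative subset of ISO 3166-2 codes, with two spellings per state.
ISO_3166_2 = {
    "Washington": "US-WA", "WA": "US-WA",
    "Wisconsin": "US-WI", "WI": "US-WI",
}


def iso_key(state: str) -> str:
    # Different spellings of the same state resolve to one stable code.
    return ISO_3166_2[state]
```

Here `name_key(county="King County", state="WA", country="USA")` and `name_key(county="King County", state="Washington", country="USA")` produce different keys for the same county, while `iso_key("WA")` and `iso_key("Washington")` both return `"US-WA"`.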

adanecito commented 4 years ago

Thanks for the prompt response. I am not using the timeseries file. The one I use, for example, does not have the sum or source URL columns. I can always change the file-parsing code to accommodate those differences. Usually the file is named by date, month-day-year (mm-dd-yyyy), so I can adapt quickly, but I'm not sure about others. Also, the more data you add, the bigger the file and the map will get, which means more load and higher response times as more people hit that map. Many thanks, -Tony
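Parsing code that tolerates those differences might take the report date from the mm-dd-yyyy file name and read columns by name, so that a missing column (such as a source URL) becomes an empty value instead of an error. This is a sketch: the column names in `wanted` are hypothetical and would need to match the real headers.

```python
import csv
import io
import re
from datetime import date

# Matches names like "03-22-2020.csv" (month-day-year), with or without a directory.
FILENAME_RE = re.compile(r"(\d{2})-(\d{2})-(\d{4})\.csv$")


def report_date(filename: str) -> date:
    match = FILENAME_RE.search(filename)
    if not match:
        raise ValueError(f"unexpected file name: {filename}")
    month, day, year = map(int, match.groups())
    return date(year, month, day)


def read_rows(text: str, wanted=("county", "state", "cases", "url")):
    # Read by header name; columns absent from this file come back as None.
    reader = csv.DictReader(io.StringIO(text))
    return [{column: row.get(column) for column in wanted} for row in reader]
```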

greg-minshall commented 4 years ago

@lazd sorry, the scraping is probably not where i can be of help.

lazd commented 4 years ago

Very true @adanecito. Please report any issues with the data or output format, missing columns, etc., to https://github.com/lazd/coronadatascraper/issues

adanecito commented 4 years ago

Thanks, Larry. To report issues I need to know the requirements. For example, what should the fields be for what is being attempted? I know what data should be there by looking at the source, but are extra fields expected? Is the requirement to match the daily reports?

tomquisel commented 4 years ago

I love this project and plan to contribute! I notice that US county data is still fairly sparse. As a stop-gap, CSBS.org has been doing a great job of keeping a US county-level map up to date. You can find daily CSVs here.

adanecito commented 4 years ago

@tomquisel the report should not show counties with no reported infections; that may be what you are seeing. But as Larry said, file an issue, ideally with proof for verification/validation. That could be something you do right now or when you are off work. If a county has people who have recovered, it should report them even if it currently has no active cases. The same goes for deaths with no active infections. At least that is the way I see it.
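The inclusion rule in that comment could be expressed as a small filter: keep a county if it has any confirmed cases, recoveries, or deaths, even when it has no active cases. The field names are hypothetical, and missing or empty values are treated as zero.

```python
def should_report(row: dict) -> bool:
    # Keep a county if any confirmed, recovered, or death count is above zero;
    # a zero active-case count alone does not hide it. Missing values count as 0.
    return any(int(row.get(column) or 0) > 0 for column in ("confirmed", "recovered", "deaths"))
```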

adanecito commented 4 years ago

OK, I got yesterday's data to work. Here is what I am up to.

I did learn some things about the data, mostly what I said earlier. Go JHU!!!

Jord-Holt commented 4 years ago

Hello! I would like to contribute to this effort. I can assist with writing scrapers if needed. I'm a little unsure where to start and how much has already been done. That said, I have been following the situation in my own state (Kentucky) quite closely and would be happy to seek out information and potentially contribute a scraper if needed. Otherwise, I am happy to take on tasks elsewhere. I just need a bit of guidance to get started. Thank you!

lazd commented 4 years ago

Hey @DatJord please take a look at our lists of sources: https://blog.lazd.net/coronadatascraper/#sources

Join our Slack and coordinate with us, and check out the contributing section for information on getting started with a scraper, and a link to a doc of websites that need to be scraped: https://github.com/lazd/coronadatascraper/#contributing

codewarrior2000 commented 4 years ago

@longsyntax Those are great links to state-level statistics. I do want to point out that https://www.health.ny.gov/diseases/communicable/coronavirus/ lumped all the data from Richmond, Manhattan, Kings, Queens, and Bronx counties into just "New York City".
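To compare county-level data from other sources with a state site that reports only a city total, the five NYC counties can be folded into one entry. This is a sketch. Manhattan is New York County, and accepting both spellings is an assumption about how different sources label it.

```python
# The five counties that make up New York City. Manhattan is New York County,
# so both names are accepted here (an assumption about source labeling).
NYC_COUNTIES = {"Bronx", "Kings", "New York", "Manhattan", "Queens", "Richmond"}


def group_nyc(county_counts: dict) -> dict:
    """Fold NYC's counties into a single "New York City" entry."""
    grouped = {}
    for county, cases in county_counts.items():
        key = "New York City" if county in NYC_COUNTIES else county
        grouped[key] = grouped.get(key, 0) + cases
    return grouped
```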

JKSenthil commented 4 years ago

I would love to contribute as well! I have started my own project, https://github.com/JKSenthil/covid19-spread-tracker, to visualize COVID-19 cases county by county. I scrape this information from https://coronavirus.1point3acres.com/en and used the Wayback Machine to get historical data.

JKSenthil commented 4 years ago

I actually developed an API for county data: https://github.com/JKSenthil/coronavirus-county-api. Feel free to use!

PaulMansour commented 4 years ago

Am I right in surmising that the CDC does not get individual death reports from county coroners as they occur? Could the CDC possibly be that unprepared for a pandemic? Or do they have the info and refuse to share it?

zdavatz commented 4 years ago

Believe it or not, in Switzerland every single positive Covid19 case is being sent by Fax to the central government. They can't keep up with the Faxes coming in!

PaulMansour commented 4 years ago

I would have thought that if the CDC had one job at all, it would be to have a system for pandemic reporting where county coroners could just log on and report the death.

tomquisel commented 4 years ago

I think this is close to what you're looking for with death reports. It includes a flu-related deaths column.

PaulMansour commented 4 years ago

Those are aggregate numbers and don't say anything specific to coronavirus. If what I wanted were out there, I think this entire issue would not have been raised, because it is precisely about the difficulties of collecting the data I want.

croixchristenson commented 4 years ago

Does anyone have Iowa or Minnesota data?

Jord-Holt commented 4 years ago

Also on the hunt for Kentucky data. Thought I found an option but it ended up being a dead end.

DavidGeeraerts commented 4 years ago

@croixchristenson I emailed Iowa and they are looking into it:

Good morning,

Thank you for this information and for your recommendation. I have forwarded it to our leadership working within the State Emergency Operations Center for their review and consideration.

Sincerely,
Kelsey Feller

croixchristenson commented 4 years ago

Question: for places where automated collection is challenging, is there a place or way to submit county-level data for states manually? I've been doing this for MN and IA for the past few days.

Thanks David! If you need historical IA data I can share it too.

Anyone else see the NY Times is doing county data now? https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

burnout87 commented 4 years ago

I would also be extremely happy to contribute. I can offer help with coding and system design.

curran commented 4 years ago

I'm curious, where is the county data coming from right now in the visualization here? https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Is there a way to access this data? Is it coming from @JKSenthil 's API? Thank you.

adanecito commented 4 years ago

I have been experimenting with the snapshot data. I am not sure when it gets updated; I'm thinking sometime each day? I would think using an API is good, but it adds more dependencies and liability.

JKSenthil commented 4 years ago

@curran Skimming through their sources, it seems they are scraping the county data from https://coronavirus.1point3acres.com/en. You can use the https://github.com/ExpDev07/coronavirus-tracker-api API, which retrieves county data from CSBS, which in turn uses its own data alongside 1point3acres's data (e.g., https://coronavirus-tracker-api.herokuapp.com/v2/locations?source=csbs).
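A minimal sketch of querying that endpoint with the standard library follows. The URL is the one quoted above, and the hosted service may no longer be online. The response shape (a `locations` list whose entries have `province`, `county`, and `latest.confirmed`) is an assumption from memory, so verify it against the API's own README. The parsing step is kept separate so it can be checked without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://coronavirus-tracker-api.herokuapp.com/v2/locations"


def fetch_csbs_locations(timeout: float = 30) -> dict:
    # Network call; the hosted instance may be offline.
    url = f"{BASE_URL}?{urlencode({'source': 'csbs'})}"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def county_cases(payload: dict, state: str) -> dict:
    # Assumed shape: {"locations": [{"province": ..., "county": ...,
    #                                "latest": {"confirmed": ...}}, ...]}
    return {
        loc["county"]: loc["latest"]["confirmed"]
        for loc in payload.get("locations", [])
        if loc.get("province") == state
    }
```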

vbisbest commented 4 years ago

Is there any county data that includes a timeline (historical data)?
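If a source only publishes daily snapshots (such as JHU's daily reports or the CSBS daily CSVs mentioned above), a county timeline can be assembled by collecting each snapshot under its date. This sketch assumes each snapshot has already been parsed into a `{county: cases}` dict.

```python
from datetime import date
from typing import Dict, List, Tuple


def build_timeline(snapshots: Dict[date, Dict[str, int]]) -> Dict[str, List[Tuple[date, int]]]:
    """Turn {date: {county: cases}} into {county: [(date, cases), ...]} in date order."""
    timeline: Dict[str, List[Tuple[date, int]]] = {}
    for day in sorted(snapshots):  # process snapshots oldest first
        for county, cases in snapshots[day].items():
            timeline.setdefault(county, []).append((day, cases))
    return timeline
```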

CSSEGISandData / COVID-19

County Data - Allow community creation, editing, and verification of this data for #558
