Open ciscorucinski opened 4 years ago
@tmeacham timeseries data, merged with JHU's data (after summing cities within counties, normalizing county names, and normalizing state names) is now available in JSON, CSV, and Tidy CSV: http://blog.lazd.net/coronadatascraper/
Current stats:
@lazd it's great! i'd probably let people "tidy" (long format) their own data from the wide format you provide; the size difference between the files is huge (almost 10x).
(it's nice enough to put me in a quandary whether to stick with JHU ["i'd rather fight than switch!"], or, switch. and, it's hard to decide how to decide. good luck with it!)
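For anyone who wants to do the wide-to-long ("tidy") conversion themselves, here is a minimal sketch using only the Python standard library. It assumes a wide CSV with one row per location and one column per date; the `name` column and the sample rows are made up for illustration and are not the scraper's actual output.

```python
import csv
import io


def wide_to_long(wide_csv_text, id_columns=("name",)):
    """Melt a wide CSV (one column per date) into one row per location and date."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    rows = []
    for record in reader:
        ids = {col: record[col] for col in id_columns}
        for column, value in record.items():
            # Skip identifier columns and dates with no reported value.
            if column in id_columns or value == "":
                continue
            rows.append({**ids, "date": column, "cases": int(value)})
    return rows


# Hypothetical sample data, not real case counts.
SAMPLE_WIDE = "name,2020-03-20,2020-03-21\nExample County,3,5\nOther County,,2\n"
LONG_ROWS = wide_to_long(SAMPLE_WIDE)
```

Each date column becomes its own row, which is why the tidy file ends up so much larger than the wide one.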
Thanks @greg-minshall! We're up to 141 countries, 131 states, and 673 counties, but still need contributions to many things. We've got 7 contributors to the scraper so far, wanna make it 8?
We've been meeting with a few other folks who are working on similar efforts, and we'll be putting phone calls out to counties that aren't presenting their data in a machine-readable form. Full steam ahead!
Hi All, I still have to verify the lat/long for Washington state, but the rest of today's file looks good for that state. Is this the final file format? I ask since it is different than the current one and will break any system that was using the old one. Does the JHU map app ingest this file? FYI, I am a JHU student in the Computer Science program working on my Master's, but I work full time as an Enterprise Solution Architect, so I'm doing what I can. Thanks!
@adanecito are you referring to coronadatascraper's files? Our lat long is calculated as the dead center of the GeoJSON polygon associated with the location. How far off is it from what you had before?
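One possible reading of "dead center" is the area-weighted centroid of the polygon's outer ring. The sketch below computes that with the shoelace formula, treating longitude and latitude as flat x/y coordinates. This is an assumption for illustration and may not be how coronadatascraper actually computes its point (a bounding-box center is another plausible reading).

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring of [lon, lat] pairs."""
    # GeoJSON rings repeat the first point at the end; close the ring if needed.
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and area = twice_area / 2.
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical square, not a real county boundary.
SQUARE = [[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]
CENTER = polygon_centroid(SQUARE)
```

For oddly shaped counties the centroid can land outside the polygon, which is one way a computed point could differ from a hand-picked one.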
I don't think the format is totally set in stone, but I believe you can rely on the timeseries.csv OR the combination of timeseries.json, locations.json, and features.json. See how to use timeseries data from JSON here and here.
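As a rough sketch of how the JSON files might be combined: the code below assumes timeseries.json maps each date to a dictionary keyed by an index into the locations.json array. That layout, the field names, and the sample values are assumptions for illustration, not a documented schema, and features.json is left out entirely.

```python
def join_timeseries(timeseries, locations):
    """Attach a location name to every per-date record."""
    rows = []
    for date in sorted(timeseries):
        for index, counts in timeseries[date].items():
            # Assumed: the key is the position of the location in locations.json.
            location = locations[int(index)]
            rows.append({"date": date, "name": location["name"], **counts})
    return rows


# Hypothetical stand-ins for the parsed contents of the two files.
SAMPLE_TIMESERIES = {
    "2020-03-21": {"0": {"cases": 5}},
    "2020-03-20": {"0": {"cases": 3}, "1": {"cases": 1}},
}
SAMPLE_LOCATIONS = [{"name": "Example County"}, {"name": "Other County"}]
JOINED = join_timeseries(SAMPLE_TIMESERIES, SAMPLE_LOCATIONS)
```

Keeping the lookup in one function means only that function needs to change if the keys later move to a different identifier.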
timeseries-byLocation.json will likely change; it's keyed off a string generated from the city, county, state, country. We are working with @hyperknot to key this off an ISO standard instead, and this may also affect the keys in timeseries.json, locations.json, and features.json (though they'll still reference one another properly).
Thanks for the prompt response. I am not using the time series one. The one I use, for example, does not have the sum or source URL. I can always change the file parsing code to accommodate those differences. Usually the file has the name month-day-year, or mm-dd-yyyy. So I can adapt quickly, but I'm not sure about others. Also, the more data you add, the bigger the file will get or the bigger the map will get, and thus more load and higher response times as more people hit that map. Many Thanks, -Tony
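A small sketch of parsing code that tolerates added or missing columns, along with building an mm-dd-yyyy file name from a date. The column names and sample row are hypothetical and chosen only to show the idea.

```python
import csv
import io
from datetime import date


def daily_report_name(day):
    """Build a file name in the mm-dd-yyyy pattern described above."""
    return day.strftime("%m-%d-%Y") + ".csv"


def read_known_columns(csv_text, wanted):
    """Keep only the columns the caller asks for and ignore any extras."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # A column missing from the file comes back as an empty string.
    return [{col: row.get(col, "") for col in wanted} for row in reader]


# Hypothetical file contents with an extra "url" column.
SAMPLE = "state,cases,url\nWashington,10,http://example.com\n"
REPORT_NAME = daily_report_name(date(2020, 3, 22))
PARSED = read_known_columns(SAMPLE, ["state", "cases", "deaths"])
```

Reading columns by name rather than by position means a new column in the file does not break existing consumers.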
@lazd sorry, the scraping is probably not where i can be of help.
Very true @adanecito. Please report any issues with the data or output format, missing columns, etc to https://github.com/lazd/coronadatascraper/issues
Thanks Larry. To report issues I need to know the requirements. For example what should the fields be for what is being attempted? I know for the data what should be there by looking at the source but are extra fields expected to be there? Is the requirement to match the daily reports?
@tomquisel the report should not show counties where no infected people were reported; that is what you might be seeing. But as Larry said, file an issue, hopefully with proof for verification/validation. That could be something you do right now or when you are off work. If they have people cured, then they should report it even if they currently do not have anyone affected. The same goes for deaths with no infected. At least that is the way I see it.
Ok I got yesterdays data to work. Here is what I am up to.
I did learn some things about the data, mostly what I said earlier. Go JHU!!!
Hello! I would like to contribute to this effort. I can assist with writing scrapers if needed. I'm a little unsure on where to start here and how much has already been done. That said I have been following the situation in my own state (Kentucky) quite closely and would be happy to seek out information and potentially contribute a scraper if needed. Otherwise I am happy to take on tasks elsewhere. Just need a bit of guidance to get started. Thank you!
Hey @DatJord please take a look at our lists of sources: https://blog.lazd.net/coronadatascraper/#sources
Join our Slack and coordinate with us, and check out the contributing section for information on getting started with a scraper, and a link to a doc of websites that need to be scraped: https://github.com/lazd/coronadatascraper/#contributing
@longsyntax Those are great links to state-level statistics. I do want to point out that https://www.health.ny.gov/diseases/communicable/coronavirus/ lumped all the data from Richmond, Manhattan, Kings, Queens, and Bronx counties into just "New York City".
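When comparing a source that reports the five counties separately against one that reports only "New York City", one option is to roll the county rows up before comparing. A minimal sketch follows; the counts are made up, and note that Manhattan corresponds to New York County.

```python
# The five counties that make up New York City (Manhattan is New York County).
NYC_COUNTIES = {"Richmond", "New York", "Kings", "Queens", "Bronx"}


def roll_up_nyc(county_cases):
    """Sum the five NYC counties into a single 'New York City' entry."""
    rolled = {}
    for county, cases in county_cases.items():
        key = "New York City" if county in NYC_COUNTIES else county
        rolled[key] = rolled.get(key, 0) + cases
    return rolled


# Hypothetical counts for illustration only.
SAMPLE_COUNTS = {"Kings": 4, "Queens": 3, "Bronx": 2, "Albany": 1}
ROLLED = roll_up_nyc(SAMPLE_COUNTS)
```

Going the other way, from the lumped total back to per-county numbers, is not possible without another source.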
I would love to contribute as well! I have started my own project https://github.com/JKSenthil/covid19-spread-tracker to visualize COVID-19 cases county by county. I scrape this information from https://coronavirus.1point3acres.com/en, and used waybackmachine to get historical data.
I actually developed an API for county data: https://github.com/JKSenthil/coronavirus-county-api. Feel free to use!
Am I right in surmising that the CDC does not get individual death reports from county coroners as they occur? Could the CDC possibly be that unprepared for a pandemic? Or do they have the info and refuse to share it?
Believe it or not, in Switzerland every single positive Covid19 case is being sent by Fax to the central government. They can't keep up with the Faxes coming in!
I would have thought that if the CDC had one job at all, it would be to have a system for pandemic reporting where county coroners could just log on and report the death.
I think this is close to what you're looking for with death reports. It includes a flu-related deaths column.
Those are aggregate numbers, and don't have anything with respect to coronavirus. I think if what I wanted was out there, this entire issue would not have been raised, because it is precisely about the difficulties of collecting the data I want.
Does anyone have Iowa or Minnesota data?
Also on the hunt for Kentucky data. Thought I found an option but it ended up being a dead end.
@croixchristenson I emailed Iowa and they are looking into it:
Good morning,
Thank you for this information and for your recommendation. I have forwarded it to our leadership working within the State Emergency Operations Center for their review and consideration.
Sincerely,
Kelsey Feller
Question: for places where automated collection is challenging, is there a place or way to submit data for states at the county level manually? I've been doing this for MN and IA the past few days.
Thanks David! If you need historical IA data I can share it too.
Anyone else see the NY Times is doing county data now? https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
I would also be extremely happy to contribute. I can offer my help and skills with coding and system design
I'm curious, where is the county data coming from right now in the visualization here? https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Is there a way to access this data? Is it coming from @JKSenthil 's API? Thank you.
I have been experimenting with the snapshot data. I am not sure when it gets updated; I am thinking sometime each day? I would think using an API is good, but it brings more dependencies and liability.
@curran Skimming through their sources, it seems they are scraping the county data from https://coronavirus.1point3acres.com/en. You can use the https://github.com/ExpDev07/coronavirus-tracker-api API, which retrieves county data from CSBS, where they use their own data alongside 1point3acres's data to provide county data. (ie https://coronavirus-tracker-api.herokuapp.com/v2/locations?source=csbs)
Is there any county data that includes a timeline (historical data)?
📌 Ongoing Information 📌
Website: Corona Data Scraper Download data and view sources
GitHub: Corona Data Scraper Help write scraping rules. See Readme
Google Doc: COVID-19 Community Data Collection Public + comment access: Comment information and sources Help us acquire valid, official data sources on all levels: County, State, Country
Slack: COVID Atlas First Join, then go to the COVID Atlas Slack
Background
It is clear that the team at @CSSEGISandData cannot accommodate and scale with the huge influx of new cases within the US. Therefore, it is perfectly reasonable that they abandoned the county-level reporting of cases. I think when people look at the decision with an unbiased and open mind, they will see that this was the right balance to be as helpful as possible. With that said, it is sad to see the county-level information be abandoned completely. It is very helpful!
I remember seeing a +3 increase in Wisconsin and was wondering exactly where those cases were located, and I had to search for and read a few articles to verify. But this chart could have provided that detail to me very fast!
But again, the current processes cannot scale to the number of new cases. So we have to change the processes if we want to bring this back, and the sooner the better.
Suggestion
So, I suggest some new ability to let the community, who deeply care about this information, help @CSSEGISandData get information that is as accurate as possible. You know what type of information needs to be registered for each new case, and that baton can be passed on to us to find, report, and verify (with verification probably being the biggest aspect of this effort).
I have seen a lot of people report new data as Issues, and this new tool would be the preferred method to report those cases. Maybe they would have to provide an article link. A number of people could verify that information along with location information. This verification could go through multiple steps if needed, but @CSSEGISandData would have the final say in including the data after their own review and verification process.
Ideas
The following are some ideas of how the processes could work.
Other Benefits
An extremely positive benefit of this approach is that other countries could start providing their own more-localized data, and @CSSEGISandData could entrust a "country representative" (the CDC, or a respected university in said country) to do the review and verification of that country's more localized data.