Closed jdrakephd closed 4 years ago
The US state cases file has the number of daily deaths per state: https://github.com/CEIDatUGA/COVID-19-DATA/blob/master/UScases_by_state_wikipedia.csv
Looking at the US state cases file, we have daily deaths for the whole country, not by state. @lsalvador am I missing something? @jdrakephd do you want (i) a total for state fatalities, (ii) state fatalities by date, or (iii) a running total of state fatalities by date? I can get you (i) within 30 min, but the others will take longer.
@renikaul you are absolutely right, we only have the totals per day in that table. I got confused with the other table on the wiki page.
I'm looking at your code right now for future daily scraping... I think I have it figured out so we can get fatality by state. Can you see if you can modify the world wiki code's wayback machine to get the fatality by state by date breakdown?
I was thinking fatalities by day by state. I don't need it for anything I'm doing right at the moment (cumulative fatalities are already in the table we have) but I do look at it frequently to ballpark things like underreporting rate or date of first cases... because I think deaths are better reported than anything else. I think it makes sense for us to start working toward a proper analysis of such data.
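For reference, a minimal sketch of how fatalities by day by state could be derived from a cumulative table like the one mentioned above. The function name, the `(date, state, cumulative_deaths)` row layout, and the numbers are all made up for illustration; they are not taken from the repo.

```python
from collections import defaultdict


def daily_from_cumulative(rows):
    """Turn (date, state, cumulative_deaths) rows into (date, state, new_deaths).

    Dates are assumed to be sortable strings such as YYYYMMDD. The first
    date seen for a state is reported as its full cumulative count, since
    there is no earlier value to subtract.
    """
    by_state = defaultdict(list)
    for date, state, total in rows:
        by_state[state].append((date, total))

    daily = []
    for state in sorted(by_state):
        previous = 0
        for date, total in sorted(by_state[state]):
            daily.append((date, state, total - previous))
            previous = total
    return daily


# Hypothetical numbers, purely to show the shape of the input.
example = [
    ("20200314", "GA", 1),
    ("20200315", "GA", 1),
    ("20200316", "GA", 3),
    ("20200315", "WA", 40),
    ("20200316", "WA", 48),
]
daily = daily_from_cumulative(example)
```

A negative value in the result would point to a downward correction in the source, which is probably worth flagging rather than silently clipping to zero.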
ok. I will dig into this more tomorrow afternoon.
@lsalvador I'm comparing the numbers in the table output by the script and what is displayed on the webpage. For some states, the total number in the table output and the "Cases" column of the wiki page don't match. I think these numbers should match? I also can't figure out how the HTML source code is capturing fatality by state: there is a column name, but no data. Any ideas?
This is my first experience with html scraping so I'm a bit slow.
@lsalvador if you are going to build something for states off my world Wayback Machine scraping code, let me know, as there are a few kinks I had to work out that, while commented, may not make sense to anyone but me.
@renikaul I see what you mean, we had more cases in our stored table than in the wiki. As an example, yesterday at 21:21 there were 56 new cases in GA and today the number is 25. Could have been a typo. I have just updated the table and the numbers are matching. However, let me know if you find anything else that is not quite right.
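A check like the one described above (stored table versus the wiki) could be sketched roughly as follows. The function name and the dictionary layout are assumptions; only the GA values (56 stored, 25 on the wiki) come from this thread, and the `XX` entry is a placeholder.

```python
def find_mismatches(stored, wiki):
    """Return {state: (stored_value, wiki_value)} for every state that differs.

    A state missing on one side shows up with None for that side.
    """
    mismatches = {}
    for state in sorted(set(stored) | set(wiki)):
        stored_value = stored.get(state)
        wiki_value = wiki.get(state)
        if stored_value != wiki_value:
            mismatches[state] = (stored_value, wiki_value)
    return mismatches


# GA values are the ones quoted in the thread; XX is a made-up matching entry.
stored = {"GA": 56, "XX": 10}
wiki = {"GA": 25, "XX": 10}
mismatches = find_mismatches(stored, wiki)
```

Running something like this after each scrape would surface edits on the wiki side (typos, later corrections) without anyone having to compare the tables by eye.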
Regarding the fatalities table by state, I can quickly extend the code to incorporate it and output it on a daily basis. The HTML scraping involves a few steps that are sometimes not clear (it took a bit of trial and error), but once that is done, it is quite straightforward. Let me know if you want to go through the code together.
The other table on the wiki only has total deaths per state. The only information we can get daily from the wiki is the number of cases. However, the COVID Tracking Project has daily information on deaths but, as Robbie mentioned, only starts reporting on March 4th. Maybe by combining this file with the US line list info collected by David and Paige we will be able to have a complete dataset.
Example of COVID Tracking Project US daily info:
date | state | positive | negative | pending | death | total | dateChecked |
---|---|---|---|---|---|---|---|
20200316 | AK | 1 | 143 | | | 144 | 2020-03-16T20:00:00Z |
20200316 | AL | 28 | 28 | 40 | 0 | 96 | 2020-03-16T20:00:00Z |
20200316 | AR | 22 | 132 | 14 | | 168 | 2020-03-16T20:00:00Z |
20200316 | AZ | 18 | 182 | 63 | 0 | 263 | 2020-03-16T20:00:00Z |
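As a rough sketch of pulling deaths by state by date out of data shaped like this, assuming it is available as CSV text with the same column names. The sample reuses the four rows above; where a row has fewer values than columns, the blanks are assumed to be `pending` and/or `death`, which is what the row totals imply (for AK, 1 + 143 = 144). The function names are hypothetical.

```python
import csv
import io

# Sample rows from the table above, with blanks placed by assumption.
SAMPLE = """date,state,positive,negative,pending,death,total,dateChecked
20200316,AK,1,143,,,144,2020-03-16T20:00:00Z
20200316,AL,28,28,40,0,96,2020-03-16T20:00:00Z
20200316,AR,22,132,14,,168,2020-03-16T20:00:00Z
20200316,AZ,18,182,63,0,263,2020-03-16T20:00:00Z
"""


def to_int(value):
    """Convert a cell to int, keeping blank cells as None (not reported)."""
    value = value.strip()
    return int(value) if value else None


def deaths_by_state_date(csv_text):
    """Map (date, state) to the reported death count, or None if blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["date"], row["state"]): to_int(row["death"]) for row in reader}


deaths = deaths_by_state_date(SAMPLE)
```

Blank cells are kept as `None` rather than 0 so that "not reported" stays distinguishable from "zero deaths reported" when this is combined with other sources.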
In case it is still useful, I have the code to extract the totals from the wiki page with the format below. Let me know if this table and code should be pushed to GitHub.
state | cases | recovered | deaths | remaining |
---|---|---|---|---|
Alabama | 39 | 0 | 0 | 39 |
Alaska | 3 | 0 | 0 | 3 |
American Samoa | 0 | 0 | 0 | 0 |
Arizona | 13 | 1 | 0 | 12 |
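A quick consistency check on a table in this format might look like the sketch below. It assumes `remaining` is meant to equal `cases - recovered - deaths`, which holds for the four rows shown but is not stated anywhere in the thread; the function name is hypothetical.

```python
# Rows copied from the table above: (state, cases, recovered, deaths, remaining).
ROWS = [
    ("Alabama", 39, 0, 0, 39),
    ("Alaska", 3, 0, 0, 3),
    ("American Samoa", 0, 0, 0, 0),
    ("Arizona", 13, 1, 0, 12),
]


def inconsistent_rows(rows):
    """Return the states where remaining != cases - recovered - deaths.

    The relationship itself is an assumption inferred from the sample rows.
    """
    return [
        state
        for state, cases, recovered, deaths, remaining in rows
        if cases - recovered - deaths != remaining
    ]


problems = inconsistent_rows(ROWS)
```

An empty result means every row satisfies the assumed relationship; any state listed would be worth checking against the wiki page by hand.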
Liliana,
@lsalvador I think you are saying that Wikipedia doesn't have deaths by state by day, but that's not correct. When I view the Wikipedia page there is a table for fatalities just like the one for cases.
@jdrakephd The only table I see is the one with total information. Can you still see the daily one in your browser?
It disappeared when I refreshed my browser - but it had historical data back to the beginning. It has to be somewhere on the Internet, perhaps using wayback machine
Found it. Version from March 16 @ 13:21
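If the page was also archived by the Wayback Machine, a snapshot near that time could be looked up with the Internet Archive's availability endpoint, roughly as sketched here. The page URL is a placeholder (the actual wiki URL is not given in this thread), the timestamp is just March 16 at 13:21 with the time zone left as an assumption, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"

# Placeholder only: substitute the real wiki page URL.
PAGE_URL = "https://en.wikipedia.org/wiki/EXAMPLE_PAGE"


def availability_url(page_url, timestamp):
    """Build the query URL; timestamp is YYYYMMDDhhmmss (shorter prefixes work)."""
    return API + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(payload):
    """Extract (snapshot_url, snapshot_timestamp) from the JSON reply, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(page_url, timestamp):
    """Query the endpoint over the network and return the closest snapshot."""
    with urlopen(availability_url(page_url, timestamp)) as response:
        return closest_snapshot(json.load(response))


query = availability_url(PAGE_URL, "20200316132100")
```

Calling `lookup(PAGE_URL, "20200316132100")` would perform the actual request. The returned snapshot is only the closest one the archive holds, so its timestamp should be checked before treating the scraped numbers as belonging to a given day.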
This page has tracked all deaths
And it gets updated; it just needs a good scrape
Hope this helps. Ana
Sorry for being terse: Fat fingers ∪ small keys
@jdrakephd, I added a daily fatalities by state table to the GitHub data repo. It still pulls data from the old version of the wiki. I will keep an eye out for an updated version, and the README file will be updated shortly.
@anabento, that information is very useful to have - thank you
@jdrakephd I think @lsalvador posted the data you were looking for. I will close the issue. Please re-open if needed.
Do we have a dataset for state fatalities? If not, can we scrape this from Wikipedia too?