CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

graph with information day-by-day #314

Open aster94 opened 4 years ago

aster94 commented 4 years ago

Hello, I was curious about the day-by-day situation of COVID in my country, so I wrote a simple Python script that uses this repo to make a multi-bar graph. For every day there are 3 bars: blue for new cases, green for recovered, and red for deaths. Here is the output of the program; you may find it interesting:

[Charts as of 2020-03-09: China, Zhejiang, Hubei; France, Iran, Italy, South Korea; Australia; Canada, Toronto; US, King County, Westchester County]

Also, I made a rough attempt to create a "score" that measures how well a country is responding to the COVID emergency (the higher, the better):

[Score charts as of 2020-03-09: Italy, China]

Basically the score is: (daily recovered / all positives) / ((daily confirmed / (population of the country / 10000)) * (daily deaths / all positives))
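For anyone who wants to reproduce the number, here is a minimal Python sketch of the formula exactly as written above. It assumes "all positives" is the same total-positives count in both ratios and that population is the whole country's; the function and argument names are illustrative, not taken from the linked repo.

```python
def response_score(daily_recovered, daily_confirmed, daily_deaths,
                   all_positives, population):
    """Score from the formula above; higher is better.

    Returns None when the denominator is zero (no new cases or no new
    deaths that day), because the formula is undefined there.
    """
    recovered_share = daily_recovered / all_positives
    confirmed_per_10k = daily_confirmed / (population / 10_000)
    death_share = daily_deaths / all_positives
    denominator = confirmed_per_10k * death_share
    if denominator == 0:
        return None
    return recovered_share / denominator


# Made-up numbers, only to show the call
print(response_score(100, 1000, 50, 10_000, 60_000_000))
```

Note that all_positives appears in both the numerator and the denominator, so it cancels: the score reduces to daily_recovered × population / (10,000 × daily_confirmed × daily_deaths).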

Code and updated graphs have moved to a repo: https://github.com/aster94/COVID-19

wingstonruballos commented 4 years ago

I can't write programs, but I really appreciate those who make graphics that help us understand trends. Could you do something like the ones at my link, but with a more realistic population instead of only 4,000 people? Have a look: https://www.washingtonpost.com/graphics/2020/health/coronavirus-how-epidemics-spread-and-end/

bigbenhur commented 4 years ago

Hi @aster94

I was looking for this and it's great! I can even just copy it into an online python compiler like https://repl.it/languages/python3 and it works out of the box.

Thanks a lot!

JiPiBi commented 4 years ago

Hi, I ran into some issues using your code. After installing countryinfo I got this error message:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 49: character maps to <undefined>

I commented out the line population = CountryInfo(country).population() and then it worked.

Another remark: I tried to run the code for 'US' (many lines) and it failed:

ValueError: 8 columns passed, passed data had 9 columns

Maybe you need to groupby your data?

On my side, I use the other folder, "csse_covid_19_time_series"; perhaps it could give a quicker result?

Good night

aster94 commented 4 years ago

@wingstonruballos These are very nice graphs/animations, but "realistic" is a very strong word in this context. Making a computer model of the spread of COVID-19 is beyond my ability; it would be like making a computer model of global warming (not so easy).

@bigbenhur I am happy you found it useful; if you have any suggestions, please write here 😃

@JiPiBi It is a problem with the Python module. I have reported it upstream (https://github.com/porimol/countryinfo/issues/6) and I am using the solution they proposed.

I tried to run the code for 'US' (much lines) and it failed

I think this could be because Mainland China and the US are stored in the database as a group of states rather than as a single entry like the other countries.

I use the other folder called "csse_covid_19_time_series" perhaps could give quicker result ?

In my opinion the result would be similar; I chose the daily_reports folder because it fits the logic of my script better.

JiPiBi commented 4 years ago

Coming back to the US issue: I now think it is caused by the comma being used both inside the first field and as the separator for the whole CSV line (I got the same issue when trying to import into Access).

Example with US : ['"Travis', ' CA (From Diamond Princess)"', 'US', '2020-02-24T23:33:02', '0', '0', '0', '38.2721', '-121.9399'] ie 9 elements, 1 more than for other countries

Example with 'Italy': ['', 'Italy', '2020-03-08T18:03:04', '7375', '366', '622', '43.0000', '12.0000'], only 8
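Python's standard csv module understands double-quoted fields, so an alternative to counting columns is to let csv.reader do the splitting. A minimal sketch, assuming (as the '"Travis' and ' CA (From Diamond Princess)"' pieces above suggest) that the file quotes a Province/State value when it contains a comma; parse_rows is a hypothetical helper name:

```python
import csv


def parse_rows(text):
    """Split daily-report CSV text into rows, honouring double-quoted fields."""
    return list(csv.reader(text.splitlines()))


sample = (
    '"Travis, CA (From Diamond Princess)",US,2020-02-24T23:33:02,0,0,0,38.2721,-121.9399\n'
    ',Italy,2020-03-08T18:03:04,7375,366,622,43.0000,12.0000\n'
)

for fields in parse_rows(sample):
    print(len(fields), fields[:2])
```

Both rows come back with 8 fields, and the first keeps 'Travis, CA (From Diamond Princess)' intact, whereas a plain split(',') gives 9 pieces for it.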

PS: I read your link about countryinfo and also applied the proposed patch for open(), and it worked too, thanks.

davidacollins commented 4 years ago

@JiPiBi I think you're right about the comma in the 'Province/State' column. Also, 'US', 'Canada', 'Mainland China' are not unique entries in the 'Country/Region' column

JiPiBi commented 4 years ago

@davidacollins I had no issues using pandas with "csse_covid_19_time_series" this way:

    import pandas as pd

    confirmed = "D:/Documents/GIT/COVID/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
    pdconfirmed = pd.read_csv(confirmed, sep=',')

and then I used groupby to sum over the second field (Country/Region).
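If pandas is not at hand, the same "sum over the second field" step can be done with the standard library. This is a sketch of the idea, not JiPiBi's code, and it assumes the time-series layout (Province/State, Country/Region, Lat, Long, then one cumulative-count column per date) with integer counts:

```python
import csv
from collections import defaultdict


def totals_by_country(text):
    """Sum every date column per Country/Region (stdlib stand-in for a groupby-sum).

    Assumes the time-series layout: Province/State, Country/Region, Lat, Long,
    followed by one column per date.
    """
    reader = csv.reader(text.splitlines())
    header = next(reader)
    dates = header[4:]
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        country = row[1]
        for i, value in enumerate(row[4:]):
            totals[country][i] += int(value or 0)  # treat empty cells as 0
    return dates, dict(totals)
```

Rows such as Hubei and Zhejiang end up summed under their shared Country/Region, which is what makes 'Mainland China' or 'US' comparable with single-row countries.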

JiPiBi commented 4 years ago

Another remark about countryinfo: 'US' is not recognized by Wikipedia; you must use 'United States', and likewise 'China' instead of 'Mainland China'. So perhaps you should use a dictionary to translate the country names used in the CSV files into names Wikipedia recognizes.

aster94 commented 4 years ago

I am without a computer right now, but you should be able to make a few fixes in the code.

Try adding these lines. Fix for countryinfo, to use instead of the population line:

    if country == 'US':
        population = CountryInfo('United States').population()
    elif country == 'Mainland China':  # do the same for any other country with the same issue
        population = CountryInfo('China').population()
    else:
        population = CountryInfo(country).population()

Fix for the comma, to add after data.append:

    if country == 'US':
        # merge the two halves of the split first field in the row just appended
        data[-1][0:2] = [''.join(data[-1][0:2])]

qtangs commented 4 years ago

Below is the link to my code for processing the raw data into this format:

    Location,Date,Total,Type,New
    Afghanistan,2020-01-22,0.0,Infected,0.0
    Algeria,2020-01-22,0.0,Infected,0.0
    Andorra,2020-01-22,0.0,Infected,0.0

Type can be Infected, Recovered, Deaths or Current.
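The New column is the day-over-day difference of Total within each Location/Type series. A minimal sketch of that step (not the code from the gist), assuming ISO YYYY-MM-DD dates so they sort correctly as strings, and a count of 0 before the first date:

```python
from collections import defaultdict


def add_new_column(records):
    """Append a New value (day-over-day change of Total) to each record.

    records: iterable of (location, date, total, type) tuples.
    Assumes ISO dates (YYYY-MM-DD), so sorting the strings sorts by day,
    and treats the count before the first date as 0.
    """
    series = defaultdict(list)
    for location, date, total, kind in records:
        series[(location, kind)].append((date, total))

    rows = []
    for (location, kind), points in series.items():
        previous = 0.0
        for date, total in sorted(points):
            rows.append((location, date, total, kind, total - previous))
            previous = total
    return rows
```

Grouping by (Location, Type) keeps, for example, Italy's Infected and Deaths series from being differenced against each other.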

Feel free to try it out.

https://gist.github.com/qtangs/fd3948ac33a7c1ea6620d941269a04c8

qtangs commented 4 years ago

Also see this to get latest day by day graph for all infected countries: https://www.coronavirusstats.live/

JiPiBi commented 4 years ago

@aster94 In my opinion, it would be better to use a dictionary to deal with the countryinfo naming issue, because it avoids multiple ifs and can be changed as needed.

For the same reason, for the comma issue, I think you could be less specific than filtering particular states, and use a regular expression or test the length after splitting, e.g. if it equals 9, join the first two.
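A sketch of the dictionary approach. The table and helper names are hypothetical, and only the two mismatches reported in this thread ('US' and 'Mainland China') are filled in:

```python
# Hypothetical lookup table: names as they appear in the CSSE CSV files ->
# names countryinfo recognises. Only the cases reported in this thread are
# listed; extend it as new mismatches show up.
COUNTRYINFO_NAMES = {
    'US': 'United States',
    'Mainland China': 'China',
}


def countryinfo_name(csv_name):
    """Return the name to pass to CountryInfo(), defaulting to the CSV name."""
    return COUNTRYINFO_NAMES.get(csv_name, csv_name)


# With the countryinfo package installed, the population line would become:
#   population = CountryInfo(countryinfo_name(country)).population()
```

Adding a new country then means adding one entry to the table instead of another elif branch.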

But it is your code and you remain the boss on it :-)

davidacollins commented 4 years ago

@aster94 CountryInfo fix worked well. Thanks

To fix the comma in the field I used:

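    # Merge the two halves of the split Province/State field in the last row,
    # then delete the now-redundant second half
    data[-1][0] = ''.join(data[-1][0:2])
    del data[-1][1]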

This combined the split columns and removed the unnecessary one, but left me with a NoneType at t_recovered = int(df.at[i, 'Recovered']). Any suggestions for fixing this would be appreciated.

JiPiBi commented 4 years ago

@davidacollins

As suggested above, you could test the length after splitting on commas and join the first two strings if it equals 9.

JiPiBi commented 4 years ago

@aster94 @davidacollins

I tried this:

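    for row in rows:
        if country in row:
            r = row.split(',')
            # A quoted province containing a comma (e.g. '"Travis, CA (...)"')
            # is split in two: rejoin it with the comma and drop the quotes
            if len(r) == 9:
                r = [r[0][1:] + ',' + r[1][:-1]] + r[2:]
            data.append(r)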

It works for the US together with the dictionary, but it is not completely sufficient.

I think that in my pandas code, since I keep the 'Province/State' column, I don't have that issue, and I can filter by 'Province/State' (I also copy 'Country/Region' into 'Province/State' when the cell is empty, i.e. for most countries at the moment). I will try.
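In pandas, copying the country into empty province cells is typically a Series.fillna call with the Country/Region column as the fill value. A stdlib sketch of the same idea, assuming the daily-report header uses the names Province/State and Country/Region; the helper name is hypothetical:

```python
import csv


def read_with_filled_province(text):
    """Parse a daily-report CSV and fill empty Province/State cells.

    Rows without a province get their Country/Region copied in, so every row
    can be filtered on the same Province/State column.
    """
    rows = list(csv.DictReader(text.splitlines()))
    for row in rows:
        if not row['Province/State']:
            row['Province/State'] = row['Country/Region']
    return rows
```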

JiPiBi commented 4 years ago

I ran the test for Hubei in China, and this is the resulting death evolution with my code. States and countries can be mixed in the plots:

[Chart: death evolution for Hubei]

r-lomba commented 4 years ago

@aster94 I like the idea of scoring how well a country is doing, and I will try to integrate something like that into my project ASAP.

If you want, you can check out my repo; maybe it will be inspiring for what you are doing too. My project basically consists of:

Please check out my repo here: https://github.com/r-lomba/covid-19-charts

And this is a link to the live webpage presenting all the meaningful aggregations for today: https://r-lomba.github.io/covid-19-charts/charts/

Please note that, since the core file is a Jupyter Notebook, I also put a link at the top of it that lets you run the code live in Google Colab. Just click the icon at the top of the Notebook and a Google Colab window will pop up, ready to execute.

aster94 commented 4 years ago

Good day! I rewrote some parts of the script and now it works far better! You can see the new code and the updated graphs in the first message. Now you can check a whole country like China or just a province like Hubei.

Please check it and tell me if you run into any problems; I will try to solve them. @davidacollins @JiPiBi yes, the problem was caused by the commas.

it should be better to use a dictionary

@JiPiBi I did as you proposed, can you check my solution and see if you have a better idea?

you could be less specific than filtering some states

You were completely right; now I am filtering based only on the number of columns (if there are 7 or 9, it joins the first two).

@r-lomba I am checking your repo, very interesting! Not very nice to see how things are going from the Italian point of view 😣 If you come up with a score, tell me; I am interested.

JiPiBi commented 4 years ago

@aster94 For the comma issue, I didn't understand at first why you tested for a length of 7, but after some printing I understood that it was for lines without Lat/Long information. So the correction works; my only remark is that it leaves some stray quote characters, as in:

['"Fulton County GA"', 'US', '2020-03-07T16:53:03', '3', '0', '0', '33.8034', '-84.3963']

That was why I proposed the odd-looking r = [r[0][1:]+','+r[1][:-1]] + r[2:], to get rid of the unnecessary characters, but if they cause you no trouble, that's fine.

Since in pandas I sum even when there is only one line, I tried to simplify your code a bit:
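A length-independent way to do the same clean-up is to rejoin fields from the opening quote up to the field that closes it, then strip the quotes. This is a sketch, assuming the only quoted field is the first one (Province/State); it handles both the 7- and 9-column cases and restores the comma, giving 'Fulton County, GA' rather than '"Fulton County GA"':

```python
def merge_split_province(fields):
    """Rejoin a quoted first field that str.split(',') broke apart.

    ['"Fulton County', ' GA"', 'US', ...] -> ['Fulton County, GA', 'US', ...]
    Rows whose first field is not quoted are returned unchanged.
    """
    if not fields or not fields[0].startswith('"'):
        return fields
    for end, field in enumerate(fields):
        # the closing quote; a lone '"' at position 0 is only the opening one
        if field.endswith('"') and (end > 0 or len(field) > 1):
            merged = ','.join(fields[:end + 1])[1:-1]  # restore commas, drop quotes
            return [merged] + fields[end + 1:]
    return fields  # unbalanced quote: leave the row untouched
```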

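        # Running totals; in the daily reports the columns after the merge are
        # Confirmed (3), Deaths (4), Recovered (5)
        sub_total = {'confirmed': 0, 'deaths': 0, 'recovered': 0}
        last_match = None

        rows = r.text.splitlines()
        for row in rows:
            if country in row:
                row_slice = row.split(',')
                # Some provinces contain a comma, which splits them in two
                if len(row_slice) == 7 or len(row_slice) == 9:
                    row_slice[0:2] = [''.join(row_slice[0:2])]
                sub_total['confirmed'] += int(row_slice[3] or 0)
                sub_total['deaths'] += int(row_slice[4] or 0)
                sub_total['recovered'] += int(row_slice[5] or 0)
                last_match = row_slice

        # Reuse the last matching row as a template for the aggregated totals;
        # skip the day entirely if nothing matched (no stale row from a previous day)
        if last_match is not None:
            last_match[3] = sub_total['confirmed']
            last_match[4] = sub_total['deaths']
            last_match[5] = sub_total['recovered']
            data.append(last_match)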

It seems to work too

JiPiBi commented 4 years ago

Out of curiosity, I checked the result for the "famous country" County, and it works too: if the country name is a substring of several states, provinces, or regions, the code aggregates all of them, so be careful.
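One way to avoid the accidental aggregation is to compare whole column values instead of testing whether the name appears anywhere in the line. A sketch, assuming the daily-report header names Province/State, Country/Region and Confirmed; the function name is hypothetical:

```python
import csv


def total_confirmed(text, place):
    """Sum Confirmed over rows whose Province/State or Country/Region is exactly place.

    A substring test such as `place in line` would also count every row that
    merely contains place (e.g. 'County' inside 'King County, WA').
    """
    total = 0
    for row in csv.DictReader(text.splitlines()):
        if place in (row['Province/State'], row['Country/Region']):
            total += int(row['Confirmed'] or 0)
    return total
```

With exact matching, 'County' no longer matches anything, while 'US' still sums all of its states.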

[Chart: aggregated result for "County"]

aster94 commented 4 years ago

Yes @JiPiBi, you are right, I need to write that down somewhere, otherwise it could mislead people! By the way, I just moved the script to a new repo; I am going to make it update automatically so that it creates these graphs every day.

JiPiBi commented 4 years ago

@aster94 What do you mean by "automatically updated": no human intervention at all? I just found this via Google, but it is not very clear to me: https://support.glitch.com/t/tutorial-how-to-auto-update-your-project-with-github/8124

If possible, I would be interested in reading your script to understand that approach. For the moment, I have to pull the data to my laptop and then open my Jupyter notebook.

PaFiK1999 commented 4 years ago

@aster94

All the countries that are split into provinces don't seem to have accurate dates in the graph.

aster94 commented 4 years ago

@JiPiBi Have a look at my repo; I published the Python script that does it (it needs very little human intervention).

@PaFiK1999 I think this is up to the maintainers of this repo.

JiPiBi commented 4 years ago

@aster94 I'm back :-) I read your autopush.py code, in which I suppose you fetch your data. I suppose that, at a minimum, you have to run that file.

But as I'm not very familiar with what GitHub can do, and even if it is a stupid question, I dare to ask: do you run the code from your desktop or laptop and commit the result to GitHub, or does it run directly on GitHub?

aster94 commented 4 years ago

Yes, I just run the file from my laptop; it creates the graphs and pushes all the data to GitHub with a commit.

r-lomba commented 4 years ago

@aster94 In the end I worked on the idea of "country scoring" and decided that I didn't want to calculate an arbitrary score using some formula. There are too many competing variables here, and in my opinion the result would very likely be a flawed score.

But I did something else: starting from the collected samples, I implemented polynomial fitting on the data points. In practice, this is a polynomial regression that finds the polynomial that most closely fits the data samples.

From this polynomial, I extract and plot its second derivative. This allows me to further calculate:

Inflection points are especially interesting, and we know they occur where the second derivative crosses zero. If it crosses zero heading upwards, the "original" polynomial trend is "increasing"; if it crosses zero heading downwards, the trend is "decreasing".
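The zero-crossing test described above can be sketched in plain Python. This assumes the fitted polynomial is already available as coefficients, lowest degree first; in practice numpy.polyfit and numpy.polyder would produce and differentiate it (numpy returns coefficients highest degree first, so they would need reversing). The function names are illustrative, not from the covid-19-charts repo:

```python
def derivative(coeffs):
    """Derivative of a polynomial given as lowest-degree-first coefficients."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate a lowest-degree-first polynomial at x (Horner's method)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def inflection_points(coeffs, xs):
    """Find where the second derivative changes sign between consecutive xs.

    Returns (x, direction) pairs; x is linearly interpolated inside the
    interval and direction is 'upward' or 'downward'.
    """
    second = derivative(derivative(coeffs))
    pairs = [(x, evaluate(second, x)) for x in xs]
    points = []
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 < 0 <= y1:
            direction = 'upward'
        elif y0 > 0 >= y1:
            direction = 'downward'
        else:
            continue
        points.append((x0 + (x1 - x0) * (-y0) / (y1 - y0), direction))
    return points


# x**3 - 3x**2 has second derivative 6x - 6, which crosses zero upwards at x = 1
print(inflection_points([0, 0, -3, 1], [i * 0.5 for i in range(7)]))
```

Sampling on a grid means a crossing between two sample points is located only approximately; the linear interpolation narrows it down within that interval.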

I have seen that this correctly captures trends that are not visible to the naked eye. Of course, nothing guarantees that these small trends are stable, and they could change the next day, but they would still be captured correctly "tomorrow" too, and above all they are a fact, not an arbitrary prediction.

An example of such "advanced" charts, which you can now easily produce with my code, is the following (here the trend is obvious and my approach is of course unnecessary, but for many countries the situation these days is less obvious):

[Chart: example of the "advanced" charts described above]

Since you told me you would be interested in my work on the scoring aspect of the problem, I wanted to send you an update :)

My repo is here:

https://github.com/r-lomba/covid-19-charts

aster94 commented 4 years ago

Thanks for sharing your approach, @r-lomba; it is indeed very interesting, and the trends seem to reflect the real world. Unfortunately, since this repo does not seem to follow a consistent practice for country names and data updates, making graphs out of it gets more difficult day after day.