Open aster94 opened 4 years ago
I can't write programs, but I really appreciate those who make graphics that help people understand trends. Could you do something like the ones in my link, but with a more realistic population instead of only 4000 people? See: https://www.washingtonpost.com/graphics/2020/health/coronavirus-how-epidemics-spread-and-end/
Hi @aster94
I was looking for this and it's great! I can even just copy it into an online python compiler like https://repl.it/languages/python3 and it works out of the box.
Thanks a lot!
Hi, I had some issues using your code. After installing countryinfo I got this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 49: character maps to <undefined>
I commented out the line
population = CountryInfo(country).population()
and it worked
Another remark: I tried to run the code for 'US' (which has many lines) and it failed with:
ValueError: 8 columns passed, passed data had 9 columns
Maybe you have to groupby your data?
On my side, I use the other folder, "csse_covid_19_time_series"; perhaps it would give quicker results?
Good night
@wingstonruballos these are very nice graphs/animations, but "realistic" is a very hard word in this context. Making a computer model of the spread of COVID-19 is beyond my ability; it would be like making a computer model of global warming (not so easy).
@bigbenhur I am happy you found it useful. If you have any suggestions, please write here 😃
@JiPiBi it is a problem with the Python module. I have reported it upstream: https://github.com/porimol/countryinfo/issues/6 and I am using the solution they proposed.
I tried to run the code for 'US' (which has many lines) and it failed
I think this could be because Mainland China and US are treated in the database as groups of states, not as single states like the others.
I use the other folder, "csse_covid_19_time_series"; perhaps it would give quicker results?
In my opinion the result would be similar. I chose the daily_reports folder because it suits the logic of my script better.
Coming back to the US issue: I now think it is caused by the comma being used both inside the first field and as the separator for the whole CSV line (I got the same issue trying to import into Access).
Example with US: ['"Travis', ' CA (From Diamond Princess)"', 'US', '2020-02-24T23:33:02', '0', '0', '0', '38.2721', '-121.9399'], i.e. 9 elements, 1 more than for other countries
Example with 'Italy': ['', 'Italy', '2020-03-08T18:03:04', '7375', '366', '622', '43.0000', '12.0000'], only 8
PS: I read your link about countryinfo and applied the patch proposed for open; it also worked, thanks
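For anyone hitting the same UnicodeDecodeError: it is what Python raises when a UTF-8 file is decoded with the Windows default code page (cp1252), where byte 0x81 has no character assigned. Below is a minimal sketch of the failure and of the usual remedy, passing an explicit encoding to open(). The sample text and the read_text helper are made up for illustration; this is not the actual countryinfo patch.

```python
import os
import tempfile

# Hypothetical sample text: "Á" is encoded in UTF-8 as the bytes 0xC3 0x81.
sample = "Ávila"


def read_text(path, encoding):
    """Read a whole text file using the given encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Write the sample as UTF-8 to a temporary file.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json",
                                 delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # cp1252 is the default on many Windows setups; 0x81 is undefined there.
    try:
        read_text(path, "cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # An explicit UTF-8 encoding works regardless of the platform default.
    utf8_text = read_text(path, "utf-8")
finally:
    os.remove(path)

print(cp1252_failed, utf8_text)
```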
@JiPiBi I think you're right about the comma in the 'Province/State' column. Also, 'US', 'Canada', 'Mainland China' are not unique entries in the 'Country/Region' column
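The nine-element list quoted above comes from splitting on every comma, including the one inside the quoted 'Province/State' value. A minimal sketch using Python's standard csv module, which honours the quotes, could look like this. The two sample lines are rebuilt from the lists quoted in this thread, so the real file may differ in detail.

```python
import csv
import io

# Two sample rows: the US one has a comma inside its quoted first field.
raw = (
    '"Travis, CA (From Diamond Princess)",US,2020-02-24T23:33:02,0,0,0,38.2721,-121.9399\n'
    ',Italy,2020-03-08T18:03:04,7375,366,622,43.0000,12.0000\n'
)

# Naive splitting breaks the quoted field into two pieces.
naive = [line.split(',') for line in raw.splitlines()]

# csv.reader treats the quoted text as a single field and drops the quotes.
parsed = list(csv.reader(io.StringIO(raw)))

print([len(r) for r in naive])   # field counts with str.split
print([len(r) for r in parsed])  # field counts with csv.reader
```

With this approach there is no need to test the number of columns or to strip leftover quote characters by hand.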
@davidacollins I had no issue using pandas with "csse_covid_19_time_series", this way:
confirmed = "D:/Documents/GIT/COVID/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
pdconfirmed = pd.read_csv(confirmed, sep=',')
and afterwards I used groupby to sum on the second field
Another remark about countryinfo: 'US' is not recognized by Wikipedia, you must use 'United States'; the same goes for 'Mainland China' and 'China'. So perhaps you should use a dictionary to translate the country names used in the CSV files into names that Wikipedia recognizes.
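The dictionary idea could be sketched as follows. The mapping holds only the two names mentioned in this thread, and lookup_name is a hypothetical helper, not part of countryinfo; any name missing from the mapping is passed through unchanged.

```python
# Names as they appear in the CSV files -> names that countryinfo understands.
# Only the two cases reported in this thread are listed; extend as needed.
CSV_TO_COUNTRYINFO = {
    'US': 'United States',
    'Mainland China': 'China',
}


def lookup_name(csv_name):
    """Return the name to pass to CountryInfo for a CSV country name."""
    return CSV_TO_COUNTRYINFO.get(csv_name, csv_name)


# Usage (assumes the countryinfo package is installed):
# population = CountryInfo(lookup_name(country)).population()
print(lookup_name('US'), lookup_name('Italy'))
```

Keeping the exceptions in one dictionary avoids a chain of if/elif branches and makes new cases a one-line change.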
I am without a computer right now, but you should be able to make a few fixes in the code.
Try adding these lines.
Fix for countryinfo: add this in place of the population line:
if country == 'US':
    population = CountryInfo('United States').population()
# elif ...: do the same for China or any other country with the same issue
else:
    population = CountryInfo(country).population()
Fix for the comma: add this after data.append:
if country == 'US':
    data[0:2] = [''.join(data[0:2])]
Below is the link to my code for processing the raw data into this format:
Location,Date,Total,Type,New
Afghanistan,2020-01-22,0.0,Infected,0.0
Algeria,2020-01-22,0.0,Infected,0.0
Andorra,2020-01-22,0.0,Infected,0.0
Type can be Infected, Recovered, Deaths or Current.
Feel free to try it out.
https://gist.github.com/qtangs/fd3948ac33a7c1ea6620d941269a04c8
Also see this to get latest day by day graph for all infected countries: https://www.coronavirusstats.live/
@aster94 In my opinion, it would be better to use a dictionary to deal with the countryinfo name issue, because it avoids multiple ifs and can be changed as needed.
For the same reason, for the comma issue, I think you could be less specific than filtering some states: use a regular expression, or test the length after splitting and, e.g. if it equals 9, join the first two elements.
But it is your code and you remain the boss of it :-)
@aster94 CountryInfo fix worked well. Thanks
To fix the comma in the field I used:
data[-1:][0][0] = ''.join(data[-1:][0][0:2])
del(data[-1:][0][1])
which combined the split columns and removed the unnecessary column, but left me with a NoneType
at t_recovered = int(df.at[i, 'Recovered'])
Any suggestions to fix would be appreciated.
@davidacollins
As suggested above, you could test the length after splitting on commas, and join the first two strings if it equals 9?
@aster94 @davidacollins
I tried this:
for row in rows:
    if country in row:
        r = row.split(',')
        if len(r) == 9:
            # Rejoin the two halves of the quoted first field and drop the quotes
            r = [r[0][1:] + ',' + r[1][:-1]] + r[2:]
        data.append(r)
and it works for US together with a dictionary, but it is not completely sufficient.
I think that in my pandas code, since I keep the 'Province/State' column, I do not have that issue and I can filter by 'Province/State' (I also copy 'Country/Region' into 'Province/State' when the cell is empty, i.e. for the majority of countries for the moment). I will try.
I ran the test for Hubei in China, and my code gives this result for the evolution of deaths. States and countries can be mixed in the plots.
@aster94 I like the idea of scoring how well a country is doing, and I will try to integrate something like that into my project ASAP
If you want, you can check out my repo , maybe it could be inspiring in some way also regarding what you are doing. My project is basically made of:
Pls check out my repo here: https://github.com/r-lomba/covid-19-charts
And this is a link to the live webpage presenting all the meaningful aggregations for today: https://r-lomba.github.io/covid-19-charts/charts/
Please note that, since the core file is a Jupyter Notebook, I also put a link at the top of it that lets you run the code live in Google Colab. Just click the icon at the top of the notebook and a Google Colab window will pop up, ready to execute.
Good day! I rewrote part of the script and now it works far better! You can see the new code and the updated graph in the first message. Now you can check a whole country like China or just a province like Hubei.
Please check it and tell me if you run into any problem; I will try to solve it. @davidacollins @JiPiBi yes, the problem was caused by commas.
it would be better to use a dictionary
@JiPiBi I did as you proposed, can you check my solution and see if you have a better idea?
you could be less specific than filtering some states
You were completely right; now I am filtering based only on the number of columns (if it is 7 or 9, the first two are joined)
@r-lomba I am checking your repo, very interesting! It is not very nice to see how things are going from the Italian point of view 😣 If you come up with a score, tell me, I am interested.
@aster94 For the comma issue, I didn't understand at first why you test for a length of 7, but after some printing I understood that it is for lines without Lat/Long information. So the correction works. One remark: it leaves some stray quote characters, as in
['"Fulton County GA"', 'US', '2020-03-07T16:53:03', '3', '0', '0', '33.8034', '-84.3963']
That is why I proposed the strange-looking r = [r[0][1:]+','+r[1][:-1]] +r[2:], to get rid of the unnecessary characters, but it is fine if they cause you no trouble.
Since in pandas I compute the sum even if there is only one line, I tried to simplify your code a bit:
sub_total = {'posit': 0, 'death': 0, 'recov': 0}
rows = r.text.splitlines()
for row in rows:
    if country in row:
        row_slice = row.split(',')
        # Some countries have a comma in the first field that causes trouble
        if len(row_slice) == 7 or len(row_slice) == 9:
            row_slice[0:2] = [''.join(row_slice[0:2])]
        # Columns 3, 4 and 5 hold confirmed, deaths and recovered
        sub_total['posit'] += int(row_slice[3])
        sub_total['death'] += int(row_slice[4])
        sub_total['recov'] += int(row_slice[5])
# row_slice does not exist if the country never appeared, hence the try
try:
    row_slice[3] = sub_total['posit']
    row_slice[4] = sub_total['death']
    row_slice[5] = sub_total['recov']
    data.append(row_slice)
except:
    pass
It seems to work too.
Out of curiosity, I tried the well-known "country" named County, and it works too: if the given name is a substring of several states, provinces or regions, the code aggregates them, so be cautious.
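The substring pitfall comes from `if country in row`, which matches the text anywhere in the line. A minimal sketch that compares only the 'Country/Region' field and sums the counts is shown below. It assumes the eight-column layout of the rows quoted in this thread (confirmed, deaths and recovered in columns 3, 4 and 5) and that the count fields are never empty; country_totals is a hypothetical helper, not code from the script.

```python
import csv
import io

# Assumed column positions, following the eight-column rows quoted in this thread.
COUNTRY, CONFIRMED, DEATHS, RECOVERED = 1, 3, 4, 5


def country_totals(csv_text, country):
    """Sum the counts of every row whose country field equals `country` exactly."""
    totals = {'confirmed': 0, 'deaths': 0, 'recovered': 0}
    for fields in csv.reader(io.StringIO(csv_text)):
        if len(fields) <= RECOVERED or fields[COUNTRY] != country:
            continue  # short line, or a different country
        totals['confirmed'] += int(fields[CONFIRMED])
        totals['deaths'] += int(fields[DEATHS])
        totals['recovered'] += int(fields[RECOVERED])
    return totals


# Sample rows adapted from the ones quoted in this thread.
sample = (
    '"Fulton County, GA",US,2020-03-07T16:53:03,3,0,0,33.8034,-84.3963\n'
    '"Travis, CA (From Diamond Princess)",US,2020-02-24T23:33:02,0,0,0,38.2721,-121.9399\n'
    ',Italy,2020-03-08T18:03:04,7375,366,622,43.0000,12.0000\n'
)

print(country_totals(sample, 'US'))
print(country_totals(sample, 'County'))  # a province fragment, not a country
```

Because the comparison is on the whole field, 'County' no longer picks up the Fulton County row.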
Yes @JiPiBi, you are right, I need to write that down somewhere, otherwise it could mislead some people! By the way, I just moved the script to a new repo; I am going to make it update automatically so that it creates these graphs every day.
@aster94 Please, what do you mean by automatically updated: no human intervention at all? I just found this on Google, but it is not so clear to me: https://support.glitch.com/t/tutorial-how-to-auto-update-your-project-with-github/8124
If possible, I would be interested in reading your script to understand how that works. For the moment, I have to pull the data to my laptop and then open my Jupyter notebook.
@aster94
All the countries that are split into provinces don't seem to have accurate dates in the graph.
@JiPiBi have a look at my repo, I published the Python script that does it (it needs very little human intervention)
@PaFiK1999 i think that this is up to the maintainers of this repo
@aster94 I'm back :-) I read your autopush.py code, in which I suppose you fetch your data. I suppose that, at a minimum, you have to run that file.
But as I'm not so familiar with what GitHub can do, and even if it is a stupid question, I dare to ask: do you run the code from your desktop or laptop and commit the result to GitHub, or does it run directly on GitHub?
Yes, I just run the file from my laptop; it creates the graph and pushes all the data to GitHub in a commit.
@aster94 in the end I have worked on the idea of "country scoring", and decided that I didn't want to calculate an arbitrary score using whatever formula. There are too many concurring variables here and the result would very likely be a flawed score, in my opinion
But I did something else: starting from the collected samples, I implemented polynomial fitting on the data points. In practice this is a regression that finds the polynomial closest to the data samples.
From this polynomial, I extract and draw its second derivative. This allows me to further calculate:
Inflection points especially are very interesting, and we know that they happen when the second derivative crosses zero. If it crosses zero heading upwards, the "original" polynomial trend is "increasing". If it crosses zero heading downwards, the trend is "decreasing"
I have seen that this correctly captures trends that are not visible to the naked eye. Of course nothing guarantees that these small trends are stable, and they could change the next day, but they would still be captured correctly "tomorrow"; above all, they are a fact and not an arbitrary prediction.
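A minimal sketch of the inflection-point idea, assuming the polynomial has already been fitted (for example with numpy.polyfit, which is not used here). The coefficients below are made up, and second_derivative, evaluate and zero_crossings are hypothetical helpers, not code from the repo mentioned in this comment.

```python
def second_derivative(coeffs):
    """Coefficients of p''(x) for a polynomial given lowest degree first."""
    # The second derivative of c_k * x**k is k * (k - 1) * c_k * x**(k - 2).
    return [k * (k - 1) * c for k, c in enumerate(coeffs)][2:]


def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest degree first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def zero_crossings(coeffs, xs):
    """Find where p'' changes sign between consecutive sample points."""
    d2 = second_derivative(coeffs)
    crossings = []
    for left, right in zip(xs, xs[1:]):
        a, b = evaluate(d2, left), evaluate(d2, right)
        if a < 0 <= b:
            crossings.append((left, right, 'upwards'))
        elif a >= 0 > b:
            crossings.append((left, right, 'downwards'))
    return crossings


# Made-up cubic p(x) = -x**3 + 30*x**2, so p''(x) = -6*x + 60, zero at x = 10.
coeffs = [0.0, 0.0, 30.0, -1.0]
days = [d + 0.5 for d in range(20)]  # sample between whole days
print(zero_crossings(coeffs, days))
```

Each reported interval brackets an inflection point, and the direction label says whether the second derivative crossed zero heading upwards or downwards.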
An example of such "advanced" charts that you can now easily produce using my code would be the following (here the trend is obvious and my approach is of course useless, but for many countries these days we are in a less obvious situation):
As you told me you would have been interested in my work on the scoring aspect of the problem, I wanted to send you an update :)
My repo is here:
Thanks for sharing your approach @r-lomba, it is indeed very interesting and the trends seem to reflect the real world. Unfortunately, since this repo does not seem to follow a consistent practice for country names and data updates, making graphs out of it gets more difficult day after day.
Hello, I was curious about the day-by-day COVID situation in my country, so I wrote a simple Python script that uses this repo to make a multi-bar graph. For every day there are 3 bars: blue for new cases, green for recovered and red for deaths. Here is the output of the program; you may find it interesting:
I also made a rough attempt to create a "score" that measures how well a country is responding to the COVID emergency (the higher the better):
Basically the score is:
(daily recovered / all positives) / ((daily confirmed / (population of the country / 10000)) * (daily deaths / all positives))
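Written out as a function, the formula above could look like the sketch below. The post does not say how days with no new cases or no deaths are handled, so returning None in those cases is an assumption, as are the parameter names and the sample numbers.

```python
def response_score(daily_recovered, daily_confirmed, daily_deaths,
                   all_positives, population):
    """Score from the formula above; returns None when it is undefined."""
    if all_positives == 0 or population == 0:
        return None
    recovered_share = daily_recovered / all_positives
    confirmed_per_10k = daily_confirmed / (population / 10000)
    death_share = daily_deaths / all_positives
    denominator = confirmed_per_10k * death_share
    if denominator == 0:
        return None  # no new cases or no deaths that day: division by zero
    return recovered_share / denominator


# Made-up numbers, for illustration only.
score = response_score(daily_recovered=50, daily_confirmed=200,
                       daily_deaths=10, all_positives=1000,
                       population=1_000_000)
print(score)
```

Note that the score grows without bound as daily deaths or daily confirmed cases approach zero, which is worth keeping in mind when comparing countries.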
code and updated graph moved to a repo: https://github.com/aster94/COVID-19