Rank23 / COVID19

Using Kalman Filter to Predict Corona Virus Spread
https://medium.com/@rank23/using-kalman-filter-to-predict-corona-virus-spread-72d91b74cc8
MIT License

KeyError: 'Province/State' #7

Open adityacloud1 opened 4 years ago

adityacloud1 commented 4 years ago

Respected Sir, I tried to run your filter on my dataset and got the following error while running this code:

    # import pandas as pd
    confirmed = pd.merge(confirmed, population, how='left', on=['Province/State', 'Country'])
    death = pd.merge(death, population, how='left', on=['Province/State', 'Country'])
    recover = pd.merge(recover, population, how='left', on=['Province/State', 'Country'])
    confirmed.head()

[Screenshot: error]

Could you please suggest how to resolve this error?

Rank23 commented 4 years ago

Try running:

    confirmed = pd.merge(confirmed, population, how='left', on=['Province/State', 'Country/Region'])

and make sure the column names of the 'population' df are: Province/State | Country/Region | Population
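A minimal sketch of how the column check could be done before merging. It uses small made-up frames in place of the real `confirmed` and `population` data, and `missing_merge_keys` is a hypothetical helper, not part of the repository. It lists the merge keys a frame lacks, which is what a `KeyError` from `pd.merge` points at.

```python
import pandas as pd

KEYS = ['Province/State', 'Country/Region']

def missing_merge_keys(df, keys=KEYS):
    """Return the merge keys that are absent from df's columns (hypothetical helper)."""
    return [k for k in keys if k not in df.columns]

# Illustrative data only (made-up numbers).
confirmed = pd.DataFrame({
    'Province/State': ['Hubei', 'Tokyo'],
    'Country/Region': ['China', 'Japan'],
    '1/22/20': [10, 1],
})
# A population table whose country column is named 'Country' cannot be merged on KEYS.
population = pd.DataFrame({
    'Province/State': ['Hubei', 'Tokyo'],
    'Country': ['China', 'Japan'],
    'Population': [100, 200],
})

print(missing_merge_keys(population))  # ['Country/Region']

# Renaming the column so both frames share the same keys lets the merge succeed.
population = population.rename(columns={'Country': 'Country/Region'})
merged = pd.merge(confirmed, population, how='left', on=KEYS)
```

If the helper returns an empty list for every frame involved, the `on=` keys exist on both sides of the merge.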

adityacloud1 commented 4 years ago

Thank you, sir, for your kind response.

adityacloud1 commented 4 years ago

Respected Sir, when I run the following submodule, it always fails with a "division by zero" error. I tried each input value of the Country/Region column, e.g. China, Japan, etc.

    t['1_day_change']=t['3_day_change']=t['7_day_change']=t['1_day_change_rate']=t['3_day_change_rate']=t['7_day_change_rate']=t['last_day']=0
    for i in range(1,len(t)):
        if(t.iloc[i,1] is t.iloc[i-2,1]):
            t.iloc[i,3]=t.iloc[i-1,2]-t.iloc[i-2,2]
            t.iloc[i,6]=(t.iloc[i-1,2]/t.iloc[i-2,2]-1)*100
            t.iloc[i,9]=t.iloc[i-1,2]
        if(t.iloc[i,1] is t.iloc[i-4,1]):
            t.iloc[i,4]=t.iloc[i-1,2]-t.iloc[i-4,2]
            t.iloc[i,7]=(t.iloc[i-1,2]/t.iloc[i-4,2]-1)*100
        if(t.iloc[i,1] is t.iloc[i-8,1]):
            t.iloc[i,5]=t.iloc[i-1,2]-t.iloc[i-8,2]
            t.iloc[i,8]=(t.iloc[i-1,2]/t.iloc[i-8,2]-1)*100
    t=t.fillna(0)
    t=t.merge(temp[['date','region', 'X']],how='left',on=['date','region'])
    t=t.rename(columns = {'X':'kalman_prediction'})
    t=t.replace([np.inf, -np.inf], 0)
    t['kalman_prediction']=round(t['kalman_prediction'])
    train=t.merge(confirmed[['region',' Population ']],how='left',on='region')
    train=train.rename(columns = {' Population ':'population'})
    train['population']=train['population'].str.replace(r" ", '')
    train['population']=train['population'].str.replace(r",", '')
    train['population']=train['population'].fillna(1)
    train['population']=train['population'].astype('int32')
    train['infected_rate'] =train['last_day']/train['population']*10000
    train=train.merge(w,how='left',on=['date','region'])
    train=train.sort_values(['region', 'date'])

[Screenshot: division by zero]

Rank23 commented 4 years ago

You should check why you get zero values in t.iloc[:,2]. If for some reason you get zeros, you can replace them with the actual values.
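One way to locate such zeros, sketched on a toy frame. The column names `date`, `region`, and `confirmed` are assumptions for illustration. The rate here is a simplified one-step version, not the lagged indexing used in the loop quoted earlier in the thread. Zero denominators are masked before dividing, so the division never sees a zero.

```python
import pandas as pd

# Toy cumulative counts; column names and values are assumptions for illustration.
t = pd.DataFrame({
    'date': ['d1', 'd2', 'd3', 'd1', 'd2', 'd3'],
    'region': ['A', 'A', 'A', 'B', 'B', 'B'],
    'confirmed': [0, 0, 4, 2, 3, 6],
})

# Rows whose count is zero: these become the denominators that break the rate.
zero_rows = t[t['confirmed'] == 0]
print(zero_rows)

# Previous day's count within each region (NaN on each region's first day).
prev = t.groupby('region')['confirmed'].shift(1)

# Mask zero denominators as NaN, then report 0 where the rate is undefined.
safe_prev = prev.where(prev != 0)
t['1_day_change_rate'] = ((t['confirmed'] / safe_prev - 1) * 100).fillna(0)
print(t)
```

Filling undefined rates with 0 is only a placeholder choice. Where the zeros come from bad input rather than genuinely empty days, restoring the actual counts, as suggested above, is the better fix.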

adityacloud1 commented 4 years ago

Dear Sir, for the above issue, the values of t.tail(7) are highlighted in the following screenshot:

[Screenshot: t tail]

Could you please share your datasets for Confirmed.csv, Deaths.csv and Recovered.csv?

Rank23 commented 4 years ago

The updated data sets were taken from the following URLs:

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv'

confirmed = pd.read_csv(url, error_bad_lines=False)

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv'

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'

death = pd.read_csv(url, error_bad_lines=False)

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'

recover = pd.read_csv(url, error_bad_lines=False)
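For readers on a recent pandas: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0, and `on_bad_lines='skip'` is its replacement. The `time_series_19-covid-*` files were also deprecated upstream in favour of the `*_global` files. Below is a minimal sketch of loading the `*_global` series under those assumptions. `series_url` and `load_series` are hypothetical helpers, and the in-memory CSV is made-up sample data so the example runs offline.

```python
import io
import pandas as pd

BASE = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
        'csse_covid_19_data/csse_covid_19_time_series/')

def series_url(kind):
    """Build a *_global series URL; kind is 'confirmed', 'deaths' or 'recovered'."""
    return f'{BASE}time_series_covid19_{kind}_global.csv'

def load_series(source):
    """Read a CSV from a URL or file-like object, skipping malformed rows."""
    return pd.read_csv(source, on_bad_lines='skip')

# Offline demonstration with made-up data; the last row has one field too many.
sample = io.StringIO(
    'Province/State,Country/Region,1/22/20\n'
    ',Japan,5\n'
    'Hubei,China,7\n'
    'bad,row,1,2\n'
)
demo = load_series(sample)
print(demo)

# With network access:
# confirmed = load_series(series_url('confirmed'))
# death = load_series(series_url('deaths'))
# recover = load_series(series_url('recovered'))
```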

adityacloud1 commented 4 years ago

Dear Sir, I followed your valuable suggestion and the above errors were resolved. However, for other inputs such as 'India' and 'Korea, South', the prediction output is always NaN. This is shown in the following screenshot:

[Screenshot: Kalman prediction, India]

And for 'Korea, South':

[Screenshot: Kalman prediction, South Korea]

adityacloud1 commented 4 years ago

Dear Sir, greetings of the day! Please suggest a solution for the above issue.

Rank23 commented 4 years ago

It seems like you are not getting predictions from the R script. You can share your code with me and I'll try to assist: rank23@gmail.com
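A hypothetical way to narrow this down before sharing code. It assumes the NaN values come from the left merge finding no matching rows in the predictions frame, which the thread does not confirm. The frames below are toy stand-ins for `t` and `temp`, with `X` as the prediction column. The `indicator=True` option of `merge` marks rows that found no match.

```python
import pandas as pd

# Toy stand-ins: t holds the features, temp holds the predictions ('X').
t = pd.DataFrame({
    'date': ['d1', 'd1', 'd1'],
    'region': ['China', 'India', 'Korea, South'],
})
temp = pd.DataFrame({
    'date': ['d1'],
    'region': ['China'],
    'X': [123.0],  # made-up value
})

check = t.merge(temp[['date', 'region', 'X']], how='left',
                on=['date', 'region'], indicator=True)

# 'left_only' rows found no prediction in temp, so X is NaN there.
missing = check.loc[check['_merge'] == 'left_only', 'region'].unique().tolist()
print(missing)  # ['India', 'Korea, South']
```

If a region shows up in `missing`, the predictions frame has no rows for it. That points at the step producing the predictions, or at a mismatch in the `date` or `region` values, rather than at the merge itself.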