Closed: yijunwang0805 closed this issue 4 years ago
I was thinking about the form of the Wald test statistic. If the estimated parameters of x1 and x2 are essentially zero, then the computation ends up dividing by zero, hence the error.
However, if this is true, we should ideally just fail to reject the null hypothesis that the parameters are zero, instead of getting the error.
What appears to be happening here is that your model has no degrees of freedom left, and so df_denom is 0. This is a bug and should be caught so that an InvalidTestStatistic can be returned rather than a WaldTestStatistic.
If you had a richer dataset you would not see this issue.
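To make the degrees-of-freedom point concrete, here is a minimal sketch in plain Python. It is not the library's code: `residual_df` and `f_statistic` are hypothetical helpers, and the accounting assumes a textbook setup in which each entity effect and each time effect costs one degree of freedom. The thread does not say which effects the model used, so the numbers are illustrative only.

```python
def residual_df(nobs, n_params, n_entities=0, n_time=0):
    """Rough residual degrees of freedom after absorbing fixed effects.

    Assumption: every entity and time effect uses one degree of freedom,
    and one is recovered when both sets are present because the two sets
    of dummies are collinear.
    """
    absorbed = n_entities + n_time
    if n_entities and n_time:
        absorbed -= 1
    return nobs - n_params - absorbed


def f_statistic(rss_restricted, rss_unrestricted, df_num, df_denom):
    """Textbook F statistic; returns None when the test is undefined."""
    if df_num <= 0 or df_denom <= 0:
        # No degrees of freedom left: dividing by df_denom would fail.
        return None
    numerator = (rss_restricted - rss_unrestricted) / df_num
    denominator = rss_unrestricted / df_denom
    return numerator / denominator


# Illustrative numbers: 10 observations, 2 slopes, 5 entities, 4 periods.
df_denom = residual_df(nobs=10, n_params=2, n_entities=5, n_time=4)
print(df_denom)  # 0
print(f_statistic(12.0, 3.0, df_num=2, df_denom=df_denom))  # None
```

With a richer dataset, say 100 observations and the same effects, `residual_df` returns 90 and the statistic is well defined.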
Hi Keven,
Thank you for the suggestion.
Since the degrees of freedom are calculated as df = N - 1, where N is the sample size, I looked up my sample size.
The dataset is 7276 rows × 5 columns, and below are the counts for each group
city-province
(akesu, xinjiang) 11
(ankang, shaanxi) 30
(anqing, anhui) 30
(anshan, liaoning) 19
(haibei, qinghai) 9
(haikou, hainan) 32
(handan, hebei) 23
(hangzhou, zhejiang) 31
(hanzhong, shaanxi) 29
(hebi, henan) 23
(hechi, guangxi) 30
(hefei, anhui) 30
(hegang, heilongjiang) 20
(heihe, heilongjiang) 18
(hengshui, hebei) 22
(hengyang, hunan) 27
(heyuan, guangdong) 23
(heze, shandong) 34
(hezhou, guangxi) 21
(honghe, yunnan) 8
(huaibei, anhui) 27
(huaihua, hunan) 29
(huainan, anhui) 26
(huanggang, hubei) 35
(huangshan, anhui) 24
(huangshi, hubei) 32
(huizhou, guangdong) 38
(huludao, liaoning) 31
(huzhou, zhejiang) 25
(jiamusi, heilongjiang) 25
(jiangmen, guangdong) 29
(jiaozuo, henan) 23
(jiaxing, zhejiang) 30
(jieyang, guangdong) 25
(jilin, jilin) 15
(jinan, shandong) 34
(jinchang, gansu) 19
(jincheng, shanxi) 18
(jingdezhen, jiangxi) 26
(jingmen, hubei) 35
(jingzhou, hubei) 35
(jinhua, zhejiang) 32
(jining, shandong) 34
(jinzhong, shanxi) 26
(jinzhou, liaoning) 28
(jiujiang, jiangxi) 31
(jixi, heilongjiang) 24
(kaifeng, henan) 23
(kunming, yunnan) 39
(laibin, guangxi) 13
(langfang, hebei) 24
(lanzhou, gansu) 32
(leshan, sichuan) 24
(liangshan, sichuan) 4
(ningde, fujian) 32
(panjin, liaoning) 31
etc
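Counts like these can be produced with a pandas groupby. The sketch below uses a small toy frame, not the real 7276-row dataset; it assumes the grouping columns are named `city` and `province`, and the `y` values are placeholders.

```python
import pandas as pd

# Toy frame: three groups whose sizes match the counts quoted in this thread.
df = pd.DataFrame({
    "city": ["liangshan"] * 4 + ["tacheng"] * 2 + ["kunming"] * 39,
    "province": ["sichuan"] * 4 + ["xinjiang"] * 2 + ["yunnan"] * 39,
    "y": range(45),  # placeholder values
})

# Number of rows per (city, province) pair, smallest groups first.
counts = df.groupby(["city", "province"]).size().sort_values()
print(counts)
```

Sorting by size puts the thinnest groups at the top, which makes them easy to spot.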
The sample size should be large enough.
But just in case, I removed (liangshan, sichuan), since it only has 4 observations:
df = df.drop(df[(df.city == 'liangshan') & (df.province == 'sichuan')].index)
It worked!
I am curious, what about (tacheng, xinjiang)? It only has 2 observations.
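Rather than dropping groups one at a time, a single filter can remove every group below a chosen size. This is only a sketch on toy data: `MIN_OBS` is a hypothetical threshold, the column names `city`, `province`, and `y` are assumed, and nothing in this thread confirms that small groups were the actual cause of the error.

```python
import pandas as pd

MIN_OBS = 5  # hypothetical cutoff; pick whatever suits the model

# Toy frame with one large group and two small ones.
df = pd.DataFrame({
    "city": ["liangshan"] * 4 + ["tacheng"] * 2 + ["kunming"] * 39,
    "province": ["sichuan"] * 4 + ["xinjiang"] * 2 + ["yunnan"] * 39,
    "y": range(45),  # placeholder values
})

# Attach each row's group size, then keep rows from large-enough groups.
group_size = df.groupby(["city", "province"])["y"].transform("size")
filtered = df[group_size >= MIN_OBS]

print(filtered.groupby(["city", "province"]).size())
```

Here both (liangshan, sichuan) and (tacheng, xinjiang) are removed, and only the 39 kunming rows remain.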
I re-ran the entire Jupyter notebook from the start. The same things that did not work one and two days ago started to work.
I guess sometimes I just need to restart the laptop.
Thank you!
Yijun
Hi,
It is me again. Thank you for reading this issue.
From #246 we know that we have a dataframe,
When I try to run the panel regression
There is an error message
Intuitively, this means it was dividing by zero (df_denom).
I looked on Stack Overflow and found an answer that says to convert the numbers to float, which I did, but it did not work.
I checked my dataframe for missing values with
dataframe.isnull().sum().sum()
and there aren't any.
I shifted the numeric values by 0.1
df['y'] = df['y'] + 0.1
df['x1'] = df['x1'] + 0.1
df['x2'] = df['x2'] + 0.1
and that also did not work.
I wonder what I have done incorrectly. (My actual dataframe has the same issue as illustrated in this example.)
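The float conversion and missing-value checks described above can be combined into one short pass. The sketch below runs on a made-up three-row frame with the column names `y`, `x1`, and `x2` from this post. It only rules out data-type and missing-value problems and says nothing about how many degrees of freedom the model has left.

```python
import numpy as np
import pandas as pd

# Made-up data: "y" stored as strings, one missing value in "x2".
df = pd.DataFrame({
    "y": ["1.0", "2.5", "3.0"],
    "x1": [1, 2, 3],
    "x2": [0.5, np.nan, 1.5],
})

# Coerce every model column to float; unparseable entries become NaN.
numeric = df[["y", "x1", "x2"]].apply(pd.to_numeric, errors="coerce").astype(float)

n_missing = int(numeric.isnull().sum().sum())  # total missing cells
n_complete = len(numeric.dropna())             # rows usable for estimation

print(numeric.dtypes)
print(n_missing, n_complete)
```

Comparing `n_complete` with the number of parameters and effects in the model gives a quick sense of whether any degrees of freedom remain.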
Thank you for your time.
Yijun Wang