YiranJing / Coronavirus-Epidemic-COVID-19

👩🏻‍⚕️Covid-19 estimation and forecast using statistical model; 新型冠状病毒肺炎统计模型预测 (Jan 2020)
243 stars 69 forks source link

IncorrectCIresult #1

Closed YiranJing closed 4 years ago

YiranJing commented 4 years ago

The calculation of 95% CI is incorrect, compared to paper's output

def calculate_conf_interval(self, alpha=0.05):
        """
        Calculate confidence interval by MLE:
           Suppose population follows binomial distribution Bin(p,N)
           p: probability anyone case will be detected overseas
           N: negative binomially distributed function of X (number of cases detected outside mainland China)
        """
       pass 

See notebook: https://github.com/YiranJing/Coronavirus-Epidemic-2019-nCov/blob/master/Estimating%20current%20cases%20in%20Wuhan.ipynb

Corresponding description in the paper:

Confidence intervals can be calculated from the observation that the number of cases detected overseas, X, is binomially distributed as Bin(p,N), where p = probability any one case will be detected overseas, and N is the total number of cases. N is therefore a negative binomially distributed function of X. The results in Table 1 are maximum likelihood estimates obtained using this negative binomial likelihood function.

I am confused about how to use MLE in the negative binomial likelihood function.

I currently tried two ways:

def calculate_conf_interval(self, alpha=0.05):
        # based on formula mensioned in paper
        p = self.calculate_pro_detected_overseas()  # 0.0017373684210526318
        # the estimated number based on formula mensioned in paper
        n = self.calculate_total_cases_inWuhan()  # 4029
        # our sample is number of cases detected overseas
        k = self.international.cases  # 7 
        mean, var, skew, kurt = nbinom.stats(k, p, moments='mvsk')

        # Lower bound of 95% density dist
        lb = nbinom.ppf((alpha/2), k, p)  
        # Upper bound of 95% density dist
        ub = nbinom.ppf((1-alpha/2), k, p) 
        return (lb, ub) # incorrect !!! it is density dist, not the dist of mean 

or

def calculate_conf_interval(self, alpha=0.05):
        p = self.calculate_pro_detected_overseas()
        n = self.calculate_total_cases_inWuhan()
        # our sample is number of cases detected overseas
        k = self.international.cases

        mean, var, skew, kurt = nbinom.stats(k, p, moments='mvsk')

        margin = stats.norm.ppf(1-alpha/2)*np.sqrt(var)
        return (mean-margin, mean+margin) # traditional way of CI calculation
YiranJing commented 4 years ago

Resolved by using profile likelihood method