csgillespie / poweRlaw

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.

Uncertainty of alpha #27

Closed haikolietz closed 10 years ago

haikolietz commented 10 years ago

I've noticed that the estimated alpha is often above the mean or median of the bootstrapped alphas. Why is that? Is it because drawing from the original data with replacement tends to miss the extreme events?

Also, there are several ways to present the uncertainty of alpha after the bootstrapping. For example, for one of my distributions with alpha = 2.64 (P>0.1), there are the following options for 1000 bootstraps:

1. mean and standard error: 2.27 ± 0.00
2. mean and 95% confidence interval: 2.27 ± 0.01
3. mean and standard deviation: 2.27 ± 0.12
4. median and percentiles [0.025,0.975]: 2.25 [2.15,2.63]

Is there a standard or should one be preferred for statistical reasons? If you ask me, it's a judgement call...

csgillespie commented 10 years ago

Sorry, this issue came when I was away and slipped down my list.

Regarding your question: intervals 1 & 2 describe the sampling variability of the mean of the bootstrapped alphas, whereas intervals 3 & 4 describe the spread of the bootstrap distribution itself. There isn't a "correct" one. It just depends on what you want to show.
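To make the distinction concrete, here is a minimal Python sketch (not poweRlaw code) that computes all four summaries from a vector of bootstrapped alphas. The data are synthetic: 1000 normal draws with mean 2.27 and standard deviation 0.12, chosen only to resemble the numbers quoted above, and `summarize` is a hypothetical helper name. With 1000 bootstraps and a standard deviation of 0.12, the standard error of the mean is 0.12 / sqrt(1000), roughly 0.004, which is why it rounds to 0.00 in the list above.

```python
import math
import random
import statistics


def summarize(boot_alphas):
    """Return the four uncertainty summaries discussed in the thread."""
    n = len(boot_alphas)
    mean = statistics.fmean(boot_alphas)
    sd = statistics.stdev(boot_alphas)   # spread of the bootstrap distribution
    se = sd / math.sqrt(n)               # standard error of the bootstrap mean
    ci_half = 1.96 * se                  # normal-approximation 95% half-width for the mean
    cuts = statistics.quantiles(boot_alphas, n=40)  # cut points every 2.5%
    return {
        "mean": mean,
        "se": se,
        "ci95_half_width": ci_half,
        "sd": sd,
        "median": statistics.median(boot_alphas),
        "percentile_interval": (cuts[0], cuts[-1]),  # 2.5% and 97.5% points
    }


# Synthetic stand-in for bootstrapped alphas (an assumption, not real output).
rng = random.Random(1)
boot = [rng.gauss(2.27, 0.12) for _ in range(1000)]

s = summarize(boot)
lo, hi = s["percentile_interval"]
print(f"1. mean and SE:      {s['mean']:.2f} +- {s['se']:.3f}")
print(f"2. mean and 95% CI:  {s['mean']:.2f} +- {s['ci95_half_width']:.3f}")
print(f"3. mean and SD:      {s['mean']:.2f} +- {s['sd']:.2f}")
print(f"4. median [2.5%, 97.5%]: {s['median']:.2f} [{lo:.2f}, {hi:.2f}]")
```

Summaries 1 and 2 shrink as the number of bootstraps grows, while 3 and 4 settle towards the width of the bootstrap distribution.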

csgillespie commented 10 years ago

I should mention that I only provided the uncertainty regions to give users an idea of how many bootstraps were needed. I'm not sure I would directly report them in a manuscript.

haikolietz commented 10 years ago

I was thinking about using the uncertainty from bootstrapping as an alternative to the deterministic way of getting the standard error for the estimated alpha (formulas 3.2 and 3.6 in http://arxiv.org/pdf/0706.1062.pdf). Why would you rather not do that? And if I stick to the deterministic way, do you know how I can call the generalized zeta function in R?

csgillespie commented 10 years ago

Formulas 3.2 and 3.6 in the paper apply when xmin is given, i.e. treated as known. The bootstrap procedure takes into account the uncertainty in both alpha and xmin.
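As an illustration of the fixed-xmin case, the following Python sketch (not poweRlaw code) fits the continuous estimator alpha = 1 + n / sum(ln(x_i / xmin)) and its leading-order standard error (alpha - 1) / sqrt(n), which is how formula 3.2 in the linked paper reads for continuous data. The sample is synthetic, `fit_alpha` is a hypothetical helper, and the discrete case (formula 3.6) is not covered. The bootstrap here keeps xmin fixed on purpose, so its spread should be close to the analytic value and does not reflect the extra uncertainty from re-estimating xmin.

```python
import math
import random
import statistics


def fit_alpha(xs, xmin):
    """Continuous MLE of alpha for a fixed xmin, with its asymptotic standard error."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    se = (alpha - 1.0) / math.sqrt(n)   # leading-order term only
    return alpha, se


rng = random.Random(42)
true_alpha, xmin = 2.5, 1.0   # assumed values for the synthetic sample

# Inverse-transform sampling from a continuous power law with lower cutoff xmin.
data = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
        for _ in range(2000)]

alpha_hat, se_hat = fit_alpha(data, xmin)

# Bootstrap with xmin held fixed: resample the data and refit alpha each time.
boot = []
for _ in range(200):
    resample = rng.choices(data, k=len(data))
    boot.append(fit_alpha(resample, xmin)[0])
boot_sd = statistics.stdev(boot)

print(f"alpha_hat = {alpha_hat:.3f}")
print(f"analytic SE (fixed xmin)  = {se_hat:.4f}")
print(f"bootstrap SD (fixed xmin) = {boot_sd:.4f}")
```

Re-estimating xmin inside each bootstrap replicate, as the package's bootstrap procedure is described as doing, would add the xmin uncertainty on top of this.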

The VGAM package has a zeta function.