csgillespie / poweRlaw

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.
109 stars 24 forks source link

Error in estimate_xmin for conlnorm #93

Closed jcredberry closed 4 years ago

jcredberry commented 4 years ago

Colin, I am trying to run estimate_xmin with a dataset of 12 points and bootstrap (my implementation, not yours). Most of the times it works perfectly, but sometimes it crashes with the following message:

Error in optim(par = theta_0, fn = negloglike, method = "L-BFGS-B", lower = c(-Inf,  : 
  non-finite value supplied by optim
In addition: Warning messages:
1: In get_xmin_est(dat, xmins) :
  Unable to estimate xmin. This may be due to numerical instabilities.
            For example the parameter estimates are in the distribution tails.
2: In min(which(internal[["dat"]] >= (x - .Machine$double.eps^0.5))) :
  no non-missing arguments to min; returning Inf

The function that I am using with bootis:

sample_median_ln<-function(data,indices){
  require(poweRlaw)
  resample<-data[indices]
 # I had to add some jitter to reduce the problem to this single instance.
  resample<-abs(jitter(resample,factor = 1,amount = IQR(remuestra)/17))
  m_ln = conlnorm$new(resample)
  est = estimate_xmin(m_ln,xmins = 1e-4)
  m_ln$xmin<-est$xmin
  m_ln$pars<-estimate_pars(m_ln)
  return(exp(m_ln$pars[1]))
}

thisCents<- c(12.8690476, 0.2000000, 0.2000000, 1.6166667, 3.0000000, 0.0000000, 11.2404762, 15.4833333, 0.2000000, 16.3666667, 0.1666667)
bs_median<-boot(data = thisCents,statistic = sample_median_ln,R = 1000)
bs_median_ci<-boot.ci(bs_median,conf = .95,type = "bca")

With the following resample (found by hand) I am able to crash estimate_xmin:

resample<-c(0.1984895, 3.0046058, 12.8631459, 15.4773498, 0.2062829, 0.1996015,  0.1960452, 15.4788675, 0.1935871, 0.2033684, 0.1929931)

Using conpl it works, although the parameters have a large variance. Might this be the problem? As you can see most of the data is close to zero, then there is a huge jump to values larger than 10.

Please, I need help as this is crucial in my work. Otherwise, I will have to rely on semiparametric bootstrap, which I know will underestimate the median centrality of my nodes.

Regards, Julio

jcredberry commented 4 years ago

I have tried fitdistr with the problematic resample, and it "worked":

> fitdistr(resample,densfun = "lognormal")
    meanlog       sdlog   
  -0.1984425    1.9213726 
 ( 0.5793156) ( 0.4096380)
> exp(-0.1984425)
[1] 0.8200069
jcredberry commented 4 years ago

I am running both of them and due to the data, it seems that for some replicates fitdistr finds the first mode as the location parameter, but estimate_pars (I guess due to estimate_xmin) sometimes finds the second mode. The bootstrap_p function then does not reject the null hypothesis, although it is close to 7%. When both are somewhat closer, then the null is rejected. If I fix m_ln$min<-min(resample), then both estimations are close and the null hypothesis is close to be rejected, but it is never so (close to 6%).

Why is estimate_xmin so sensitive?

csgillespie commented 4 years ago

This

fitdistr(resample,densfun = "lognormal")

fits a lognormal distribution. Not a truncated lognormal.

When I run

library(poweRlaw)
x = c(0.1984895, 3.0046058, 12.8631459, 15.4773498, 0.2062829, 0.1996015,  0.1960452, 15.4788675, 0.1935871, 0.2033684, 0.1929931)
m = conlnorm$new(x)
estimate_xmin(m)

gives the warning

Warning message:
In get_xmin_est(dat, xmins) :
  Unable to estimate xmin. This may be due to numerical instabilities.
            For example the parameter estimates are in the distribution tails.

What this means is that values of the lognormal (for a particular value xmin) are very large.

Also

plot(m)

indicates the data is decidedly odd.