csgillespie / poweRlaw

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.
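As context for the thread below, a minimal usage sketch of the workflow the package description refers to (assuming poweRlaw is installed; moby is a word-frequency data set bundled with the package):

library(poweRlaw)
data("moby")                   # word frequencies from Moby Dick, bundled with the package
m = displ$new(moby)            # discrete power-law object
m$setXmin(estimate_xmin(m))    # lower cutoff via the goodness-of-fit approach
estimate_pars(m)               # maximum likelihood estimate of the scaling parameter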

estimates comparison OLS, MLE + bootstrapping log-normal #43

Closed glennmagerman closed 9 years ago

glennmagerman commented 9 years ago

Hi Colin, first off: many thanks for your work. This is amazing!

Two (probably very silly) questions:

  1. Running OLS and MLE on the same data gives estimates of alpha that are 1 apart. I'm confused: both estimate the scaling parameter and use the pdf, right? For a given dataset I get an OLS estimate of -1.3 and an MLE estimate of -2.26 with the same x_min. I looked at the Clauset paper, but I can't see where I go wrong.
  2. I'm trying to bootstrap the Kolmogorov-Smirnov test for the power-law hypothesis, but now for a log-normal distribution. Unfortunately, R crashes every time, both on a server and on a laptop.

All the best, Glenn

csgillespie commented 9 years ago

Hi Glenn.

  1. Have you tried simulating a data set, e.g. rpldis(4000, 1, 2), and trying OLS (see the sketch below)? That way you know the correct answer.
  2. Can you provide a sample data set that reproduces this problem?
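As a minimal sketch of that comparison (not from the thread itself; it assumes poweRlaw is loaded and uses base R's lm for the OLS fits): the log-log slope of the empirical pdf is roughly -alpha, while the slope of the empirical CCDF is roughly -(alpha - 1), so regressing one while expecting the other is a plausible source of an off-by-one.

library(poweRlaw)

set.seed(1)
x = rpldis(4000, xmin = 1, alpha = 2)    # true alpha = 2 by construction

# MLE via poweRlaw
m = displ$new(x)
m$setXmin(1)
estimate_pars(m)$pars                    # should be close to 2

# OLS on the log-log empirical pdf: slope roughly -alpha
tab = table(x)
df = data.frame(x = as.numeric(names(tab)), freq = as.numeric(tab))
coef(lm(log(freq) ~ log(x), data = df))[2]

# OLS on the log-log empirical CCDF: slope roughly -(alpha - 1)
ccdf = sapply(df$x, function(v) mean(x >= v))
coef(lm(log(ccdf) ~ log(df$x)))[2]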

Thanks

glennmagerman commented 9 years ago

Hi Colin,

Many thanks for your quick response!

  1. The OLS estimate is indeed close to the MLE estimate minus 1 in absolute terms. I'm still confused by the interpretation in terms of the CDF versus the PDF... Am I missing something very, very basic?
  2. I'm afraid I cannot share a data sample for NDA reasons, but below is the code I'm running. I can bootstrap the power law, but not the log-normal. I even set the number of sims as low as 10 for testing purposes. Both the server and the laptop freeze and have to be rebooted... I will try again with some other data and keep you posted.
library(poweRlaw)   # needed for conlnorm, estimate_xmin, bootstrap_p

# fit the log-normal
m_ln = conlnorm$new(data)
est = estimate_xmin(m_ln)
m_ln$setXmin(est)

# bootstrap the fit (p-value via the KS statistic)
bs_ln = bootstrap_p(m_ln, no_of_sims = 10, threads = 8, seed = 1)

csgillespie commented 9 years ago

One thought I had is that the problem occurs when the random numbers are generated. I implemented a simple rejection algorithm:

  1. Simulate N random numbers.
  2. Reject values less than xmin.

To make this a bit more efficient, I calculate the expected number of rejections and compensate by simulating a larger N. However, this may make N very large. To test this idea, what are your parameter and xmin values in est?
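For illustration, here is a minimal sketch of that rejection idea for a truncated log-normal. This is hypothetical code, not the package's internals; rlnorm_tail and its defaults are made up. It shows where a freeze can come from: if xmin sits far in the tail, the survival probability p_keep is tiny and N explodes.

# hypothetical sketch of the rejection step, not poweRlaw's implementation
rlnorm_tail = function(n, meanlog = 0, sdlog = 1, xmin = 1) {
  # probability that a raw draw survives the cut at xmin
  p_keep = plnorm(xmin, meanlog, sdlog, lower.tail = FALSE)
  out = numeric(0)
  while (length(out) < n) {
    # oversample to compensate for the expected rejections;
    # a tiny p_keep makes N huge, and R appears to hang here
    N = ceiling((n - length(out)) / p_keep * 1.1)
    x = rlnorm(N, meanlog, sdlog)
    out = c(out, x[x >= xmin])
  }
  out[seq_len(n)]
}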

glennmagerman commented 9 years ago

Hi Colin,

Sorry for the very late reply. I got things to converge by rescaling the data: I had sales in euros, converted them to millions of euros, and now everything runs smoothly.

It is very strange, though. I know of some non-linear estimators that are not scale-invariant (e.g. the negative binomial), but I would expect a scale-free distribution to have scale-invariant estimators if all goes well. This reminds me of some estimations with Poisson pseudo-maximum likelihood, where convergence fails for very large values of x and rescaling helps; the estimates, however, are the same regardless of scale (the scale is absorbed in the constant of the regression, for example). Do you think that is what is going on here, maybe through the calculation of the normalizing constant? If so, I think this deserves some more investigation for the MLE in general. It seems far-fetched in this setting, but maybe the key lies in the normalizing constant.

Best, Glenn
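(A minimal sketch of the rescaling workaround described above; the vector sales, in euros, is hypothetical:)

# rescale euros to millions of euros before fitting; `sales` is a hypothetical vector
data_m = sales / 1e6
m_ln = conlnorm$new(data_m)
est = estimate_xmin(m_ln)
m_ln$setXmin(est)
bs_ln = bootstrap_p(m_ln, no_of_sims = 10, threads = 8, seed = 1)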

csgillespie commented 9 years ago

I'm glad you solved your problem. To be honest, I'm not entirely sure what's going on. Sorry.