Differences in estimation with a big dataset

csgillespie / poweRlaw

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.


Differences in estimation with a big dataset #45

Closed lsaravia closed 9 years ago

lsaravia commented 9 years ago

I am fitting power laws to a big (10^6) data set. I suspected that there was some problem with the estimate of alpha, so I tried the pareto.R functions by Shalizi (http://tuvalu.santafe.edu/~aaronc/powerlaws/), and they give me different results:

[Figure: plawestimationcomp]

The code I used is here:

https://github.com/lsaravia/CriticalGlobalForest/blob/master/R/test_poweRlaw.r

csgillespie commented 9 years ago

Thanks for the feedback.

Could I get a copy of the data (simulated or otherwise)?
Your URL doesn't work.

Cheers

lsaravia commented 9 years ago

My mistake: I used the function pareto.fit instead of zeta.fit. After correcting that, the results are identical. Still, looking at the figure, I wonder why the continuous power law seems to fit better.

The code is here

https://www.dropbox.com/sh/91ywo3x510z4ozk/AABafDJY9GQ9BM-sVDsNaXQca?dl=0

and the data

https://www.dropbox.com/s/u6zq51mqhnkqxiy/MOD44B.MRTWEB.A2010065.005.Percent_Tree_Cover.tif.bin?dl=0

Cheers!
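For readers following along, here is a minimal sketch of why swapping a continuous estimator for a discrete one can change alpha so much. It assumes, as the names suggest, that pareto.fit is a continuous maximum likelihood estimator and zeta.fit a discrete one. The Python below is not the code from either package: `hurwitz_zeta`, `continuous_alpha`, `discrete_alpha`, and `sample_discrete_power_law` are hypothetical helpers, and the data are synthetic integers, not the tree-cover dataset.

```python
import math
import random


def hurwitz_zeta(s, q, n_terms=1000):
    """Approximate sum_{k>=0} (q + k)^(-s) for s > 1.

    Direct sum of the first n_terms terms plus an Euler-Maclaurin tail.
    """
    head = sum((q + k) ** (-s) for k in range(n_terms))
    a = q + n_terms
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** (-s) + s * a ** (-s - 1) / 12
    return head + tail


def continuous_alpha(data, xmin):
    """Closed-form MLE for a continuous power law with lower cutoff xmin."""
    tail = [x for x in data if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)


def discrete_alpha(data, xmin, lo=1.05, hi=6.0, tol=1e-6):
    """MLE for a discrete power law, found by golden-section search."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    sum_log = sum(math.log(x) for x in tail)

    def neg_log_lik(alpha):
        return n * math.log(hurwitz_zeta(alpha, xmin)) + alpha * sum_log

    g = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if neg_log_lik(c) < neg_log_lik(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2


def sample_discrete_power_law(n, alpha, k_max=100_000, seed=1):
    """Draw integers with P(k) proportional to k^(-alpha), truncated at k_max."""
    rng = random.Random(seed)
    support = range(1, k_max + 1)
    weights = [k ** (-alpha) for k in support]
    return rng.choices(support, weights=weights, k=n)


data = sample_discrete_power_law(20_000, alpha=2.5)
alpha_disc = discrete_alpha(data, xmin=1)
alpha_cont = continuous_alpha(data, xmin=1)
print(f"discrete MLE:   {alpha_disc:.3f}")
print(f"continuous MLE: {alpha_cont:.3f}")
```

On integer data with a small xmin, the continuous formula treats every observation equal to xmin as contributing zero to the log sum, so it tends to report a much steeper exponent than the discrete estimator. The gap should shrink as xmin grows.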

csgillespie commented 9 years ago

Sorry for the delay. I responded a few days ago, but must have forgotten to hit comment.

Anyway, I've noticed that for these types of data, the Clauset et al. method doesn't work that well; it's the downside of having a nice flexible method.

I'm working on other methods. Would it be possible to include your dataset in the paper as an example?

Thanks

lsaravia commented 9 years ago

Yes, you can. I guess the problem is that the data is originally continuous (tree patch size) and is discretized by the remote sensing image, so the continuous function fits better. I will deposit the data in a public repository soon; in the meantime you can use it.
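The discretization scenario described above can be explored on synthetic data. The sketch below is only an illustration under stated assumptions: it draws continuous power-law values, rounds them to integers as a stand-in for pixel counts, and compares the continuous MLE on the raw values with two estimates on the rounded values. The `shift=0.5` variant is the approximate discrete estimator from Clauset et al. (2009), which replaces xmin by xmin - 1/2 and is meant for moderately large xmin. The helper names, the true exponent of 2.0, and the cutoff of 10 are all hypothetical choices, not values taken from the tree-cover data.

```python
import math
import random


def sample_continuous_power_law(n, alpha, xmin=1.0, seed=1):
    """Inverse-transform sampling from p(x) proportional to x^(-alpha), x >= xmin."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_alpha(data, xmin, shift=0.0):
    """alpha = 1 + n / sum(log(x / (xmin - shift))) over observations x >= xmin.

    shift=0.0 is the continuous MLE; shift=0.5 is the approximate
    discrete estimator for integer data.
    """
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / (xmin - shift)) for x in tail)


true_alpha = 2.0
raw = sample_continuous_power_law(200_000, true_alpha)

# Discretize by rounding to the nearest integer (stand-in for pixel counts).
pixels = [math.floor(x + 0.5) for x in raw]

alpha_raw = mle_alpha(raw, xmin=1.0)                  # continuous MLE, raw data
alpha_naive = mle_alpha(pixels, xmin=10)              # continuous MLE, rounded data
alpha_approx = mle_alpha(pixels, xmin=10, shift=0.5)  # xmin - 1/2 correction

print(f"raw continuous:     {alpha_raw:.3f}")
print(f"rounded, naive:     {alpha_naive:.3f}")
print(f"rounded, corrected: {alpha_approx:.3f}")
```

In this setup the rounded values at or above the cutoff correspond to raw values at or above cutoff minus one half, which is why the shifted formula tracks the true exponent more closely than the naive one. Real remote sensing data may discretize differently, so this says nothing definitive about the dataset in this thread.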

csgillespie commented 9 years ago

Great. Do you have a paper I could cite?

lsaravia commented 9 years ago

Not yet... but I will let you know

csgillespie commented 9 years ago

As an aside, there is a bug in http://tuvalu.santafe.edu/~aaronc/powerlaws/plfit.r when the data is discrete and xmin=1.