Closed lsaravia closed 9 years ago
I agree it doesn't seem quite right. However, a quick simulation:
library("poweRlaw")
x = sort(rplcon(10, 1, 2))
m = conpl$new(x)
m$setPars(2)
q = m$dat
n = m$internal[["n"]]; N = length(q)
q = q[(N-n+1):N]
fit_cdf = dist_cdf(m, q)
data_cdf = ((0:(n-1))/n)[1:length(fit_cdf)]
tmp = ecdf(x)
tmp(x)
shows that it does the correct thing. I got the idea from the R/Matlab code on Clausett's website.
One small issue I seem to recall is that if you have duplicates in the data the empirical cdf isn't quite correct, but then I suppose if the data is really CTN you shouldn't get duplicates.
I checked the plpva.r code and does the same, I am not sure why, I think that is the way you calculate dist_cdf, I you uncomment the plots you will see the differences with the empirical cumulative distribution. I calculated the distances using the builting ks.test function and checked, not quite wrong. But the problem I had is with GOF, I will open another issue for that.
comp_ks <-sapply(1:50, function(x){
x <- sort(rplcon(50, 1, 2))
m <- conpl$new(x)
m$setPars(2)
q <- m$dat
n <- m$internal[["n"]]; N = length(q)
q <- q[(N-n+1):N]
fit_cdf <- dist_cdf(m, q)
data_cdf <- ((0:(n-1))/n)[1:length(fit_cdf)]
tmp <- ecdf(x)
#plot(tmp,col=5,pch=21)
#points(tmp(x),col=2)
#points(fit_cdf,col=3)
#curve(pplcon(x,m$xmin,est$pars),min(x),max(x),add = T,col=4)
ks <-ks.test(x,"pplcon",1,2)$statistic
pl_ks <-max(abs(fit_cdf-data_cdf))
c(ks,pl_ks)
})
plot(comp_ks[1,],comp_ks[2,])
For info I'm away for the next 10 days.
The difference is due to the comparison. Spefically, are we comparing with Pr(X > x) or with Pr(X \ge x).
In your example above, if use subtract a very small amount from with using the ecdf we get the same result.
data_cdf
tmp(x)
tmp(x-.Machine$double.eps)
We were looking at your functions to understand how they work and noticed that in the continuous case the function get_KS_statistic calculates the KS distance between the estimated distribution and a uniform cdf instead of the empirical cdf, which seems not quite right.