csgillespie / poweRlaw

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.
109 stars 24 forks source link

incorrect slope on displ plot? #70

Closed xerroxcopy closed 7 years ago

xerroxcopy commented 7 years ago

From what I've understood, function ppareto in Shalizi's code is similar to plotting displ function in poweRlaw package when it comes plotting CCDF. Here, I plotted pseudo distribution using ppareto, with log.p = F, lower.tail = F options, with alpha-value of 2. As code suggests, the power-law fit has 1 - alpha = -1 slope.

x <- seq(1, 100, length = 100)
exponent <- 2
y <- ppareto(x = x, threshold = 1, exponent = exponent, log.p =F, lower.tail = F)
quartz(type="pdf", file=paste("ppareto_test_",exponent, ".pdf", sep=""), width = 6, height = 6)
par(pty = "s")
plot(x, y, log = "xy", xlim = c(1,100), ylim = c(0.01, 1))
lines(x, y, col = "blue")
dev.off()

ppareto_test_2.pdf But when I plot the same thing in displ function,

pseudofreq <- c(rep(1, 99), 100)
displ.test <- displ$new(pseudofreq)
displ.test$setXmin(1)
displ.test$setPars(exponent)
quartz(type="pdf", file=paste("displ_test_",exponent, ".pdf", sep=""), width = 6, height = 6)
par(pty = "s")
plot(displ.test, xlim = c(1,100), ylim = c(0.01, 1))
lines(displ.test, col = "blue", xlim = c(1,100), ylim = c(0.01, 1))
dev.off()

displ_test_2.pdf

Two things are different:

  1. the slope is significantly steeper, and
  2. the line is not straight.

Why does this happen? Am I doing this wrong? Is displ function different from ppareto?

Also, I tried to fit power-law on randomly generated power-law distributed data:

data <- dist_rand(displ.test, 100)
estimate.displ <- displ$new(data)
estimate.displ$setXmin(1)
estimate.displ$setPars(estimate_pars(estimate.displ)$pars)
estimate.displ$pars # happens to be 1.972
plot(estimate.displ, xlim = c(1,100), ylim = c(0.01, 1)) 
lines(estimate.displ) # the slope seems to be around 1.045, not 0.972
df <- lines(estimate.displ) # plotting this on ggplot also produces bent curve, not straight line

displ_randtest_2.pdf

xerroxcopy commented 7 years ago

Update:

Turned out I was wrong. the function in Shalizi's codes equivalent of displ was pzeta, not ppareto. pzeta function actually plots identical to that of displ. I am sorry for the misunderstanding.