This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.
I am trying to understand how the conlnorm function estimates the parameters for the truncated lognorm, and whether this truncation is correct. I apologize in advance if I misuse some terminology with methods/classes/objects as I'm not entirely familiar with it.
In the MLE method for conlnorm, I see that truncation is performed with a greater than comparison, but for conpl truncation is performed with a greater than or equal to comparison. So if one were to set the conlnorm xmin to be min(dat), it will actually discard the minimum value, while conpl will include the minimum value, correct? It seems this should be a greater than or equal to comparison, so xmin should be included, unless I am missing some of the statistical background here. The dislnorm uses a greater than but uses: x <- x[x > xmin - 0.5].
For comparison, I tried to fit a left-truncated lognormal using truncdist and fitdist, and it seems they use a >= operator. The difference in the parameters by excluding that xmin is large in this case.
The real data from my problem is below. It's technically discrete, with values corresponding to k*900 for k >= 4, but represents a continuous variable
powerlawissue.txt
library(truncdist)
library(fitdistrplus)
library(poweRlaw)
dtruncated_log_normal <- function(x, meanlog, sdlog) {
dtrunc(x, "lnorm", a=3600, meanlog=meanlog, sdlog=sdlog)
}
ptruncated_log_normal <- function(q, meanlog, sdlog) {
ptrunc(q, "lnorm", a=3600, meanlog=meanlog, sdlog=sdlog)
}
xdat <- read.table("powerlawissue.txt", header = F)
fitdistrplus:::fitdist(xdat, "truncated_log_normal", start = list(meanlog=mean(log(xdat)), sdlog=sd(log(xdat))))
Fitting of the distribution ' truncated_log_normal ' by maximum likelihood
Parameters:
estimate Std. Error
meanlog 7.944887 0.18973782
sdlog 2.443924 0.07699878
mperm <- conlnorm$new(xdat)
mperm$setXmin(3600)
estperm2 <- estimate_pars(mperm)
mperm$setPars(estperm2)
> mperm
Reference class object of class "conlnorm"
Field "xmin":
[1] 3600
Field "pars":
[1] 9.548856 1.820164
Field "no_pars":
[1] 2
>
mperm <- conlnorm$new(xdat)
mperm$setXmin(3599)
estperm2 <- estimate_pars(mperm)
mperm$setPars(estperm2)
> mperm
Reference class object of class "conlnorm"
Field "xmin":
[1] 3599
Field "pars":
[1] 7.946124 2.442947
Field "no_pars":
[1] 2
For reference:
MLE for conlnorm:
new("refMethodDef", .Data = function (set = TRUE, initialise = NULL)
{
x = dat
x = x[x > xmin]
if (is.null(initialise))
theta_0 = c(mean(log(x)), sd(log(x)))
else theta_0 = initialise
negloglike = function(par) {
r = -conlnorm_tail_ll(x, par, xmin)
if (!is.finite(r))
r = 1e+12
r
}
mle = suppressWarnings(optim(par = theta_0, fn = negloglike,
method = "L-BFGS-B", lower = c(-Inf, .Machine$double.eps)))
if (set)
pars <<- mle$par
class(mle) = "estimate_pars"
names(mle)[1L] = "pars"
mle
}, mayCall = character(0), name = "mle", refClassName = "conlnorm",
superClassMethod = "")
MLE for conpl:
new("refMethodDef", .Data = function (set = TRUE, initialise = NULL)
{
n = internal[["n"]]
if (is.null(initialise)) {
slx = internal[["slx"]]
theta_0 = 1 + n * (slx - log(xmin) * n)^(-1)
}
else {
theta_0 = initialise
}
x = dat[dat >= xmin]
negloglike = function(par) {
r = -con_pl_ll(x, par, xmin)
if (!is.finite(r))
r = 1e+12
r
}
mle = suppressWarnings(optim(par = theta_0, fn = negloglike,
method = "L-BFGS-B", lower = 1))
if (set)
pars <<- mle$par
class(mle) = "estimate_pars"
names(mle)[1L] = "pars"
mle
}, mayCall = character(0), name = "mle", refClassName = "conpl",
superClassMethod = "")
I am trying to understand how the conlnorm function estimates the parameters for the truncated lognorm, and whether this truncation is correct. I apologize in advance if I misuse some terminology with methods/classes/objects as I'm not entirely familiar with it.
In the MLE method for conlnorm, I see that truncation is performed with a greater than comparison, but for conpl truncation is performed with a greater than or equal to comparison. So if one were to set the conlnorm xmin to be min(dat), it will actually discard the minimum value, while conpl will include the minimum value, correct? It seems this should be a greater than or equal to comparison, so xmin should be included, unless I am missing some of the statistical background here. The dislnorm uses a greater than but uses: x <- x[x > xmin - 0.5].
For comparison, I tried to fit a left-truncated lognormal using truncdist and fitdist, and it seems they use a >= operator. The difference in the parameters by excluding that xmin is large in this case.
The real data from my problem is below. It's technically discrete, with values corresponding to k*900 for k >= 4, but represents a continuous variable powerlawissue.txt
For reference: MLE for conlnorm:
MLE for conpl: