chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

flexsurvreg() difficult to converge with WeibullPH #69

Closed (jintaili closed this issue 1 year ago)

jintaili commented 4 years ago

Hi! I am trying to estimate a (locally) proportional hazards model with a Weibull (or other parametric) baseline hazard function and time-dependent coefficients, so I am very happy to find that flexsurvreg() is capable of doing that. I chose a parametric baseline over the Cox model for ease of prediction, and time-dependent coefficients / local proportionality over other relaxations of proportionality for ease of interpretation of the parameters.

However, the algorithm does not seem to converge very easily with WeibullPH. Specifically, I have created my interval data as follows:

library(survival)     # survSplit(), Surv()
library(data.table)
library(magrittr)     # provides %>%

cutoffs = c(4, 21, 60)

# split each subject's follow-up into episodes at the cutoffs; column 'g' indexes the episode
x_train = survSplit(Surv(t_end, event) ~ ., 
    data, cut = cutoffs,
    episode = 'g') %>% data.table()

# counting-process response from the tstart / t_end / event columns
srv = Surv(tstart, t_end, event)

This data set is then fitted with the following model:

# interval-specific (time-dependent) coefficients for x1 and x2
formula = as.formula('srv ~ I(x1 * (g == 1)) + I(x1 * (g == 2)) + I(x2 * (g == 1)) + I(x2 * (g == 2))')

# inits: 2 baseline parameters (shape, scale) plus one coefficient per covariate and episode
pw_flex = flexsurvreg(formula, 
    anc = NULL,
    data = x_train, 
    dist = "weibullPH",
    inits = rep(0.01, 2 + max(x_train$g) * (ncol(x_train) - 5)),
    method = "Nelder-Mead"
    )

In the implementation I had substantially more covariates (df = 126 for around 150,000 observations). During the estimation, the algorithm threw many "Warning in dweibull(x, shape = shape, scale = scale^{ : NaNs produced" warnings, and finally failed to converge. I also tried the default BFGS optimiser, as well as the PH generalized gamma model provided in the vignette, with no success.

I was wondering if you have experience with what setup could result in better convergence, which parameters I could play with, and, more broadly, whether there are better ways to achieve what I am trying to do. Thanks!

chjackson commented 4 years ago

I'm not sure how much help I can be, so I'd invite anyone else who comes across this to contribute. The package uses generic optimisers that are not tuned to any specific problem.

I'd just say that such a large parameter space will be challenging for any generic optimiser. The first thing I'd try with such a large dataset would be the control = list(fnscale = ...) option to optim(), where fnscale is set to a number near the size of the -2 log-likelihood.
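
For example, something like the following (a minimal sketch: the value 30000 is only illustrative, chosen to be of the same order as -2 times the log-likelihood; flexsurvreg() passes extra arguments such as control through to optim()):

pw_flex = flexsurvreg(formula, 
    data = x_train, 
    dist = "weibullPH",
    inits = rep(0.01, 2 + max(x_train$g) * (ncol(x_train) - 5)),
    method = "Nelder-Mead",
    control = list(fnscale = 30000)   # roughly the size of -2 * logLik
    )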

The NaN warnings are from the optimiser visiting implausible values, and then moving on.

With lots of covariates, I wonder if penalised / regularised estimation would work better, but I don't know what the best software for that would be. Stan perhaps, though I haven't used it in problems like these.

jintaili commented 4 years ago

Thanks @chjackson! I am rerunning the estimation with fnscale specified and waiting for the results. I do have two follow-up questions:

  1. I did some maths based on the WeibullPH code, and it seems that the hazard function used here is h(t) = a*b*t^(a-1), where a is the shape and b is the scale. The documentation mentions that flexsurvreg() enters the covariates into the scale parameter through a linear model, or a log-linear model. I am trying to figure out how proportionality is preserved in this case. Could you specify how the covariates enter the model functionally here, and maybe how one could play with that?

  2. Trying to understand how to set fnscale. I was able to estimate a PH Weibull baseline model in eha (but there is no predict method for phreg...) and the final log-likelihood was a bit more than -15000. Does that indicate I should specify fnscale to be a bit more than -2*logLik? References on how to tune optim would be appreciated too - I wonder if abstol and reltol could be of some use as well.

chjackson commented 4 years ago
  1. It's just b = exp(beta1*x1 + beta2*x2 + ...). Not sure what you mean by "play with it"? (A small numerical check of the resulting proportionality is sketched after this list.)

  2. The exact value of fnscale doesn't matter, as long as it's a number of the same order of magnitude as the objective. The idea is that it makes the optimiser less likely to overflow or underflow, because the objective function then returns a value that is in the middle of the range of numbers that computers can represent. See also parscale, which does something similar for the parameter values. I don't know a better reference than help(optim) - I don't have a very deep understanding of optimisation and floating-point issues.
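
To illustrate point 1 with a small numerical check (a minimal sketch with made-up parameter values, not from the original discussion): since the covariates enter the WeibullPH scale as b * exp(beta*x), the ratio of the hazards at two covariate values is constant over time, i.e. the model is proportional hazards.

library(flexsurv)

shape <- 1.3; scale <- 0.02; beta <- 0.5                       # illustrative values only
t <- c(1, 5, 10, 50)
h0 <- hweibullPH(t, shape = shape, scale = scale)              # hazard at x = 0
h1 <- hweibullPH(t, shape = shape, scale = scale * exp(beta))  # hazard at x = 1
h1 / h0                                                        # constant in t, equal to exp(beta)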

jintaili commented 4 years ago

Thanks! Wrt 1.: for some reason I had thought the covariates entered the other parameter, a, and I was trying to figure out how that would work. This makes perfect sense.

Wrt 2.: That sounds good - I will try reducing the degrees of freedom, using fewer samples, and other hazard functional forms. For parscale, I have normalised all covariates (except booleans) prior to estimation, and I assume that would render it not so useful? So far my model, as written in the original post, still eventually gives a "Hessian not positive semidefinite" error.

Will update if I find a solution!

chjackson commented 4 years ago

Optionally you can have covariates on the other parameter a as well. The equation would be of the same form as the one for b above.
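
For instance (a minimal sketch reusing the objects from the original post; putting x1 on the shape is only an illustration), the anc argument of flexsurvreg() takes a named list of formulae for the ancillary parameters:

fit_shape = flexsurvreg(formula, 
    anc = list(shape = ~ x1),   # covariate effect on the shape parameter a, also log-linear
    data = x_train, 
    dist = "weibullPH")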

RAZII-we commented 2 years ago

Please, can someone help me? Why am I facing this error in R?

m = 10000; n = 20; k = m*n; theta = 1; lambda = 1
x = t = d = y = ys = numeric(k)
for (i in 1:k)
{
  x[i] <- rweibullPH(1, shape = 2, scale = theta)
  t[i] <- rweibullPH(1, shape = 1, scale = lambda)
  d[i] <- if (x[i] <= t[i]) 1 else 0
  y[i] <- d[i]*x[i] + (1 - d[i])*t[i]
  ys[i] <- y[i]*y[i]
}

Error in rweibullPH(1, shape = 2, scale = theta) : could not find function "rweibullPH"

chjackson commented 2 years ago

@RAZII-we You should run library(flexsurv) before calling rweibullPH.
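
For example, a minimal sketch of the corrected start of the script:

library(flexsurv)   # provides rweibullPH() and the other *weibullPH functions

m = 10000; n = 20; k = m*n; theta = 1; lambda = 1
x = t = d = y = ys = numeric(k)
# ... the loop above then runs without the "could not find function" error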