CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.38k stars 560 forks source link

Obtain log-likelihood given parameters #1582

Closed melodiemonod closed 1 year ago

melodiemonod commented 1 year ago

Dear lifelines team,

Is it possible to use your build-in fcts to obtain the log-likelihood given some data and a set of parameters?

For example, suppose I have a data set and some scale/shape parameters obtained from another model, would it be possible to get the log-likelihood of the Weibull parametric model?

The further I got for now is to re-fit the the data

# create Weibull fit instance
aft = WeibullAFTFitter()
aft.fit(df, duration_col='duration', event_col='event', fit_intercept = False)

then overwrite the parametera

# overwrite parameters
aft.params_.lambda_ = np.log(params['scale'])
aft.params_.rho_ = np.log(params['concentration'])

and then get the average log-likelihood

# get average log-likelihood
print(aft.score(df.drop(['id'], axis=1), scoring_method="log_likelihood")) 

Is there any smarter way to do this?

Many thanks

CamDavidsonPilon commented 1 year ago

Hi @melodiemonod,

There is a function _log_likelihood_right_censoring on the models you could use: https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/fitters/__init__.py#L1412

Due to my lack-of-wisdom, I didn't annotate what the function accepts, so lets do that here:

params: a dict, with keys representing parameters (ex: "lambda") and values are numpy arrays. Ts: a tuple of representing the time bounds we have: (numpy array or none, numpy array or none). For right censoring data, the 0th position contains the T data. E: a numpy array representing censoring W: weights, a numpy array of weights, probably just all 1s. entries: late entries, probably just all 0s Xs: a lifelines.utils.DataframeSlicer. The input to this object is a dataframe, but that dataframe needs to be have a multi-dimension column such that `df['lambda']returns a dataframe of the covariates needed in the linear model forlambda`

melodiemonod commented 1 year ago

Cameron, thank you so much for your quick and helpful response!

Here's my updates code to find the log likelihood of the Weibull survival model given observation-specific param $\lambda_i$ and fixed concentration param $\rho = 1$.

def WeibullLogLikelihood(
        self,
        params,
        status,
        time,
 ):

    # number of observations
    n_samples = len(status)

    # Define response inputs for log_likelihood function
    Ts = (np.int_(time), None)  # No left-censoring, so the second element is None
    E = np.copy(status) # Boolean indicating whether event happened
    W = np.ones(n_samples)   # Assuming equal weights 
    entries = np.zeros(n_samples)  # No late entries

    # Define covariate inputs for log_likelihood function
    x = pd.DataFrame(np.eye(n_samples, dtype=int), columns=[f'x_{i+1}' for i in range(n_samples)]) # diagonal matrix for patient-specific lambda
    ones_column = pd.DataFrame(np.ones((n_samples, 1)), columns=['Intercept'])  # Intercept column for rho
    Xs = pd.concat([x, ones_column], axis=1, keys=['lambda_', 'rho_']) # concatenate

    # Define parameteter inputs for log_likelihood function
    log_lambda_ = pd.DataFrame({'covariate':  {'x_' + str(i + 1): np.float64(params[i]) for i in range(0, n_samples)}}, 
                                                index=pd.Index(['x_' + str(i + 1) for i in range(n_samples)])).squeeze() # scale parameter
    log_rho_ = pd.DataFrame({'covariate': {'Intercept': np.float64(0)}}).iloc[0, 0].reshape((1,)) # shape parameter fixed to 1 (=exp(0))
    params_ = {'lambda_': log_lambda_, 'rho_':log_rho_}

    # log likelihood
    wbf = WeibullAFTFitter()
    ll_right_censoring_average = wbf._log_likelihood_right_censoring(params_, Ts, E, W, entries, DataframeSlicer(Xs))
    ll_right_censoring = ll_right_censoring_average * n_samples

return ll_right_censoring
CamDavidsonPilon commented 12 months ago

Incredible! You took my scrappy and vague internal code and used it perfectly!