CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Formula documentation for `predict_partial_hazard` function with categorical features #1645

Open tle4336 opened 2 days ago

tle4336 commented 2 days ago

Does anyone happen to know the formula used by the `predict_partial_hazard` function of the `CoxPHFitter` class when the features include categorical variables, each of which may take three or more values (e.g., IDs, day of week)?

tle4336 commented 1 day ago

Could anyone please help with the above question?

CamDavidsonPilon commented 4 hours ago

Reading the code: categorical inputs are transformed into one-hot columns, the training-set mean of each of those columns is subtracted, and then the betas are applied.
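A minimal sketch of the arithmetic described above, in plain Python rather than lifelines' own code. It assumes treatment (reference-level) coding, so one level is dropped. Write $d_k(x)$ for the one-hot indicator of level $k$, $\bar d_k$ for its training-set mean and $\beta_k$ for its coefficient. The `CoxPHFitter` docstring describes the partial hazard as exp((x - mean(x_train)) · beta), which for one categorical feature becomes

$$\hat h(x) = \exp\Big(\sum_k \beta_k\,\big(d_k(x) - \bar d_k\big)\Big).$$

The helper names (`one_hot`, `column_means`, `partial_hazard`), the day-of-week data and the coefficients below are hypothetical.

```python
import math

def one_hot(value, levels):
    """Indicator vector for `value` over the non-reference `levels`."""
    return [1.0 if value == level else 0.0 for level in levels]

def column_means(values, levels):
    """Training-set mean of each one-hot column (the share of rows at each level)."""
    rows = [one_hot(v, levels) for v in values]
    n = len(rows)
    return [sum(row[k] for row in rows) / n for k in range(len(levels))]

def partial_hazard(value, levels, betas, means):
    """exp(sum_k beta_k * (d_k - mean_k)) for a single categorical feature."""
    d = one_hot(value, levels)
    log_ph = sum(b * (dk - m) for b, dk, m in zip(betas, d, means))
    return math.exp(log_ph)

# Hypothetical example: day of week, with "Mon" as the dropped reference level.
train_days = ["Mon", "Tue", "Wed", "Tue", "Mon", "Wed", "Wed", "Mon"]
levels = ["Tue", "Wed"]                    # non-reference levels -> one-hot columns
betas = [0.4, -0.2]                        # hypothetical fitted coefficients
means = column_means(train_days, levels)   # [2/8, 3/8] = [0.25, 0.375]

for day in ["Mon", "Tue", "Wed"]:
    print(day, round(partial_hazard(day, levels, betas, means), 4))
```

Every row is shifted by the same training means, so centering only rescales the result. The ratio between two levels, e.g. Tue versus Mon, is still exp(0.4), the exponentiated coefficient.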

tle4336 commented 4 hours ago

@CamDavidsonPilon Thank you very much for your help with my question; I really appreciate it. Based on your answer, I have two quick clarification questions:

  1. Is the mean of a categorical-input column the same as the mean obtained from norm_mean of a trained CoxPHFitter model? For numerical-input columns the two are the same, but I want to make sure this also holds for categorical columns.

  2. When subtracting the training-set mean of that column, I understand the code computes (1 - mean) or (0 - mean) on each one-hot column, rather than taking the raw value of the original categorical column and subtracting the mean of the corresponding one-hot column (e.g., xi_{categorical} - mean). Can you please confirm whether this is the case?
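The centering of one-hot columns described in the earlier comment can be sketched for a three-level variable as follows. Each one-hot column's mean is the share of training rows at that level. After subtracting it, every entry is either (1 - mean) or (0 - mean), and the original label itself never enters the arithmetic. The level names, the data and the helper `centered_one_hot` are hypothetical, and level "A" is assumed to be the dropped reference level.

```python
def centered_one_hot(values, levels):
    """One-hot encode `values` over `levels`, then subtract each column's training mean."""
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    n = len(rows)
    means = [sum(row[k] for row in rows) / n for k in range(len(levels))]
    centered = [[row[k] - means[k] for k in range(len(levels))] for row in rows]
    return means, centered

# Hypothetical three-level variable; "A" is treated as the dropped reference level.
train = ["A", "B", "C", "B", "A", "C", "C", "A"]
levels = ["B", "C"]
means, centered = centered_one_hot(train, levels)

# Each one-hot mean is the share of training rows at that level: B -> 2/8, C -> 3/8.
print("means:", means)

# Every centered entry is either (1 - mean) or (0 - mean) for its column.
for label, row in zip(train, centered):
    print(label, row)
```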

CamDavidsonPilon commented 4 hours ago
  1. Yes.
  2. I don't understand your question