Open tle4336 opened 2 days ago
Could anyone please help with the above question?
Reading the code: categorical inputs are transformed into one-hot columns, the training-set mean of each column is subtracted, and then the betas are applied.
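To make the steps above concrete, here is a minimal sketch in plain pandas/NumPy rather than lifelines itself. The toy data, the beta values, and the helper name `partial_hazard` are all made up for illustration, and the sketch assumes the partial hazard is exp((x - mean) · beta) with one dummy level dropped as the baseline. It is not a copy of the library's internals.

```python
import numpy as np
import pandas as pd

# Hypothetical training data: one numeric column, one 3-level categorical.
train = pd.DataFrame({
    "age": [30.0, 40.0, 50.0, 60.0],
    "day": ["mon", "tue", "wed", "mon"],
})

# One-hot encode; "mon" is dropped and acts as the baseline level.
X_train = pd.get_dummies(train, columns=["day"], drop_first=True, dtype=float)

# Column means from the training set (for a dummy column this is the
# fraction of training rows in that level).
means = X_train.mean()

# Hypothetical coefficients, NOT fitted values.
betas = pd.Series({"age": 0.02, "day_tue": 0.5, "day_wed": -0.3})


def partial_hazard(new_rows: pd.DataFrame) -> pd.Series:
    """exp((x - training mean) . beta), computed on the one-hot columns."""
    X_new = pd.get_dummies(new_rows, columns=["day"], dtype=float)
    # Align to the training columns; levels absent from new_rows become 0.
    X_new = X_new.reindex(columns=X_train.columns, fill_value=0.0)
    return np.exp(((X_new - means) * betas).sum(axis=1))


new = pd.DataFrame({"age": [40.0], "day": ["wed"]})
ph = partial_hazard(new)
```

For the row above, the centered values are (40 - 45), (0 - 0.25) and (1 - 0.25), so each dummy column contributes either (1 - mean) or (0 - mean) times its beta.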
@CamDavidsonPilon Thank you very much for your help with my question, I really appreciate it. From your answer, I have two quick clarification questions:
1. Is the mean of a categorical-input column the same as the mean obtained from the method norm_mean of a trained CphFitter model? For numerical-input columns these two are the same, but I just want to make sure that also holds for categorical columns.
2. When subtracting the training-set mean of that column, I understand the code simply computes (1 - mean) and (0 - mean) on the one-hot values, rather than taking the raw value of the original categorical-input column and subtracting the mean of the corresponding transformed one-hot column (e.g. xi_{categorical} - mean). Can you please confirm if this is the case?
Does anyone happen to know the formula that is used in the predict_partial_hazard function of the class CoxPHFitter when the features include some categorical variables, each of which might have at least 3 values (e.g. IDs, day of week)?