TL;DR: When using CoxPHFitter.fit(), the value passed for robust has no effect once a cluster_col is specified: the Huber sandwich estimator appears to be used regardless.
I was using cluster_col in CoxPHFitter and saw in the docstring that the sandwich estimator is then used automatically. I wanted to match the standard errors in a test case against a CoxTimeVarying model by setting robust to the same value in both CoxPHFitter and CoxTimeVarying. (This explains my test data below.) However, I saw from issue #544 that robust has not been implemented for CoxTimeVarying, which leaves me only the option of setting robust=False in the CoxPHFitter model. For the test case, I can just leave cluster_col unspecified. I think an error or error message should be raised when cluster_col is set and robust=False. It looks like this conditional needs to be edited.
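To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. The function name resolve_robust is hypothetical and is not part of lifelines. The sketch also assumes robust would default to None (meaning "not specified") rather than to False as it does today, since otherwise an explicit robust=False cannot be told apart from the default.

```python
def resolve_robust(cluster_col=None, robust=None):
    """Hypothetical helper (not lifelines API): decide whether the sandwich
    estimator is used, rejecting the contradictory combination.

    Assumption: robust=None means "not specified by the caller".
    """
    if cluster_col is not None and robust is False:
        raise ValueError(
            "cluster_col forces the sandwich estimator; "
            "robust=False cannot be honoured when cluster_col is set."
        )
    if cluster_col is not None:
        # Clustering implies robust (sandwich) standard errors.
        return True
    # No clustering: honour the caller's choice, treating "unset" as False.
    return bool(robust)


# The contradictory call is rejected instead of being silently ignored.
try:
    resolve_robust(cluster_col="id", robust=False)
except ValueError as err:
    print(f"rejected: {err}")
```

A warning instead of an exception would work the same way; the point is only that robust=False combined with a cluster_col should not pass silently.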
Here's a reproducible example with my comments:
```python
import numpy.testing as npt
import pandas as pd
from lifelines import CoxPHFitter, CoxTimeVaryingFitter
from lifelines.datasets import load_stanford_heart_transplants
from lifelines.utils import to_long_format

stanford = load_stanford_heart_transplants()

# Keep only the last record for each subject, drop all covariate columns except age to simplify data
stanford_last = (
    stanford.groupby("id")
    .tail(1)
    .drop(["year", "surgery", "transplant"], axis="columns")
)

# Format the data for CPH model
stanford_last_cph_wid = stanford_last.rename(
    columns={"start": "W", "stop": "T", "event": "E"}
)
stanford_last_cph_wid.head()
```
Create a CoxPHFitter model and fit it with the cluster_col specified.
However, if both a cluster_col and robust are specified, the SE value is always the same (0.14374) regardless of the value for robust.

The standard error is different (0.13862) when cluster_col is not specified, so that robust takes its default value of False.

lifelines version: 0.27.8