Open rvandewater opened 9 months ago
Hi @rvandewater, thanks for contributing to auton-survival 🙂
Given a `DeepCoxPH` model trained on a survival dataset `X_train, Y_train ~ features, (events, times)`, the min and max admissible times to compute the `survival_regression_metric` are, as you noted:
min_time = min(Y_train.times.values) + 1
max_time = max(Y_train.times.values) - 1
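As a rough illustration of these bounds, the sketch below flags evaluation times that fall outside `[min_time, max_time]`. It uses a plain NumPy array in place of `Y_train.times.values`; the helper name `out_of_range_times` and the sample values are hypothetical and not part of auton-survival.

```python
import numpy as np


def out_of_range_times(train_times, eval_times):
    """Return the evaluation times that fall outside [min_time, max_time]."""
    train_times = np.asarray(train_times, dtype=float)
    eval_times = np.asarray(eval_times, dtype=float)
    # Bounds as described above: one unit inside the observed training range.
    min_time = train_times.min() + 1
    max_time = train_times.max() - 1
    outside = (eval_times < min_time) | (eval_times > max_time)
    return eval_times[outside]


# Hypothetical training times and evaluation grid, for illustration only.
train_times = [3, 10, 25, 90, 168]
eval_times = [3, 24, 72, 120, 168]

offending = out_of_range_times(train_times, eval_times)
print(offending)  # the grid points at both extremes are flagged
```

An empty result would mean that every evaluation time lies inside the bounds given above.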
To avoid this problem you have three options:
max_time
to your times
`sksurv.metrics.concordance_index_censored`:

from sksurv import metrics
from auton_survival import DeepCoxPH
import torch

model = DeepCoxPH()
# ... train model ...

# Use model.torch_model[0] to access the `torch.nn.Module` that computes risk scores for DeepCox
# A better (and retro-compatible) API to access the PyTorch module will be available in the next updates
with torch.inference_mode():
    model.torch_model[0].eval()
    X_test, Y_test = get_test_data()
    risk_scores = model.torch_model[0](X_test)

concordance_index_censored = metrics.concordance_index_censored(
    Y_test.events.values.astype(bool),
    Y_test.times.values,
    risk_scores.squeeze(),
)
I'm not sure if this fully answers your question; let me know if you need anything else.
NB: I'm copying your code with syntax highlighting so it's easier to read (you can enable it by writing "```python" instead of " ```" at the start of the code block):
nonnumeric_cols = [col for (col, dtype) in df.dtypes.iteritems() if dtype.name == "category" or dtype.kind not in "biuf"]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[44], line 22
     20 # Obtain survival probabilities for validation set and compute the Integrated Brier Score
     21 predictions_val = model.predict_survival(x_val, times)
---> 22 metric_val = survival_regression_metric('ibs', y_val, predictions_val, times, y_tr)
     23 models.append([metric_val, model])
     25 # Select the best model based on the mean metric value computed for the validation set

File ~/projects/auton-survival/auton_survival/metrics.py:215, in survival_regression_metric(metric, outcomes, predictions, times, outcomes_train, n_bootstrap, random_seed)
    211 outcomes_train = outcomes
    212 warnings.warn("You are are evaluating model performance on the \
    213 same data used to estimate the censoring distribution.")
--> 215 assert max(times) < outcomes_train.time.max(), "Times should \
    216 be within the range of event times to avoid exterpolation."
    217 assert max(times) <= outcomes.time.max(), "Times \
    218 must be within the range of event times."
    220 survival_train = util.Surv.from_dataframe('event', 'time', outcomes_train)

AssertionError: Times should be within the range of event times to avoid exterpolation.
from auton_survival.estimators import SurvivalModel
from auton_survival.metrics import survival_regression_metric
from sklearn.model_selection import ParameterGrid

# Define parameters for tuning the model
param_grid = {'l2' : [1e-3, 1e-4]}
params = ParameterGrid(param_grid)

# Define the times for model evaluation
times = np.quantile(y_tr['time'][y_tr['event']==1], np.linspace(0.1, 1, 10)).tolist()

# Perform hyperparameter tuning
models = []
for param in params:
    model = SurvivalModel('cph', random_seed=2, l2=param['l2'])

    # The fit method is called to train the model
    model.fit(x_tr, y_tr)

    # Obtain survival probabilities for validation set and compute the Integrated Brier Score
    predictions_val = model.predict_survival(x_val, times)
    metric_val = survival_regression_metric('ibs', y_val, predictions_val, times, y_tr)
    models.append([metric_val, model])

# Select the best model based on the mean metric value computed for the validation set
metric_vals = [i[0] for i in models]
first_min_idx = metric_vals.index(min(metric_vals))
model = models[first_min_idx][1]
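The assertion in the traceback uses a strict `<` against the training maximum, while the `times` grid in the code above ends at the 1.0 quantile of the training event times. The sketch below reproduces that comparison on synthetic data (the arrays are invented and only stand in for `y_tr['time']` and `y_tr['event']`). Stopping the grid at the 0.9 quantile is just one possible adjustment, shown as an assumption and not as a recommendation from the maintainers.

```python
import numpy as np

# Synthetic stand-ins for y_tr['time'] and y_tr['event'] (hypothetical values).
time = np.array([5, 20, 40, 80, 120, 168, 168, 168])
event = np.array([1, 1, 0, 1, 1, 1, 0, 0])

# Same grid as in the snippet above: the last point is the 1.0 quantile,
# which is the latest event time in the training data.
times = np.quantile(time[event == 1], np.linspace(0.1, 1, 10)).tolist()

# Same strict comparison as the assertion shown in the traceback.
passes_check = max(times) < time.max()
print(max(times), time.max(), passes_check)

# One possible adjustment (an assumption): end the grid at the 0.9 quantile,
# so the largest evaluation time stays below the training maximum.
times_inner = np.quantile(time[event == 1], np.linspace(0.1, 0.9, 9)).tolist()
passes_check_inner = max(times_inner) < time.max()
print(max(times_inner), passes_check_inner)
```

In this synthetic case the latest event coincides with the largest observed time, so the first grid fails the strict comparison and the shortened grid passes it.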
Hi @matteo4diani, thanks for your answer. I believe the manual cut-off that you suggested is not even needed; instead, I replaced this line:
times = np.quantile(y_tr['time'][y_tr['event']==1], np.linspace(0.1, 1, 10)).tolist()
with this line:
times = np.quantile(y_val['time'][y_val['event']==1], np.linspace(0.1, 1, 10)).tolist()
The code validates `times` against the range of the training data. I am not sure this is intended: according to https://autonlab.org/auton-survival/metrics.html, `times` should probably be based on the validation or test set rather than the training set:
times : np.array
    The time points at which to compute metric value(s)
Hi,
Thank you for creating this package.
I am encountering an error when using my own dataset to create a survival regression model (see below). I am using the "Survival Regression with Auton-Survival" notebook with the Cox proportional hazards model (see the code below the error). I am using a preprocessed dataset extracted from eICU, with a maximum time value of 168 for train, test, and val.
What I tried: when I replace the 168 in validation with 167, I get the same error. I checked the original example, and the situation seems to be the same there: the max value in validation is equal to the max value in training. However, it does not throw an error in that case.
Thank you for your help.