Closed: zapaishchykova closed this issue 1 year ago
Hi @zapaishchykova, DSM expects all passed times `t` to be strictly greater than 0. Can you check that in your data?
Hi! Here it is:
```python
times = np.quantile(outcomes.time[outcomes.event == 1], [0.25, 0.5, 0.6]).tolist()
times
# [13.0, 27.0, 35.0]
```
Can you check if there are any zeros in `outcomes.time`?
There are indeed some zeros in `outcomes.time`! Should I replace them with some small value instead?
Aha! Yes, either replace the zeros with a small non-zero value like 1e-4, or add a constant to every time so that the whole scale becomes strictly positive.
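In case it helps, a minimal sketch of that second option (plain Python; `shift_times` is a hypothetical helper, not part of auton-survival):

```python
def shift_times(times, eps=1e-4):
    """Shift all times by the same constant so every value is >= eps.

    A uniform offset preserves the ordering of events and the gaps
    between them; only the origin of the time scale moves.
    """
    lo = min(times)
    if lo > 0:
        return list(times)  # already strictly positive, nothing to do
    return [t + (eps - lo) for t in times]
```

For example, `outcomes['time'] = shift_times(outcomes['time'])` before fitting.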
Some progress; now I get a different error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [48], line 18
     16 roc_auc = []
     17 for i, _ in enumerate(times):
---> 18     roc_auc.append(cumulative_dynamic_auc(et_train, et_test, out_risk[:, i], times[i])[0])
     19 for horizon in enumerate(horizons):
     20     print(f"For {horizon[1]} quantile,")

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sksurv/metrics.py:468, in cumulative_dynamic_auc(survival_train, survival_test, estimate, times, tied_tol)
    466 cens = CensoringDistributionEstimator()
    467 cens.fit(survival_train)
--> 468 ipcw = cens.predict_ipcw(survival_test)
    470 # expand arrays to (n_samples, n_times) shape
    471 test_time = numpy.broadcast_to(test_time[:, numpy.newaxis], (n_samples, n_times))

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sksurv/nonparametric.py:448, in CensoringDistributionEstimator.predict_ipcw(self, y)
    445 Ghat = self.predict_proba(time[event])
    447 if (Ghat == 0.0).any():
--> 448     raise ValueError("censoring survival function is zero at one or more time points")
    450 weights = numpy.zeros(time.shape[0])
    451 weights[event] = 1.0 / Ghat

ValueError: censoring survival function is zero at one or more time points
```
Are you trying to perform cross-validation? Is this a relatively small dataset?
It is a small dataset! Interestingly, I was also unable to compute the ROC on this dataset with scikit-survival directly, and the error looked similar.
Yeah, we use scikit-survival for the underlying metrics computation. Try shuffling your CV folds? It might help. Essentially, computing performance for the models requires the same range of times to be present in the training and testing folds; with a dataset this small, a test fold can contain times beyond anything seen in the training set.
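For anyone hitting this later: the IPCW-based metrics weight each subject by 1/G(t), where G is a Kaplan-Meier estimate of the *censoring* survival function fitted on the training fold. G drops to zero past the largest training time when that time is censored, so any test time beyond it makes the weight undefined. A rough pure-Python sketch of the idea (not scikit-survival's actual implementation, which also handles ties):

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    Treats censoring (event == 0) as the 'event' of interest; returns
    the (time, G(t)) step points after each censoring time.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    steps = []
    for t, e in pairs:
        if e == 0:  # a censoring occurs at t
            surv *= 1.0 - 1.0 / n_at_risk
            steps.append((t, surv))
        n_at_risk -= 1
    return steps
```

If the largest observation is censored, the last step multiplies by zero, so G(t) = 0 there and 1/G(t) blows up for any later test time.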
Aha, then maybe stratified creation of the folds will make more sense for such a small set. Closing this for now, thanks a lot!
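A possible sketch of stratified fold creation on the event indicator (plain Python for illustration; `stratified_folds` is a made-up helper, and scikit-learn's `StratifiedKFold` does the same thing):

```python
import random

def stratified_folds(events, n_splits=3, seed=0):
    """Split indices into folds while balancing event vs. censored cases."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    # group indices by event status, shuffle each group,
    # then deal them out round-robin so every fold sees both classes
    for label in (0, 1):
        idx = [i for i, e in enumerate(events) if e == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_splits].append(i)
    return folds
```

Balancing the event indicator makes it far less likely that a test fold contains only times outside the range seen in training.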
Hello! Thanks for such a unique package. I am trying to use DeepSurvivalMachines (note: on the same dataset, for example, DeepCoxMixtures works without any issues); here is the error log:
Some more details:
```python
outcomes = pd.DataFrame()
outcomes['event'] = pd.DataFrame(data_y)['Status'].astype('int64')
outcomes['time'] = pd.DataFrame(data_y)['Survival_in_days'].astype('int64')

features_val = df_features_val.copy().astype('float64')
outcomes_val = pd.DataFrame()
outcomes_val['event'] = pd.DataFrame(data_y_val)['Status'].astype('int64')
outcomes_val['time'] = pd.DataFrame(data_y_val)['Survival_in_days'].astype('int64')
```
```python
from auton_survival.models.dsm import DeepSurvivalMachines
from sklearn.model_selection import ParameterGrid

param_grid = {'k': [3, 4, 6],
              'distribution': ['LogNormal', 'Weibull'],
              'learning_rate': [1e-4, 1e-3],
              'layers': [[], [100], [100, 100]]}
params = ParameterGrid(param_grid)

models = []
for param in params:
    model = DeepSurvivalMachines(k=param['k'],
                                 distribution=param['distribution'],
                                 layers=param['layers'])

best_model = min(models)
model = best_model[0][1]

cis = []
brs = []

et_train = np.array([(e_train[i], t_train[i]) for i in range(len(e_train))],
                    dtype=[('e', bool), ('t', float)])
et_test = np.array([(e_test[i], t_test[i]) for i in range(len(e_test))],
                   dtype=[('e', bool), ('t', float)])
et_val = np.array([(e_val[i], t_val[i]) for i in range(len(e_val))],  # note: was len(eval), which is the Python built-in
                  dtype=[('e', bool), ('t', float)])

times = np.quantile(outcomes.time[outcomes.event == 1], [0.25, 0.5, 0.6]).tolist()
for i, _ in enumerate(times):  # note: was `for i, in enumerate(...)`
    cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])
```
```
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
```