Currently, the data generator selects a uniformly random sample of observation series when creating batches. It would be better to sample in line with the joint distribution of sequence length and censoring status.
This matters mainly because the `loss2` function must compare each observation's predicted risk against the predicted risks of the other observations in the batch in order to calculate the loss, so batch composition directly affects the loss value.
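To illustrate why within-batch comparisons make composition matter, here is a minimal sketch of a batch loss of that general shape. This assumes `loss2` resembles a Cox-style partial likelihood over the batch; the function name, signature, and the exact loss form are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def batch_risk_loss(risks, times, events):
    """Illustrative Cox-style partial likelihood over one batch.

    Each uncensored observation's predicted risk is compared against the
    risks of every batch member still "at risk" (time >= its event time),
    so which observations land in the batch changes the loss.
    """
    risks = np.asarray(risks, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)  # True = event observed (not censored)

    loss = 0.0
    n_events = 0
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]                  # risk set for observation i
        log_denom = np.log(np.exp(risks[at_risk]).sum())
        loss -= risks[i] - log_denom                 # negative log partial likelihood
        n_events += 1
    return loss / max(n_events, 1)
```

Censored observations contribute only through the risk sets of others, which is why an unrepresentative batch (e.g. all censored, or all short sequences) can skew the gradient.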
A per-sample weighting is currently applied to help even out the distribution. I'd like to resolve the other issues first and then evaluate whether this implementation makes a difference.
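For reference, distribution-aware sampling could look roughly like the sketch below: bucket series by sequence length, cross with the censoring flag, and fill the batch proportionally from each stratum. The function name, the quartile bucketing, and the quota scheme are all assumptions for illustration, not the generator's actual API.

```python
import numpy as np

def stratified_batch_indices(seq_lengths, censored, batch_size, rng):
    """Sample a batch whose composition mirrors the joint distribution of
    sequence-length bucket and censoring flag (hypothetical helper)."""
    seq_lengths = np.asarray(seq_lengths)
    censored = np.asarray(censored, dtype=bool)

    # Bucket sequence lengths at the quartiles, then cross with the censor flag.
    edges = np.quantile(seq_lengths, [0.25, 0.5, 0.75])
    buckets = np.digitize(seq_lengths, edges)        # 0..3
    strata = buckets * 2 + censored.astype(int)      # 0..7

    labels, counts = np.unique(strata, return_counts=True)
    # Allocate batch slots proportional to each stratum's share of the data,
    # with at least one slot so rare strata are never dropped entirely.
    quota = np.maximum(1, np.round(batch_size * counts / counts.sum())).astype(int)

    idx = []
    for label, q in zip(labels, quota):
        pool = np.flatnonzero(strata == label)
        idx.extend(rng.choice(pool, size=min(q, pool.size), replace=False))
    return np.array(idx[:batch_size])
```

Usage would be one call per batch, e.g. `stratified_batch_indices(lengths, censored, 32, np.random.default_rng())`, replacing the current uniform draw.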