Currently, the data generator selects a uniformly random sample of observation series when creating batches. It would be better to sample in line with the joint distribution of sequence length and censoring status.
This matters mainly because the `loss2` function must compare each observation's predicted risk against the predicted risks of the other observations in the batch in order to calculate the loss, so batch composition directly affects the loss value.
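To illustrate why within-batch comparisons make composition matter, here is a minimal sketch of a batch loss of that general shape. This assumes `loss2` resembles a Cox-style partial likelihood over the batch; the function name, signature, and the exact loss form are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def batch_risk_loss(risks, times, events):
    """Illustrative Cox-style partial likelihood over one batch.

    Each uncensored observation's predicted risk is compared against the
    risks of every batch member still "at risk" (time >= its event time),
    so which observations land in the batch changes the loss.
    """
    risks = np.asarray(risks, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)  # True = event observed (not censored)

    loss = 0.0
    n_events = 0
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]                  # risk set for observation i
        log_denom = np.log(np.exp(risks[at_risk]).sum())
        loss -= risks[i] - log_denom                 # negative log partial likelihood
        n_events += 1
    return loss / max(n_events, 1)
```

Censored observations contribute only through the risk sets of others, which is why an unrepresentative batch (e.g. all censored, or all short sequences) can skew the gradient.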
A per-sample weighting is currently applied to help even out the distribution. I'd like to resolve the other issues first and then evaluate whether this implementation makes a difference.
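For reference, distribution-aware sampling could look roughly like the sketch below: bucket series by sequence length, cross with the censoring flag, and fill the batch proportionally from each stratum. The function name, the quartile bucketing, and the quota scheme are all assumptions for illustration, not the generator's actual API.

```python
import numpy as np

def stratified_batch_indices(seq_lengths, censored, batch_size, rng):
    """Sample a batch whose composition mirrors the joint distribution of
    sequence-length bucket and censoring flag (hypothetical helper)."""
    seq_lengths = np.asarray(seq_lengths)
    censored = np.asarray(censored, dtype=bool)

    # Bucket sequence lengths at the quartiles, then cross with the censor flag.
    edges = np.quantile(seq_lengths, [0.25, 0.5, 0.75])
    buckets = np.digitize(seq_lengths, edges)        # 0..3
    strata = buckets * 2 + censored.astype(int)      # 0..7

    labels, counts = np.unique(strata, return_counts=True)
    # Allocate batch slots proportional to each stratum's share of the data,
    # with at least one slot so rare strata are never dropped entirely.
    quota = np.maximum(1, np.round(batch_size * counts / counts.sum())).astype(int)

    idx = []
    for label, q in zip(labels, quota):
        pool = np.flatnonzero(strata == label)
        idx.extend(rng.choice(pool, size=min(q, pool.size), replace=False))
    return np.array(idx[:batch_size])
```

Usage would be one call per batch, e.g. `stratified_batch_indices(lengths, censored, 32, np.random.default_rng())`, replacing the current uniform draw.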