saurabhdash opened this issue 3 years ago (status: Open)
Will you open a pull request?
I agree with you. These incorrect calculations affect training as well, because the RMSE loss is one of the loss functions being minimized. This raises serious doubts about the code and the reproducibility of the paper.
A simple fix could be:

```python
true = event_time[:, 1:] - event_time[:, :-1]
# zeros_like allocates on the same device as `true`, so no hard-coded "cuda" is needed
true = torch.where(true >= 0., true, torch.zeros_like(true))
```
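A quick, self-contained check of this clamping fix on the padded example from the issue (a CPU tensor is used here for portability; the repo's actual batches would be larger):

```python
import torch

# Padded event times: 3 real events [1, 3, 4], zero-padded to length 5.
event_time = torch.tensor([[1., 3., 4., 0., 0.]])

# The naive first difference introduces a spurious -4 at the pad boundary.
naive = event_time[:, 1:] - event_time[:, :-1]  # [[2., 1., -4., 0.]]

# Clamping negatives to zero removes the artifact term.
true = torch.where(naive >= 0., naive, torch.zeros_like(naive))  # [[2., 1., 0., 0.]]
```

Note that this only works because event times are non-negative and padding is zero, so the artifact term is the only negative entry in the difference.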
There is an error in the time_loss computation:
`event_time[:, 1:] - event_time[:, :-1]` introduces a spurious extra term at the boundary between real events and padding.
Example: if a sequence t has 3 events and is zero-padded to length 5, i.e. t = [1, 3, 4, 0, 0], the deltas should be [2, 1, 0, 0],
but `event_time[:, 1:] - event_time[:, :-1]` yields [2, 1, -4, 0].
The solution is to pass time_gap to this function instead of differencing event_time. Alternatively, the difference can be multiplied by non_pad_mask to remove the artifact term.
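A minimal sketch of the mask-based alternative, assuming non_pad_mask is 1 for real events and 0 for padding with shape [batch, seq_len] (the repo may keep an extra trailing dimension, in which case a squeeze would be needed):

```python
import torch

event_time = torch.tensor([[1., 3., 4., 0., 0.]])
# Hypothetical mask derived from the times themselves; the repo passes its own non_pad_mask.
non_pad_mask = (event_time > 0).float()  # [[1., 1., 1., 0., 0.]]

diff = event_time[:, 1:] - event_time[:, :-1]  # [[2., 1., -4., 0.]]
# Multiplying by the mask of the *later* position zeroes the pad-boundary term,
# since that term always ends at a padded (masked-out) position.
delta = diff * non_pad_mask[:, 1:]
```

Unlike the clamping fix, this version does not rely on the artifact term being negative, so it would also be safe if padding used a value other than zero (with the mask supplied explicitly).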