Dear Authors,

First of all, the idea of this paper is really interesting and the empirical results are very impressive.
However, it seems that the implementation of the function `evaluate(model, mode)` might be problematic when `y_true` is full of zeros. The details are as follows @deepkashiwa20:
```python
def masked_mae_loss(y_pred, y_true):
    mask = (y_true != 0).float()
    mask /= mask.mean()
    loss = torch.abs(y_pred - y_true)
    loss = loss * mask
    # trick for nans: https://discuss.pytorch.org/t/how-to-set-nan-in-tensor-to-0/3918/3
    loss[loss != loss] = 0
    return loss.mean()

def masked_mape_loss(y_pred, y_true):
    mask = (y_true != 0).float()
    mask /= mask.mean()
    loss = torch.abs(torch.div(y_true - y_pred, y_true))
    loss = loss * mask
    # trick for nans: https://discuss.pytorch.org/t/how-to-set-nan-in-tensor-to-0/3918/3
    loss[loss != loss] = 0
    return loss.mean()
```
In such a case, `masked_mae_loss` returns 0, and the lists `l_3`, `l_6`, `l_12` then record this ambiguous zero as if the model had made perfect predictions.
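A minimal sketch of the failure mode: when `y_true` is all zeros, `mask.mean()` is 0, the division turns the whole mask into nan, and the nan-zeroing trick then collapses the entire loss tensor to 0, regardless of how wrong the predictions are:

```python
import torch

def masked_mae_loss(y_pred, y_true):
    mask = (y_true != 0).float()
    mask /= mask.mean()          # mask.mean() == 0 for an all-zero batch -> 0/0 -> nan
    loss = torch.abs(y_pred - y_true)
    loss = loss * mask           # every element becomes nan
    loss[loss != loss] = 0       # the nan trick zeroes the whole tensor
    return loss.mean()           # -> 0.0, reported as a "perfect" score

y_true = torch.zeros(4)                      # an all-zero batch
y_pred = torch.tensor([5.0, 3.0, 2.0, 7.0])  # clearly wrong predictions
print(masked_mae_loss(y_pred, y_true).item())  # prints 0.0
```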
```python
for x, y in data_iter:
    x, y, ycov = prepare_x_y(x, y)
    output, h_att, query, pos, neg = model(x, ycov)
    y_pred = scaler.inverse_transform(output)
    y_true = scaler.inverse_transform(y)
    loss1 = masked_mae_loss(y_pred, y_true)
    separate_loss = nn.TripletMarginLoss(margin=1.0)
    compact_loss = nn.MSELoss()
    loss2 = separate_loss(query, pos.detach(), neg.detach())
    loss3 = compact_loss(query, pos.detach())
    loss = loss1 + args.lamb * loss2 + args.lamb1 * loss3
    losses.append(loss.item())
    # Followed the DCRNN TensorFlow Implementation
    maes.append(masked_mae_loss(y_pred, y_true).item())
    mapes.append(masked_mape_loss(y_pred, y_true).item())
    mses.append(masked_mse_loss(y_pred, y_true).item())
    # Important for MegaCRN model to let T come first.
    y_true, y_pred = y_true.permute(1, 0, 2, 3), y_pred.permute(1, 0, 2, 3)
    l_3.append(masked_mae_loss(y_pred[2:3], y_true[2:3]).item())
    m_3.append(masked_mape_loss(y_pred[2:3], y_true[2:3]).item())
    r_3.append(masked_mse_loss(y_pred[2:3], y_true[2:3]).item())
    l_6.append(masked_mae_loss(y_pred[5:6], y_true[5:6]).item())
    m_6.append(masked_mape_loss(y_pred[5:6], y_true[5:6]).item())
    r_6.append(masked_mse_loss(y_pred[5:6], y_true[5:6]).item())
    l_12.append(masked_mae_loss(y_pred[11:12], y_true[11:12]).item())
    m_12.append(masked_mape_loss(y_pred[11:12], y_true[11:12]).item())
    r_12.append(masked_mse_loss(y_pred[11:12], y_true[11:12]).item())
    ys_true.append(y_true)
    ys_pred.append(y_pred)
```
Unfortunately, for the METR-LA dataset there exist several batches of `y_true` that are all zeros. This leads to MAE/RMSE/MAPE values that are underestimated.