It seems that the loss function is not exactly the same in the two implementations.
In the data there are two features: traffic speed and traffic volume (I don't know in which order). It seems that in the PyTorch version the loss is computed on both features, whereas in the TF DCRNN it is computed using only predictions for the first feature (line 272, https://github.com/liyaguang/DCRNN/blob/master/model/dcrnn_supervisor.py).
Is that correct?
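For reference, here is a rough sketch (not the exact code from either repo) of a DCRNN-style masked MAE and the slicing in question; shapes are assumed to be (horizon, batch, num_nodes, num_features):

```python
import torch

def masked_mae(preds, labels, null_val=0.0):
    # Entries equal to null_val (missing sensor readings) are excluded
    # from the average; the mask is renormalized so the scale is kept.
    mask = (labels != null_val).float()
    mask = mask / mask.mean()
    loss = torch.abs(preds - labels) * mask
    loss = torch.where(torch.isnan(loss), torch.zeros_like(loss), loss)
    return loss.mean()

# What the TF version does at the cited line: score the first feature only.
# loss = masked_mae(preds[..., 0], labels[..., 0])
# What the PyTorch version is suspected of doing: score both features.
# loss = masked_mae(preds, labels)
```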
@cvignac that is a good catch - however, it seems like that may not be the issue: I think the dataset has only one feature dimension, so indexing on 0 as you pointed out does nothing to the values.
I've not been able to dig deeper to figure out why the boost exists, though.
@chnsh I'm not sure that's the case. The dataset has two features, speed and time; the model should predict both, since they are required by the decoder when predicting the next step, but the evaluation should only be on the first feature (speed).
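To illustrate why a seq2seq decoder would need both features, here is a toy decoder loop (all names hypothetical, not the repo's actual module): if each decoder input step carries two features, the projection must also emit two features so the output can be fed back in.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    def __init__(self, num_nodes=207, num_feats=2, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(num_nodes * num_feats, hidden)
        # Projects back to both features so the prediction can serve
        # as the next step's input.
        self.proj = nn.Linear(hidden, num_nodes * num_feats)

    def forward(self, go_symbol, h, horizon=12):
        preds, x = [], go_symbol  # x: (batch, num_nodes * num_feats)
        for _ in range(horizon):
            h = self.cell(x, h)
            x = self.proj(h)      # fed back as the next decoder input
            preds.append(x)
        return torch.stack(preds)
```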
Oh, I see - thanks for pointing it out. I will try to re-evaluate as soon as possible; I will be out of office for some time though.
@razvanc92 I think you're right that the dataset has 2 dimensions (when it's constructed), but what is happening is that the final loss is calculated by slicing for the 1st dimension: the final outputs are sized (12, 6912, 207)
and the dataset is (12, 6912, 207, 2),
so the final loss calculation is on the correct dimensions, and the loss value is not wrong - can you confirm?
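A quick shape check, with random tensors standing in for the real data:

```python
import torch

outputs = torch.randn(12, 6912, 207)     # final outputs: speed only
dataset = torch.randn(12, 6912, 207, 2)  # targets: speed + second feature

# Slicing the targets down to the first feature makes the shapes line up,
# so the loss is computed on speed alone:
assert dataset[..., 0].shape == outputs.shape
loss = torch.abs(outputs - dataset[..., 0]).mean()
```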
I think this issue is still not sufficiently answered though.
I'm really curious about why the PyTorch version has better performance. I've been digging through the code for quite some time, but haven't found anything suspicious. Will keep you up to date if I find something.
I know what the problem is.
In the original TensorFlow implementation, the test evaluation calculates the MAE for every timestep separately. In the paper, only the MAE for the last timestep is reported.
The following configurations are used as an example.
In the following log (run with the original TensorFlow code), the separate MAEs for all three timesteps are shown. This PyTorch implementation does not do that; it shows only the average of those values.
Here, the validation MAE (marked (1)) is equal to that of the exact same run with this PyTorch implementation. Also, the MAE of the last timestep (marked (2)) is the same as the MAE reported in the published paper.
Concluding: both implementations are equally good, but the PyTorch implementation lacks this per-timestep MAE calculation (a sketch of the difference follows below). @chnsh this mistake could have been prevented...
P.S. This is probably the same problem as in issue #7.
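To make the difference concrete, a minimal sketch of the two reporting styles (plain MAE, no masking; names are hypothetical):

```python
import torch

def report_mae(preds, labels):
    # preds / labels: (horizon, batch, num_nodes), speed only.
    horizon = preds.shape[0]
    # TF-style report: one MAE per prediction horizon; the paper quotes
    # the last (hardest) horizon.
    per_step = [torch.abs(preds[t] - labels[t]).mean().item()
                for t in range(horizon)]
    # This repo's report: a single MAE averaged over all horizons.
    overall = torch.abs(preds - labels).mean().item()
    return per_step, overall
```

Since short horizons are easier to predict, the average over all horizons comes out lower than the last-horizon MAE, which would explain the apparent boost.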
@Noahprog thanks for digging! Good find, want to send a PR to fix it?
I think the answer is related to #16
Thanks for your great work. Compared to the original TensorFlow results, this PyTorch version has better performance. Could you explain the reason for this boost?