chnsh / DCRNN_PyTorch

Diffusion Convolutional Recurrent Neural Network Implementation in PyTorch

Using no convolution is better than the DCRNN paper result? #7

Closed: Noahprog closed this issue 4 years ago

Noahprog commented 4 years ago

Hi @chnsh,

Hope you're doing well. While testing your PyTorch implementation of the DCRNN code, I stumbled across a strange result: turning off the convolution with max_diffusion_step=0 greatly improved the result over the one reported in the original DCRNN paper.
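To make explicit what max_diffusion_step=0 means, here is a rough sketch of a DCRNN-style diffusion convolution. The names and the simple power-iteration recurrence are mine, not this repo's exact code (the real code, I believe, uses a Chebyshev-style recurrence for the laplacian filter), but the collapse at K = 0 is the same: only the identity term survives, so each DCGRU gate degenerates into a plain fully-connected layer, i.e. an ordinary GRU that ignores the graph.

```python
import torch
import torch.nn as nn

def diffusion_conv(x, supports, projection, max_diffusion_step):
    # Collect the diffusion terms. The k = 0 term is the input itself,
    # i.e. the identity: it carries no graph information.
    terms = [x]
    for support in supports:
        x_k = x
        for _ in range(max_diffusion_step):
            # One diffusion step: propagate node features along the graph.
            x_k = torch.einsum("ij,bjd->bid", support, x_k)
            terms.append(x_k)
    # With max_diffusion_step == 0 the inner loop never runs, terms == [x],
    # and the "convolution" is just a per-node linear projection.
    return projection(torch.cat(terms, dim=-1))

# Toy usage with K = 0: the support matrix is never touched.
batch, num_nodes, d_in, d_out = 8, 5, 2, 4
x = torch.randn(batch, num_nodes, d_in)
support = torch.rand(num_nodes, num_nodes)
projection = nn.Linear(d_in, d_out)  # only the identity term remains
out = diffusion_conv(x, [support], projection, max_diffusion_step=0)
print(out.shape)  # torch.Size([8, 5, 4])
```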

I tested this on the PeMS dataset; the configuration is:

[Screenshot: model configuration used for the experiment]

Link to the file: model_config_issue.txt. In short, I simplified the model a bit: the 15 min forecast, the 'laplacian' filter type, no curriculum learning, and only 60 epochs. Of course, max_diffusion_step=0 is also set to discard the convolution. The changes are summarized in the sketch below.
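For concreteness, the overrides written out as a Python dict. The key names follow the usual dcrnn_*.yaml layout as I understand it, so treat them as illustrative; the authoritative values are in model_config_issue.txt.

```python
# Sketch of the config changes described above; key names are assumed
# to match the dcrnn_*.yaml convention and may differ in detail.
config_overrides = {
    "model": {
        "horizon": 3,                      # 3 steps x 5 min = 15 min forecast
        "max_diffusion_step": 0,           # disables the diffusion convolution
        "filter_type": "laplacian",
        "use_curriculum_learning": False,  # no curriculum learning
    },
    "train": {
        "epochs": 60,                      # only 60 epochs
    },
}
```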

This resulted in val_mae: 1.2388 at the 60th epoch, as can be seen in the log snippet below or in the full info.log.

[Screenshot: training log excerpt showing val_mae 1.2388 at epoch 60]

This is better than the full, published DCRNN, which reported val_mae: 1.38. The fact that even a simpler model without the convolution beats the original DCRNN should raise concerns about the soundness of this implementation. This looks like the same problem as issue #3.

I'm not familiar with TensorFlow, which is why your implementation has been a great help for my thesis. Because this observation could become a bottleneck down the road, I would like to pin down the reason for this behaviour as early as possible. Since you have more insight into the workings of the original TensorFlow implementation of DCRNN, I would like to ask you to take another look at the problem. My gut feeling is that it lies somewhere in the calculation of the error/loss.
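To illustrate where I suspect the discrepancy: the original TF DCRNN uses, as I understand it, a masked MAE of roughly this shape (a sketch, not necessarily this repo's exact code).

```python
import torch

def masked_mae(preds, labels, null_val=0.0):
    # Treat null_val (zero sensor readings) as missing and exclude it.
    mask = (labels != null_val).float()
    mask = mask / mask.mean()  # re-normalize so the loss keeps its scale
    loss = torch.abs(preds - labels) * mask
    # Guard against 0/0 when a batch contains no valid entries.
    loss = torch.where(torch.isnan(loss), torch.zeros_like(loss), loss)
    return loss.mean()
```

If the PyTorch side masks differently, or computes the error on standardized instead of re-scaled values, the reported val_mae would not be comparable to the paper's number even if training itself is fine.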

Hope you can find the time to look into this issue. Thanks in advance.

Noahprog commented 4 years ago

For the answer to this issue, see issue #3.