Hi @chnsh,

Hope you're doing well. While testing your PyTorch implementation of DCRNN, I stumbled across a strange result: turning off the convolution with `max_diffusion_step=0` produced results noticeably better than those reported in the original DCRNN paper.
I tested this on the PeMS dataset; the configuration is here:
link to file: model_config_issue.txt. In short, I simplified the model a bit: 15-minute forecast, 'laplacian' filter, no curriculum learning, and only 60 epochs. Of course `max_diffusion_step=0` is also set, to disable the convolution.
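To make concrete what I believe `max_diffusion_step=0` does, here is a small NumPy sketch of the Chebyshev-style diffusion convolution as I understand it from the paper (function and variable names are my own, not taken from your code):

```python
import numpy as np

def diffusion_conv(X, supports, W, max_diffusion_step):
    """Sketch of the diffusion convolution via the Chebyshev recurrence.

    X:        (num_nodes, input_dim) node features
    supports: list of (num_nodes, num_nodes) support matrices (e.g. scaled Laplacian)
    W:        (input_dim * num_terms, output_dim) weight matrix
    """
    terms = [X]  # T_0 term: the identity, i.e. X itself
    if max_diffusion_step > 0:
        for S in supports:
            x1 = S @ X  # T_1 term
            terms.append(x1)
            x_prev, x_curr = X, x1
            for _ in range(2, max_diffusion_step + 1):
                x2 = 2 * (S @ x_curr) - x_prev  # Chebyshev recurrence T_k = 2 L T_{k-1} - T_{k-2}
                terms.append(x2)
                x_prev, x_curr = x_curr, x2
    # With max_diffusion_step == 0 only the identity term survives,
    # so this collapses to a plain linear layer: X @ W.
    return np.concatenate(terms, axis=1) @ W
```

If this matches the implementation, then with `max_diffusion_step=0` the cell ignores the graph entirely and degenerates to a vanilla GRU per node, which makes it all the more surprising that it outperforms the full model.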
This resulted in `val_mae: 1.2388` at the 60th epoch, as can be seen in the snippet and the full info.log. That is better than the full-blown published DCRNN, which reports `val_mae: 1.38`. The fact that an even simpler model without convolution beats the original DCRNN should raise concern about the soundness of this implementation. This looks like the same problem as issue #3.
I'm not familiar with TensorFlow, which is why your implementation has been a great help with my thesis. Since this observation could become a bottleneck for me down the road, I would like to pin down the reason for this behaviour as early as possible. You have more insight into the workings of the original TensorFlow implementation of DCRNN, so I would like to ask you to take another look at the problem. My gut feeling is that the problem lies somewhere in the calculation of the error/loss.
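To make concrete what I mean by the error/loss calculation: as far as I understand the paper, the MAE is supposed to be masked so that missing readings (encoded as zeros in PeMS) are excluded from the average. A NumPy sketch of what I would expect (my own names, not from your code):

```python
import numpy as np

def masked_mae(preds, labels, null_val=0.0):
    """MAE over valid entries only; null_val marks missing sensor readings."""
    mask = (labels != null_val).astype(float)
    # Renormalise so the mean runs over valid entries only; if this
    # step (or the mask itself) is dropped, the reported MAE can end
    # up artificially low and no longer comparable to the paper.
    mask /= mask.mean()
    loss = np.abs(preds - labels) * mask
    return np.nan_to_num(loss).mean()
```

If the implementation here computes a plain unmasked mean, or normalises the mask differently from the TensorFlow original, that alone could explain the gap between the two sets of numbers.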
Hope you can find the time to look into this issue. Thanks in advance.