Reimplementation details

QiZhao-NJU / Neural-Representation-for-Video-via-Differential-Input-and-Pyramidal-Architecture

Neural Representation for Video via Differential Input and Pyramidal Architecture

MIT License


Reimplementation details #1

Open Xinjie-Q opened 10 months ago

Xinjie-Q commented 10 months ago

Hello! Great work! Could you release the training code for DNeRV? I tried training DNeRV on the UVG dataset using the E-NeRV code, but my results differ significantly from those reported in the paper. For example, with a 3M decoder, a 2x40x80 difference embedding, and L1+SSIM loss, I get 33.83 dB on Beauty, whereas the paper reports 40.00 dB. Given how large the gap is, I sincerely hope you can share more training details so that I can reproduce the reported results.
Thanks a lot and looking forward to your reply!

QiZhao-NJU commented 10 months ago

Thanks for your attention and comments on our work! Note that all NeRV methods can run into training failure, where PSNR does not rise continuously but suddenly drops at a certain training epoch. Whenever this occurs, the training efficiency of NeRV is affected. To solve this problem, we can change the random seed. Also, the kernel size in CCU has been changed to 3x3 for better performance.
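The seed-change workaround described above can be sketched as follows. This is only an illustration, not the repository's training code: `first_collapse`, `retry_with_new_seeds`, the `train_run` callable, and the 3 dB `drop_db` threshold are all hypothetical names and values chosen for the example.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple


def first_collapse(psnr_history: List[float], drop_db: float = 3.0) -> Optional[int]:
    """Return the index of the first epoch whose PSNR is more than
    `drop_db` below the best PSNR seen so far, or None if there is none.
    The 3 dB default is an assumed threshold, not a value from the thread."""
    best = float("-inf")
    for epoch, value in enumerate(psnr_history):
        if value < best - drop_db:
            return epoch
        best = max(best, value)
    return None


def retry_with_new_seeds(
    train_run: Callable[[int], List[float]],
    seeds: Iterable[int],
    drop_db: float = 3.0,
) -> Optional[Tuple[int, List[float]]]:
    """Call `train_run(seed)` (hypothetical: returns per-epoch PSNR) for each
    seed and return the first (seed, history) pair without a sudden drop."""
    for seed in seeds:
        # A real run would also seed NumPy and the deep learning framework.
        random.seed(seed)
        history = train_run(seed)
        if first_collapse(history, drop_db) is None:
            return seed, history
    return None
```

In practice the check could also run inside the training loop so that a collapsed run is stopped early instead of being trained to completion.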

Aaronbtb commented 10 months ago

Thank you for your work. I read your paper and ran into a similar problem. I used your latest code, with training set to use a random seed. With a 3M decoder, a 2x40x80 difference embedding, and L2 loss, I get 33.91 dB on Beauty, which is still far from the 40 dB target. Could you advise on how to resolve this training problem? Thank you very much for your help.
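To put the reported gaps in perspective, a PSNR difference can be converted into a ratio of mean squared errors using the standard definition PSNR = 10 * log10(MAX^2 / MSE). The sketch below assumes pixel values normalized to [0, 1] (so MAX = 1); the function names are made up for this example and are not taken from the DNeRV code.

```python
import math


def psnr_from_mse(mse: float, max_val: float = 1.0) -> float:
    """PSNR in dB for a given mean squared error.
    Assumes pixel values in [0, max_val]."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio(gap_db: float) -> float:
    """Factor by which MSE grows when PSNR falls by `gap_db` dB."""
    return 10.0 ** (gap_db / 10.0)


# Gaps reported in this thread for Beauty, relative to 40.00 dB.
for reproduced in (33.83, 33.91):
    gap = 40.00 - reproduced
    print(f"{reproduced:.2f} dB: gap {gap:.2f} dB, MSE ratio {mse_ratio(gap):.2f}")
```

Both reproduced results correspond to a mean squared error roughly four times higher than the reported figure would imply.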