facebookresearch / Neural_3D_Video

The repository for CVPR 2022 Paper "Neural 3D Video Synthesis"

Quantitative results in Tab. 1 from the main paper #23

Closed tobias-kirschstein closed 1 year ago

tobias-kirschstein commented 1 year ago

Am I right to assume that the numbers in Table 1 from the main paper were obtained by training DyNeRF on more scenes than the 6 available in this GitHub repo? If so, it is basically impossible to compare against the original DyNeRF, as neither the code nor the data from the quantitative comparison are available.

My doubts stem from the fact that DyNeRF shows qualitative results on two additional scenes with multiple moving people. Furthermore, the appendix states what the model was trained on (quoted in the attached screenshot below).

Lastly, the LPIPS scores seem very low to me. How were these scores computed for the paper? If the torchmetrics implementation was used, a common mistake is forgetting normalize=True, which artificially lowers the computed LPIPS scores (see for example: https://github.com/nerfstudio-project/nerfstudio/issues/1424).
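To illustrate the pitfall: a minimal sketch, assuming a recent torchmetrics version (the tensors are random stand-ins for rendered and ground-truth frames):

```python
import torch
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# Stand-ins for rendered and ground-truth images in [0, 1], shape (N, 3, H, W)
pred = torch.rand(1, 3, 256, 256)
target = torch.rand(1, 3, 256, 256)

# Correct: normalize=True tells torchmetrics the inputs are in [0, 1]
lpips_ok = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
print(lpips_ok(pred, target))

# Pitfall: the default normalize=False assumes inputs are already in [-1, 1],
# so [0, 1] images are silently treated as low-contrast and the score comes
# out artificially low
lpips_bad = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=False)
print(lpips_bad(pred, target))
```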

It would be great if the authors could clarify these points.

[attached screenshot]

zhaoyang-lv commented 1 year ago

No. The training and evaluation are done on the DyNeRF salmon subset (a 10-second sequence). The table caption says "10-second sequence"; I am not sure where we confused you. The evaluation is done on a held-out center camera view (camera 0).

zhaoyang-lv commented 1 year ago

We did have results on additional sequences in our study. Due to the overhead of open-sourcing those assets, we did not get to the point of releasing them. But we did all the ablations and comparisons on the released subset.

zhaoyang-lv commented 1 year ago

For LPIPS, we implemented our own rather than using the PyTorch implementation. All numbers in the table are calculated using the same evaluation script.

tobias-kirschstein commented 1 year ago

Thanks for coming back to me so quickly!

> The training and evaluation are done on the DyNeRF salmon subset (a 10-second sequence)

Which of the 6 released sequences is the DyNeRF salmon subset? flame_salmon_1 is 40 seconds long.


> But we did all the ablations and comparisons on the released subset

So, the results in Table 1 were done on exactly the 6 sequences released in this repository?


> For LPIPS, we implemented our own rather than using the PyTorch implementation

Would it be possible to share some details of this evaluation script?

zhaoyang-lv commented 1 year ago

> Which of the 6 released sequences is the DyNeRF salmon subset?

The flame_salmon_1 subset is. It should be the first 10 seconds. You can confirm this by comparing against the figure used in the ablation study.
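For reference, a minimal sketch of carving out that snippet, assuming 30 FPS footage (so the first 10 seconds are frames 0-299); the per-camera frame directory layout here is hypothetical:

```python
import shutil
from pathlib import Path

FPS = 30       # assumed frame rate of the released captures
SECONDS = 10   # length of the DyNeRF salmon snippet

src = Path("flame_salmon_1/cam00")      # hypothetical frame directory
dst = Path("flame_salmon_1_10s/cam00")
dst.mkdir(parents=True, exist_ok=True)

# Copy frames 0..299, i.e. the first 10 seconds at 30 FPS
for i in range(FPS * SECONDS):
    name = f"{i:04d}.png"
    shutil.copy(src / name, dst / name)
```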

> So, the results in Table 1 were done on exactly the 6 sequences released in this repository?

To clarify, as stated in the table caption, it was calculated using only the 10-second snippet released in this repository.

> Would it be possible to share some details of this evaluation script?

Unfortunately no. :( Our code is heavily dependent on our internal code repo, which restricts us from sharing it outside.

tobias-kirschstein commented 1 year ago

> The flame_salmon_1 subset is. It should be the first 10 seconds

Got it! Thanks for clarifying this. I think this should be highlighted somewhere. I have seen the numbers from Table 1 quoted in other studies (e.g., NeRFPlayer, HyperReel) side by side with numbers that were computed on all 6 sequences from this repository, which is of course wrong since those numbers are not comparable.

> Unfortunately no. :( Our code is heavily dependent on our internal code repo, which restricts us from sharing it outside.

I understand. For us, it would be enough to know which pretrained image encoder was used (AlexNet or VGG) and whether the images were fed to the encoder as tensors in the range [0, 1] or [-1, 1].

zhaoyang-lv commented 1 year ago

> I think this should be highlighted somewhere.

I will update the README soon to highlight this, hopefully later this week when I have time. Thanks for the suggestion. :)

> which pretrained image encoder was used (AlexNet or VGG)

I can confirm we use AlexNet for this. :)
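For anyone trying to approximate the paper's numbers: a minimal sketch using the public lpips package by Zhang et al. with the AlexNet backbone confirmed above. Note that the [-1, 1] input range is that package's own convention, not a confirmed detail of the internal script:

```python
import torch
import lpips

# AlexNet backbone, as confirmed above
loss_fn = lpips.LPIPS(net="alex")

# The lpips package expects (N, 3, H, W) tensors scaled to [-1, 1];
# random stand-ins for a rendered frame and its ground truth
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1

print(loss_fn(img0, img1).item())
```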