MilkTea-halfsugar closed this issue 2 months ago.
Thanks for your interest in our work. However, it seems there may be some issues in your experimental setup. FreeNeRF and the other baselines we studied (mip-NeRF, RegNeRF) consistently concatenate the XYZ coordinates with the original positional encoding, and this is stated in several places throughout the paper.
The reported results strongly indicate that masking is a key enabling factor in FreeNeRF's performance. If masking did not contribute to the improvement while concatenation worked "out of the box" as you suggest, how would we explain the results in Figures 2 and 7?
Additionally, all the results presented in the last row groups of Tables 2 and 3 were produced with the same concatenation. Please refer to the captions, where this is stated explicitly: "concat.": inputs concatenation (Eq. (2)). We noted that this step slightly improves mip-NeRF on the LLFF dataset, but it did not help in the other settings we tested.
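To make the setup concrete, the "inputs concatenation" of Eq. (2) amounts to something like the sketch below (simplified NumPy with illustrative names, not the actual mip-NeRF/RegNeRF/FreeNeRF code): the raw xyz coordinates are simply appended to the standard frequency encoding before being fed to the MLP.

```python
import numpy as np

def positional_encoding(x, num_freqs):
    """NeRF-style frequency encoding: sin/cos of 2^k * x for k = 0..num_freqs-1."""
    freqs = 2.0 ** np.arange(num_freqs)                               # (L,)
    angles = x[..., None, :] * freqs[:, None]                         # (..., L, 3)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)   # (..., L, 6)
    return enc.reshape(*x.shape[:-1], -1)                             # (..., 6 * L)

def encode_with_concat(x, num_freqs):
    """Eq. (2)-style network input: raw xyz concatenated with its encoding."""
    return np.concatenate([x, positional_encoding(x, num_freqs)], axis=-1)

pts = np.random.rand(1024, 3)                        # sampled 3D points
print(encode_with_concat(pts, num_freqs=10).shape)   # (1024, 3 + 60)
```

This concatenation is part of the shared input format for every method we compare, not an extra ingredient added only on top of FreeNeRF.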
I would strongly recommend carefully reviewing your implementation and experimental setup before making such a serious accusation, to ensure there are no discrepancies in your reproduction.
Thanks for your great work. However, I find that the performance improvement does not come from your novel "frequency mask". You set the positional encoding to "mask + concatenation of xyz"; when I set the mask to all ones, other NeRF methods (like RegNeRF) still get a huge improvement.
Other papers do not concatenate xyz with the positional encoding, while you do, yet your claim is mainly about the mask. Given that, I think the claim in your paper is somewhat wrong.
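To be concrete, what I tested is roughly equivalent to the sketch below (my own simplified NumPy code with illustrative names, not your released implementation or exact mask schedule): with use_mask=False the frequency mask is all ones, so only the xyz concatenation remains.

```python
import numpy as np

def positional_encoding(x, num_freqs):
    """NeRF-style frequency encoding, grouped by frequency band."""
    angles = x[..., None, :] * (2.0 ** np.arange(num_freqs))[:, None]   # (..., L, 3)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)     # (..., L, 6)
    return enc.reshape(*x.shape[:-1], -1)

def linear_freq_mask(num_freqs, step, total_reg_steps, channels_per_freq=6):
    """Linearly growing frequency mask in the spirit of FreeNeRF's frequency
    regularization (the exact schedule is defined in the paper): low frequencies
    are visible from the start, higher ones are revealed as training progresses."""
    visible = np.clip(step / total_reg_steps, 0.0, 1.0) * num_freqs
    per_freq = np.clip(visible - np.arange(num_freqs) + 1.0, 0.0, 1.0)   # (L,)
    return np.repeat(per_freq, channels_per_freq)                        # (6 * L,)

def encode(x, num_freqs, step, total_reg_steps, use_mask=True):
    """Masked positional encoding concatenated with raw xyz.
    use_mask=False is the all-ones-mask ablation described above."""
    enc = positional_encoding(x, num_freqs)
    if use_mask:
        enc = enc * linear_freq_mask(num_freqs, step, total_reg_steps)
    return np.concatenate([x, enc], axis=-1)
```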