NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

Performance inconsistency with wandb visualizations when loading intermediate checkpoints #75

Open GuangyuWang99 opened 1 year ago

GuangyuWang99 commented 1 year ago

Hi, thanks for sharing this wonderful work! I've come across a problem when loading intermediate checkpoints (i.e., the checkpoints saved every 20000 iters rather than the final checkpoint). For example, after training for 140000 iters, the validation/visualization results on wandb look reasonably good: [wandb validation renders at 140000 iterations]

However, when I load the checkpoint saved at 140000 iters and test with BaseTrainer.test() in projects/nerf/trainers/base.py, the results come out like this: [renders from the 140000-iter checkpoint via BaseTrainer.test()]

I get the same result when extracting the mesh with the provided script projects/neuralangelo/scripts/extract_mesh.py (again using the checkpoint saved at 140000 iters): [mesh extracted from the 140000-iter checkpoint]

This confuses me a lot since the evaluation code is exactly the same as that used for wandb visualizations. It seems like the geometry (SDF field) 'dilates' compared to the result shown in wandb.

A similar phenomenon also shows up on other scenes such as M60 from TanksandTemples/intermediate, where the intermediate checkpoint saved at 160000 iters (with a total of 200000 iters) is inconsistent with the wandb visualizations (the reconstructed mesh / rendered normals are 'fatter'). However, testing with the checkpoints saved at 180000 or 200000 iters is consistent with the wandb visualizations.
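For reference, this is roughly how I sanity-check a saved checkpoint before testing; the file name and the key names ("model", "iteration", ...) are guesses on my side, since I don't know the exact layout the trainer writes:

```python
import torch

# Rough sanity check on an intermediate checkpoint. The path and the key names
# ("model", "iteration", ...) are guesses -- adjust to what the trainer actually writes.
ckpt_path = "logs/family/iteration_000140000_checkpoint.pt"  # hypothetical file name
ckpt = torch.load(ckpt_path, map_location="cpu")

print("top-level keys:", list(ckpt.keys()))

# If an iteration counter is stored, confirm it matches the file name.
for key in ("iteration", "current_iteration", "iter"):
    if key in ckpt:
        print(f"{key} = {ckpt[key]}")

# Scan the model weights for obviously broken values.
state = ckpt.get("model", ckpt)
for name, value in state.items():
    if torch.is_tensor(value) and torch.is_floating_point(value):
        if not torch.isfinite(value).all():
            print("non-finite values in", name)
```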

chenhsuanlin commented 1 year ago

Hi @GuangyuWang99, which commit are you at in the code? If you're not at the latest main, does the issue persist if you pull and rerun the mesh extraction?

GuangyuWang99 commented 1 year ago

Hi @chenhsuanlin, thanks for your prompt reply. Actually I am using the latest main branch and the issue persists.

I re-ran the experiment on the Family scene using the latest main, and the extracted mesh / rendered normals at 100000 iterations look like this: [mesh and rendered normals from the 100000-iter checkpoint]

while the normal visualization on wandb at the same iteration is: [wandb normal visualization at 100000 iterations]

GuangyuWang99 commented 1 year ago

Another thing to note: this phenomenon may be related to the choice of batch_size (or single vs. multiple GPUs) during training.

The experiments above were run with batch_size = 10 on 6 GPUs. However, when I train with batch_size = 1 on a single GPU and leave all other configs unchanged, the extracted mesh / rendered normals from intermediate checkpoints are consistent with the wandb visualizations. I'm not sure whether this points to a real dependence on the batch_size (or single vs. multiple GPU) setting, or just arises from some randomness.
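To rule out the DDP ranks silently drifting apart in the multi-GPU run, here is a small check I could run on all ranks right before a checkpoint is saved (my own sketch, not part of the repo; it assumes torch.distributed is already initialized and the model tensors live on the local GPU):

```python
import torch
import torch.distributed as dist

def check_rank_consistency(model, atol=1e-6):
    """Compare every floating-point tensor in the state_dict against rank 0's copy.

    Hypothetical diagnostic: call it on all ranks right before the trainer
    writes a checkpoint, to see whether the ranks have silently diverged.
    """
    diverged = []
    for name, tensor in model.state_dict().items():
        if not torch.is_floating_point(tensor):
            continue
        reference = tensor.detach().clone()
        dist.broadcast(reference, src=0)  # every rank now holds rank 0's values
        if not torch.allclose(tensor, reference, atol=atol):
            diverged.append(name)
    if diverged:
        print(f"[rank {dist.get_rank()}] tensors differing from rank 0: {diverged}")
    return diverged
```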

However, another issue appears when training with batch_size = 1 on a single GPU for a total of 500000 iterations: the rendered color from intermediate saved checkpoints is completely wrong after 120000 iterations. For example:

Rendering using the saved ckpt at 140000 iterations: [rendered color from the 140000-iter checkpoint]

Rendering visualization on wandb at 140000 iterations: [wandb color render at 140000 iterations]

Rendering using the saved ckpt at 120000 iterations: [rendered color from the 120000-iter checkpoint]

Rendering visualization on wandb at 120000 iterations: [wandb color render at 120000 iterations]
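To tell whether the color weights themselves are broken in the saved checkpoint or whether this is purely a loading/restoring problem, I plan to diff a known-good checkpoint against a suspect one; a rough sketch (the paths and the "model" key are assumptions on my side):

```python
import torch

# Compare per-tensor statistics between a checkpoint that renders correctly and
# one that renders gray. The paths and the "model" key are assumptions.
good = torch.load("logs/family/iteration_000100000_checkpoint.pt", map_location="cpu")
bad = torch.load("logs/family/iteration_000140000_checkpoint.pt", map_location="cpu")
good_state = good.get("model", good)
bad_state = bad.get("model", bad)

for name in sorted(set(good_state) & set(bad_state)):
    a, b = good_state[name], bad_state[name]
    if not (torch.is_tensor(a) and torch.is_floating_point(a)) or a.shape != b.shape:
        continue
    # Near-zero norms or huge jumps in the color/appearance MLP would point at the
    # weights themselves; small deltas everywhere would point at how the
    # checkpoint is restored at test time.
    print(f"{name:60s} |good|={a.norm().item():.4f} "
          f"|bad|={b.norm().item():.4f} |diff|={(a - b).norm().item():.4f}")
```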

mli0603 commented 1 year ago

Hi @GuangyuWang99

Thanks for reporting; the information is very helpful. It looks like something fishy is going on in trainer.test().

Could you confirm that the last saved checkpoint performs as expected?

GuangyuWang99 commented 1 year ago

> Could you confirm that the last saved checkpoint performs as expected?

Hi @mli0603, thanks for your reply.

The last saved checkpoint does not always perform as expected.

As mentioned above, for the experiment with batch_size = 1 (single GPU), the extracted mesh / rendered normals always look as expected, while the rendered RGBs become gray everywhere when using checkpoints saved after 120000 iterations, including the last checkpoint.

However, for the experiment with batch_size = 10 on 6 GPUs, the extracted mesh / rendered maps (normals, RGBs, etc.) are all inconsistent with the wandb visualizations, except near the last iterations. For example, if the total iteration count is set to 200000, the results only become consistent with wandb after 180000 iterations. It seems like the saved checkpoints severely lag behind the results shown on wandb, except for the very last iterations, where they become consistent.

Again, I have not tested this extensively, so I'm not sure whether it indicates a real dependence on the batch_size (or single vs. multiple GPU) setting, or just arises from some randomness.
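One more thing I plan to try is loading an intermediate checkpoint into a freshly built model and checking for silently dropped keys; build_model below is just a placeholder for however the trainer actually constructs the network from the config:

```python
import torch

# build_model is a hypothetical placeholder: construct the network the same way
# the trainer does from the saved config (I don't know the exact constructor).
model = build_model("logs/tnt_m60/config.yaml")

ckpt = torch.load("logs/tnt_m60/iteration_000160000_checkpoint.pt", map_location="cpu")
state = ckpt.get("model", ckpt)  # the "model" key is an assumption

# strict=False reports the keys that would otherwise fail to load; any missing
# key silently keeps its freshly initialized value, which could explain a
# checkpoint that appears to "lag behind" the wandb results.
result = model.load_state_dict(state, strict=False)
print("missing keys:   ", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```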

Thanks again for digging into this issue.

chenhsuanlin commented 1 year ago

I have been able to reproduce the issue. This has been marked as a bug and we will look into it.

mli0603 commented 1 year ago

Hi @GuangyuWang99

We have pushed a commit that potentially fixes the issue of resuming (https://github.com/NVlabs/neuralangelo/commit/c91af8d5098c858df8e8dfa35fba8666d314782b). Please let us know if you still run into the same problem.
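If it helps for verification, comparing a mesh extracted from the same intermediate checkpoint before and after the fix should show whether the 'dilation' is gone; for example with trimesh (the file names below are placeholders):

```python
import trimesh

# Placeholder file names: meshes extracted from the same 160k checkpoint
# before and after pulling the fix.
before = trimesh.load("mesh_160k_before_fix.ply")
after = trimesh.load("mesh_160k_after_fix.ply")

# A "dilated"/"fatter" reconstruction typically shows up as a larger bounding
# box and a larger surface area for the same scene.
print("bounding box extents (before):", before.bounding_box.extents)
print("bounding box extents (after): ", after.bounding_box.extents)
print("surface area before / after:", before.area, after.area)
```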