ashawkey / torch-ngp

A pytorch CUDA extension implementation of instant-ngp (sdf and nerf), with a GUI.
MIT License

Can't reproduce the experiment, low PSNR on the Lego dataset! #117

Closed Saoyu99 closed 2 years ago

Saoyu99 commented 2 years ago

With the default command "-O" on a single RTX 1080, training for 30K steps (100 images, 300 epochs) on the Lego dataset, the speed is fast, but I only get PSNR = 28.244854 and LPIPS = 0.070723. Is there any mistake? Please give me some advice.

ashawkey commented 2 years ago

@Saoyu99 This is not expected. Could you provide an example of the training results? (e.g., generated image and depth).

Saoyu99 commented 2 years ago

@ashawkey Here are the video results I get after 300 epochs of training. The overall look is OK, but some of the details are not great. https://user-images.githubusercontent.com/53813987/194811456-6c71b620-349d-4fd5-9697-fc2fe975a454.mp4 https://user-images.githubusercontent.com/53813987/194811532-0ea75255-f089-436b-8a63-dc52e8a8c870.mp4

Saoyu99 commented 2 years ago

And when I keep training, the result gets better. Here is the video after training for 6750 epochs; the bucket looks better, PSNR = 29.073251. ngp_ep6750_rgb.mp4.zip

ashawkey commented 2 years ago

I cannot reproduce this with the latest commit; maybe you could check the CUDA version, torch version, etc.? My local results on Lego:

==> Finished Epoch 300.
++> Evaluate at epoch 300 ...
loss=0.0003 (0.0003): : 100% 100/100 [00:16<00:00,  6.13it/s]
PSNR = 35.224799
LPIPS (alex) = 0.010514
++> Evaluate epoch 300 Finished.
[INFO] New best result: 0.0003346026000508573 --> 0.0003220700420206413
Loading test data: 100%|██████████| 200/200 [00:07<00:00, 27.01it/s]
++> Evaluate at epoch 300 ...
loss=0.0003 (0.0004): : 100% 200/200 [00:30<00:00,  6.54it/s]
PSNR = 34.113840
LPIPS (alex) = 0.013175
++> Evaluate epoch 300 Finished.
==> Start Test, save results to trial_nerf_lego/results

Saoyu99 commented 2 years ago

Thanks for your advice! I'm going to check the environment.

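As a quick way to collect the versions mentioned above, a small script like this (an illustrative sketch, not part of torch-ngp) prints the Python/torch/CUDA versions and the GPU compute capability for comparison across machines:

```python
# Illustrative environment report: prints the versions worth comparing
# when results differ across machines. Degrades gracefully if torch is
# not installed in the current interpreter.
import sys

def env_report():
    info = {"python": sys.version.split()[0]}
    try:
        import torch
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # CUDA version torch was built against
        if torch.cuda.is_available():
            # (major, minor) compute capability, e.g. (6, 1) for a GTX 1080
            info["capability"] = torch.cuda.get_device_capability(0)
    except ImportError:
        info["torch"] = None
    return info

print(env_report())
```
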
letian-zhang commented 2 years ago

I also used an Nvidia GTX 1080 Ti to train on the Lego dataset with the default command "-O" (30K steps, 100 images, 300 epochs), and I get a similar PSNR to @Saoyu99. My environment is Windows 10, cuda=11.3, pytorch=1.11.0. The only thing I am concerned about is that the 1080 Ti is architecture 61, and I get the warning "FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+."

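The backend selection rule in that warning can be summarized as follows (a hedged sketch of tiny-cuda-nn's behavior, not its actual code): FullyFusedMLP needs compute capability 75+ (Turing or newer), so pre-Turing cards like the 1080 Ti (architecture 61) fall back to the slower CutlassMLP:

```python
# Sketch of the backend rule quoted in the tiny-cuda-nn warning above:
# FullyFusedMLP requires GPU architecture (compute capability) 75 or higher.
def mlp_backend(arch: int) -> str:
    return "FullyFusedMLP" if arch >= 75 else "CutlassMLP"

print(mlp_backend(61))  # GTX 1080 Ti -> CutlassMLP
print(mlp_backend(86))  # RTX 30-series -> FullyFusedMLP
```

Note this fallback affects speed, not quality, so it should not by itself explain a PSNR drop.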
Saoyu99 commented 2 years ago

@letian-zhang I tried a lot of environments, but the results were the same. I ran the same dataset in instant-ngp and it worked as expected, so it isn't the data either. I found torch-ngp to be even faster than instant-ngp with the default settings, so the performance drop is perhaps due to some tricks or the code itself.

letian-zhang commented 2 years ago

@Saoyu99 The interesting thing is that when I run torch-ngp in PyCharm, I get PSNR 34.56, but in the conda environment it is only PSNR 28.98.

Saoyu99 commented 2 years ago

@letian-zhang What do you mean by "in PyCharm"? Using the local environment?

bhiaibogf commented 2 years ago

I also got the same result as @Saoyu99, and I found that there is some pattern in my result: in the bottom picture, there seem to be some horizontal lines. [image]

bhiaibogf commented 2 years ago

Found the solution: I forgot to add the --dt_gamma 0 arg. It runs well now with adaptive ray marching off.

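For context, a hedged sketch of what --dt_gamma controls (not the repo's actual CUDA kernel): with adaptive ray marching, the step size grows with the distance t along the ray, so distant geometry is sampled more coarsely; --dt_gamma 0 forces uniform minimum-size steps, which suits small bounded scenes like the blender Lego:

```python
def step_size(t, dt_gamma, dt_min, dt_max):
    # Adaptive marching step: grows linearly with ray distance t when
    # dt_gamma > 0; with dt_gamma == 0 it collapses to the fixed dt_min.
    return max(dt_min, min(t * dt_gamma, dt_max))

print(step_size(2.0, 0.0, 1 / 128, 0.04))   # dt_gamma 0 -> uniform dt_min
print(step_size(2.0, 0.02, 1 / 128, 0.04))  # adaptive -> coarser step far away
```

The coarser far-field steps explain the lost fine detail (and banding artifacts like the horizontal lines above) on object-centric scenes.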
Saoyu99 commented 2 years ago

@bhiaibogf Thanks, bro! It worked. I didn't notice the author's hint: "for the blender dataset, you should add `--bound 1.0 --scale 0.8 --dt_gamma 0`".
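
Putting it together, a full training command for a blender scene would look something like this (the dataset path and workspace name are examples, assuming the standard nerf_synthetic layout):

```shell
# Train torch-ngp on a NeRF-synthetic (blender) scene with adaptive ray
# marching disabled; path and workspace name are illustrative.
python main_nerf.py data/nerf_synthetic/lego --workspace trial_nerf_lego \
    -O --bound 1.0 --scale 0.8 --dt_gamma 0
```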