Saoyu99 closed this issue 2 years ago
@Saoyu99 This is not expected. Could you provide an example of the training results? (e.g., generated image and depth).
@ashawkey Here are the video results I get after training for 300 epochs. The overall look is OK, but some of the finer details are not great. https://user-images.githubusercontent.com/53813987/194811456-6c71b620-349d-4fd5-9697-fc2fe975a454.mp4 https://user-images.githubusercontent.com/53813987/194811532-0ea75255-f089-436b-8a63-dc52e8a8c870.mp4
When I keep training, the results get better. Here is the video after 6750 epochs; the bucket looks better, PSNR = 29.073251. ngp_ep6750_rgb.mp4.zip
I cannot reproduce this with the latest commit; maybe you could check the CUDA version, torch version, etc.? My local results on Lego:
==> Finished Epoch 300.
++> Evaluate at epoch 300 ...
loss=0.0003 (0.0003): : 100% 100/100 [00:16<00:00, 6.13it/s]
PSNR = 35.224799
LPIPS (alex) = 0.010514
++> Evaluate epoch 300 Finished.
[INFO] New best result: 0.0003346026000508573 --> 0.0003220700420206413
Loading test data: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:07<00:00, 27.01it/s]
++> Evaluate at epoch 300 ...
loss=0.0003 (0.0004): : 100% 200/200 [00:30<00:00, 6.54it/s]
PSNR = 34.113840
LPIPS (alex) = 0.013175
++> Evaluate epoch 300 Finished.
==> Start Test, save results to trial_nerf_lego/results
Thanks for your advice! I'm going to go check the environment.
I also used an NVIDIA GTX 1080 Ti to train the Lego dataset with the default command "-O" (30K steps, 100 images, 300 epochs) and get a similar PSNR to @Saoyu99. My environment is Windows 10, CUDA 11.3, PyTorch 1.11.0. The only thing I'm concerned about is that the 1080 Ti is compute architecture 61, so I get the warning "FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+."
@letian-zhang I tried a lot of environments, but the results were the same. I ran instant-ngp on the same dataset and it worked as expected, so it isn't the data either. I found torch-ngp to be even faster than instant-ngp with the default settings, so the performance drop is perhaps due to some tricks or the code itself.
@Saoyu99 The interesting thing is that when I run torch-ngp in PyCharm, I get PSNR 34.56, but in the conda environment it is only PSNR 28.98.
@letian-zhang What do you mean by "in PyCharm"? Do you mean using the local environment?
I also got the same result as @Saoyu99, and I found some patterns in my result. In the bottom picture, there seem to be some horizontal lines.
Found the solution: I forgot to add the `--dt_gamma 0` arg. It runs well now with adaptive ray marching off.
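For anyone else hitting this: a minimal sketch of why `dt_gamma` matters, assuming the instant-ngp-style step rule where the marching step grows linearly with distance `t` along the ray, clamped to `[dt_min, dt_max]` (the exact constants and clamping in torch-ngp's CUDA kernel may differ):

```python
# Sketch of adaptive ray-marching step size (instant-ngp style, assumed):
# dt = clamp(t * dt_gamma, dt_min, dt_max).
# With dt_gamma = 0, the clamp pins every step to dt_min, i.e. uniform
# sampling, which is what a small bounded Blender scene needs; with
# dt_gamma > 0, steps grow with distance and far geometry gets undersampled.
def step_size(t, dt_gamma, dt_min=0.01, dt_max=0.1):
    return min(max(t * dt_gamma, dt_min), dt_max)

uniform = [step_size(t, dt_gamma=0.0) for t in (0.5, 1.0, 2.0)]
adaptive = [step_size(t, dt_gamma=0.1) for t in (0.5, 1.0, 2.0)]
print(uniform)   # [0.01, 0.01, 0.01] -- every step stays at dt_min
print(adaptive)  # [0.05, 0.1, 0.1]   -- steps grow with distance, then clamp
```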
@bhiaibogf Thanks, bro! It worked. I didn't notice the author's hint: "for the blender dataset, you should add `--bound 1.0 --scale 0.8 --dt_gamma 0`".
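Putting the pieces together, here is a sketch of the full launch command with those flags; the entry point `main_nerf.py`, the `--workspace` flag, and the data path are taken from the torch-ngp README and logs above, and may differ in your checkout:

```python
# Hypothetical full command line for a Blender-synthetic scene in torch-ngp,
# combining the default "-O" preset with the hint quoted above.
cmd = [
    "python", "main_nerf.py", "data/nerf_synthetic/lego",
    "--workspace", "trial_nerf_lego",  # output dir, as seen in the logs above
    "-O",                  # default preset flag used throughout this thread
    "--bound", "1.0",      # the synthetic scene fits inside [-1, 1]^3
    "--scale", "0.8",      # pull the cameras inside that bound
    "--dt_gamma", "0",     # uniform ray-marching steps (adaptive stepping off)
]
print(" ".join(cmd))
```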
With the default command "-O" on a single GTX 1080 (30K steps, 100 images, 300 epochs, Lego dataset), training is fast, but I only get PSNR = 28.244854, LPIPS = 0.070723. Is there any mistake? Please give me some advice.