Got wrong results - GitHub issues

jjlinghu commented 7 months ago

Hi! Dear author, I followed the instructions in README.md to run the evaluation. I downloaded the pretrained models and sub-datasets and saved them to the checkpoints and datasets directories respectively, but I got wrong results. It seems that I missed something important. Do you have any advice on how to deal with it?

python -m src.main +experiment=acid checkpointing.load=checkpoints/acid.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true

Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8
Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8
Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8

distributed_backend=nccl
All distributed processes registered. Starting with 8 processes

Restoring states from the checkpoint path at checkpoints/acid.ckpt
LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
Loaded model weights from the checkpoint at checkpoints/acid.ckpt
Testing DataLoader 0: 0%| | 0/16 [00:00<?, ?it/s]Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 6%|██████████▏ | 1/16 [00:04<01:04, 0.23it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 12%|████████████████████▍ | 2/16 [00:04<00:31, 0.45it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 31%|██████████████████████████████████████████████████▉ | 5/16 [00:04<00:10, 1.02it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 38%|█████████████████████████████████████████████████████████████▏ | 6/16 [00:05<00:08, 1.18it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 44%|███████████████████████████████████████████████████████████████████████▎ | 7/16 [00:05<00:06, 1.34it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 13/16 [00:06<00:01, 2.13it/s]
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.061416834592819214 seconds per call
decoder: 24 calls, avg. 0.0018823047478993733 seconds per call
Testing DataLoader 0: 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 13/16 [00:06<00:01, 2.12it/s]
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.06180933117866516 seconds per call
decoder: 24 calls, avg. 0.0019558072090148926 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.06272295117378235 seconds per call
decoder: 24 calls, avg. 0.002048651377360026 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.0654122531414032 seconds per call
decoder: 24 calls, avg. 0.0019436180591583252 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.09683313965797424 seconds per call
decoder: 24 calls, avg. 0.002124359210332235 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
psnr 5.61385152890132
ssim 0.0009663698885840579
encoder: 8 calls, avg. 0.09448182582855225 seconds per call
decoder: 24 calls, avg. 0.0020943681399027505 seconds per call
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.10512921214103699 seconds per call
decoder: 24 calls, avg. 0.0021263360977172847 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.10539361834526062 seconds per call
decoder: 24 calls, avg. 0.002041985591252645 seconds per call

donydchen commented 7 months ago

Hi @jjlinghu, we have double-checked in our own environment, and the ACID experiment (the one you tried) works as expected with the provided commands. The scores we obtained on the ACID subset are:

psnr 27.826187720665565
ssim 0.8714594657604511
lpips 0.12284171380675755
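
To compare a run against these numbers without reading the log by eye, something like the following could be used. It is only a sketch: `parse_scores` and `matches_reference` are hypothetical helper names, the parser assumes each metric is printed on its own line as `name value`, and the 1% tolerance is an arbitrary illustrative choice rather than a project setting.

```python
# Reference ACID scores quoted in this comment.
REFERENCE = {
    "psnr": 27.826187720665565,
    "ssim": 0.8714594657604511,
    "lpips": 0.12284171380675755,
}


def parse_scores(log_text):
    """Return the last reported value of each metric found in the log text."""
    scores = {}
    for line in log_text.splitlines():
        parts = line.split()
        # Only accept lines of the exact form "<metric> <value>".
        if len(parts) == 2 and parts[0] in REFERENCE:
            scores[parts[0]] = float(parts[1])
    return scores


def matches_reference(scores, reference=REFERENCE, rel_tol=0.01):
    """True if every reference metric is present and within rel_tol of it."""
    return all(
        name in scores and abs(scores[name] - ref) <= rel_tol * abs(ref)
        for name, ref in reference.items()
    )
```

Scores far outside the tolerance, such as a PSNR in the single digits, suggest the pipeline is broken rather than merely a little off.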

One possible cause is a multi-GPU issue. Our model can be trained with multiple GPUs, but for testing we use only a single GPU. Perhaps you can try running with

CUDA_VISIBLE_DEVICES=0 python -m src.main +experiment=acid checkpointing.load=checkpoints/acid.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true

Hope it works.
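
If the evaluation is launched from a Python script rather than the shell, the same restriction can be applied through the child process environment. This is a minimal sketch, assuming it is run from the repository root; `build_single_gpu_run` and `run_single_gpu` are hypothetical helper names, and the arguments are copied from the command above.

```python
import os
import subprocess
import sys

# Arguments copied from the evaluation command in this thread.
EVAL_ARGS = [
    "-m", "src.main",
    "+experiment=acid",
    "checkpointing.load=checkpoints/acid.ckpt",
    "mode=test",
    "dataset/view_sampler=evaluation",
    "dataset.view_sampler.index_path=assets/evaluation_index_acid.json",
    "test.compute_scores=true",
]


def build_single_gpu_run(gpu_index=0):
    """Return (argv, env) for an evaluation run that can see only one GPU."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    argv = [sys.executable, *EVAL_ARGS]
    return argv, env


def run_single_gpu(gpu_index=0):
    """Launch the evaluation and raise if it exits with a non-zero status."""
    argv, env = build_single_gpu_run(gpu_index)
    subprocess.run(argv, env=env, check=True)
```

Calling `run_single_gpu(0)` should be equivalent to prefixing the shell command with `CUDA_VISIBLE_DEVICES=0`.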

jjlinghu commented 7 months ago

Hi, thanks for checking and replying! Unfortunately, the metrics are the same.

th=assets/evaluation_index_acid.json test.compute_scores=true
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/15-55-46.
rm: cannot remove 'outputs/local': No such file or directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Restoring states from the checkpoint path at checkpoints/acid.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from the checkpoint at checkpoints/acid.ckpt
Testing DataLoader 0:   0%|                                                                                                                                                                        | 0/16 [00:00<?, ?it/s]Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0:  81%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                             | 13/16 [00:06<00:01,  1.86it/s]
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.061195939779281616 seconds per call
decoder: 24 calls, avg. 0.0018489062786102295 seconds per call

I also tried the evaluation on re10k.

Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/15-58-21.
rm: cannot remove 'outputs/local': No such file or directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Restoring states from the checkpoint path at checkpoints/re10k.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from the checkpoint at checkpoints/re10k.ckpt
Testing DataLoader 0:   0%|                                                                                                                                                                        | 0/41 [00:00<?, ?it/s]Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0:  93%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎           | 38/41 [00:10<00:00,  3.72it/s]
psnr 4.89181019130506
ssim 0.005856961852068228
lpips 0.7431378646900779
encoder: 33 calls, avg. 0.07152891159057617 seconds per call
decoder: 99 calls, avg. 0.0020032001264167552 seconds per call

After downloading the sub-datasets from here, do I also need to convert them with the scripts? I currently have no plans to train the full versions; I'm just debugging at this point. Apologies for disturbing you.

donydchen commented 7 months ago

No worries, @jjlinghu. The subset has already been converted and does not need to be processed with the script again. From the log, I can see that you have successfully loaded the pre-trained weights, so ideally it should work fine. I am not sure about the exact reason, but here are some things you can check for debugging:

Pull the latest update from this project; make sure that you have not changed anything unintentionally.
Check the rendered visual results under folder outputs/test/, which might help you understand the situation better.
Save the input views and double-check that the inputs are loaded correctly. For example, you can insert the following snippet right after https://github.com/donydchen/mvsplat/blob/main/src/model/model_wrapper.py#L206

for c_idx in range(batch['context']['image'].shape[1]):
    save_image(batch['context']['image'][0, c_idx], 
            path / scene / f"color/input{c_idx}_{batch['context']['index'][0, c_idx]:0>6}.png")

Log the warning messages. You can do this by commenting out https://github.com/donydchen/mvsplat/blob/main/src/main.py#L151. Sometimes warning logs might also provide valuable hints regarding the error.

These are the ideas that come to mind at the moment. Hope they help.

jjlinghu commented 7 months ago

Hi @donydchen. Thank you very much for your patient guidance! I get the correct results after pulling the latest update and building diff-gaussian-rasterization-modified again. It seems that the decoder was not functioning properly because the previous build of diff-gaussian-rasterization-modified was wrong, which is why I got all-black images under the folder outputs/test/. Now I obtain the expected results in the acid experiment:

psnr 27.82588870708759
ssim 0.8714552934353168
lpips 0.12284397419828635

Thanks for this outstanding work, and thanks to the kind authors!
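
For anyone hitting the same symptom, a PSNR of around 5 to 6 dB is roughly what an all-black rendering produces. The sketch below assumes pixel values in [0, 1] and the standard definition PSNR = 10 * log10(max_val^2 / MSE); the `psnr` function here is an illustrative stand-in, not the project's own metric code.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat lists of pixel values."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# A mid-gray target compared against an all-black prediction.
target = [0.5] * 16
black = [0.0] * 16
print(round(psnr(black, target), 2))  # 6.02
```

Under the same assumptions, the reported 5.61 dB corresponds to a mean squared error of about 0.27, which is consistent with black output compared against ordinary images.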

yutmdfeng commented 2 months ago

@jjlinghu I encountered the same problem as you (on re10k, the progress bar only runs to 38/41, and the images are all black), but after I re-pulled diff-gaussian-rasterization-modified, the problem persisted.

I would like to ask whether running pip install is all that is needed to compile diff-gaussian-rasterization-modified. As for executing setup.py, I don't know whether I need to compile again with cmake. In addition, diff-gaussian-rasterization-modified/third_party/glm is empty. I pulled the glm content again, but I ran into some problems during compilation. Could you please explain the build process?
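
Two quick checks may help narrow this down before rebuilding. The sketch below is illustrative only: the helper names are hypothetical, and the module name `diff_gaussian_rasterization` is an assumption about what the built package installs.

```python
import importlib.util
from pathlib import Path


def is_empty_dir(path):
    """True if path is an existing directory with no entries."""
    p = Path(path)
    return p.is_dir() and not any(p.iterdir())


def module_available(name):
    """True if a top-level module with this name can be found for import."""
    return importlib.util.find_spec(name) is not None


# Example usage (the paths and module name below are assumptions):
# is_empty_dir("diff-gaussian-rasterization-modified/third_party/glm")
# module_available("diff_gaussian_rasterization")
```

If the glm directory is empty and it is registered as a git submodule, `git submodule update --init --recursive` inside the rasterizer checkout normally populates it. After that, the extension would need to be rebuilt so that the compiled module matches the sources.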

yutmdfeng commented 2 months ago

@jjlinghu After obtaining the correct diff-gaussian-rasterization-modified, I now get correctly rendered images, but during inference on re10k the progress bar still only reaches 38/41 steps. How did you solve the problem with the testing steps?

donydchen / mvsplat

Got wrong results #16