KaiqiangXiong / CL-MVSNet

[ICCV2023] CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning

ICC looks wrong #1

Closed: alexrich021 closed this issue 11 months ago

alexrich021 commented 11 months ago

When re-training, I get 0.339 overall on DTU instead of the reported 0.329 (using default settings, 8 GPUs). The ICC masking implementation here doesn't look quite right to me: `imgs_icc` has shape `[B, V, 3, H, W]`, and the for-loop runs over the batch dimension, so for batch size 1 it never executes. Should it perhaps be changed to:

```python
imgs_icc = data["imgs_aug"]  # shape [B, V, 3, H, W]
nviews = imgs_icc.shape[1]
for view_idx in range(1, nviews, 1):  # loop over the view dimension instead of the batch dimension
    # masking probability ramps linearly from 0 up to p_icc over the first 15 epochs
    per = min(self.args.p_icc * epoch / 15, self.args.p_icc)
    mask = torch.ones_like(imgs_icc[:, view_idx]) * per
    mask = 1 - mask.bernoulli()  # 1 = keep the entry, 0 = mask it out
    imgs_icc[:, view_idx] = imgs_icc[:, view_idx] * mask
```
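As a quick standalone sanity check of this loop (all shapes and numbers below are made up for illustration), the fraction of zeroed entries in the source views should come out to roughly `per`:

```python
import torch

# Self-contained check of the masking logic above; shapes and values are arbitrary
imgs_icc = torch.rand(2, 5, 3, 64, 80)  # [B, V, 3, H, W]
per = 0.3                               # stand-in for the ramped p_icc value
for view_idx in range(1, imgs_icc.shape[1]):
    mask = 1 - (torch.ones_like(imgs_icc[:, view_idx]) * per).bernoulli()
    imgs_icc[:, view_idx] = imgs_icc[:, view_idx] * mask
print((imgs_icc[:, 1:] == 0).float().mean())  # ~0.3
```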
KaiqiangXiong commented 11 months ago

During our experiments, we trained 4 times and the performance was stable (0.329, 0.330, 0.328, 0.332). Note that we chose the best checkpoint according to the 2mm error on the validation set; sometimes that was the checkpoint from epoch 13 or 14 (out of epochs 0 to 15). How did you choose your checkpoint? Alternatively, you could try retraining with the corrected ICC. Please let me know if that solves the problem.
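(In code, the selection rule is just an argmin over the per-epoch 2mm validation errors; the variable name and numbers below are placeholders, not our actual values:)

```python
# Placeholder sketch of the checkpoint selection rule described above;
# the metric values are made up, not real validation numbers
val_err_2mm = {13: 0.162, 14: 0.158, 15: 0.165}  # epoch -> 2mm validation error
best_epoch = min(val_err_2mm, key=val_err_2mm.get)  # -> 14 in this example
```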

alexrich021 commented 11 months ago

I simply chose epoch 15, but it was far enough off that I didn't try any other checkpoints. Did you check performance using this exact codebase after refactoring for release?

I'm currently re-training using my fix for the ICC. I'll let you know how it goes.

The other thing I noticed is that the ICC mask implemented in the code is not of shape $H \times W$ for a given image; it is actually $3 \times H \times W$, so it masks the R, G, and B channels of the image independently. Is this intended, or did the version of the code used to produce the paper results use a mask of shape $H \times W$?
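For comparison, if a single per-pixel mask shared across channels was intended, I would expect something like this (my own sketch, not code from this repo):

```python
# Sketch (not from this repo): one H x W mask per image, broadcast over RGB
mask = torch.ones_like(imgs_icc[:, view_idx, :1]) * per  # shape [B, 1, H, W]
mask = 1 - mask.bernoulli()
imgs_icc[:, view_idx] = imgs_icc[:, view_idx] * mask     # broadcasts over channels
```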

KaiqiangXiong commented 11 months ago

I have only checked the performance of the refactored code with my previous checkpoint, which is released in the dir pretrained_model, and the testing performance is 0.329, as we reported. Have you tested with our pretrained_model? I don't have enough computational resources at the moment to retrain.

I only masked one channel of the images in my implementation, and this worked. It is worth noting that this also aligns with our motivation for ICC: to break pixel-level photometric consistency and construct hard positive samples, thus leveraging contrastive learning to improve robustness in weakly-textured areas. I guess masking all 3 channels could achieve the same (or similar) results, but I have not tried it.
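Since I no longer have the original snippet, here is only a minimal sketch of what single-channel masking looks like (the channel index and the exact structure are illustrative, not the original code):

```python
# Illustrative sketch of single-channel masking; the original code is lost,
# so the choice of channel 0 here is arbitrary
c = 0
mask = 1 - (torch.ones_like(imgs_icc[:, view_idx, c]) * per).bernoulli()  # [B, H, W]
imgs_icc[:, view_idx, c] = imgs_icc[:, view_idx, c] * mask
```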

alexrich021 commented 11 months ago

Can you post the snippet of code used for ICC masking when training the provided pre-trained model and indicate where in the pipeline it was performed? I am guessing the previous version of the code performed ICC masking prior to the batch dimension being added to the images.

alexrich021 commented 11 months ago

With the code I posted for the ICC masking, my re-trained model got 0.330 overall. Looks like that solved the problem. Thanks for your help!

KaiqiangXiong commented 11 months ago

👍👍 I'm sorry, it's been a long time (half a year) since I did this work, and I can't remember all the details. Moreover, the code has been modified many times since then and may no longer match the version that produced the paper results.

I will update the ICC part with your revised version. Thank you very much too!