Inconsistencies between paper and Pytorch implementation

AIGR-sw commented 1 year ago

Thank you for your work!

I have noticed several inconsistencies between your paper and the PyTorch implementation in this GitHub repository. Specifically, I have identified the following three:

- In the paper (chapter 4.2 Implementation Details) you write: "Notice that the two feature extractors do not share weights." However, in the Python implementation of the backbone in vgg.py, the vgg13 model is instantiated only once, so the two feature extractors share weights. If the feature extractors were implemented without weight sharing, the resulting model would have 32.3 million weights, not the 22.9 million reported in the paper. The code should look as follows:
```
# Inside the backbone's __init__ (requires `import torchvision`).
# Separate VGG13 instance for the depth branch.
vgg13_d = torchvision.models.vgg13(pretrained=False)
self.downsample_2_d = vgg13_d.features[4:9]
self.downsample_4_d = vgg13_d.features[9:14]
self.downsample_8_d = vgg13_d.features[14:19]
self.downsample_16_d = vgg13_d.features[19:24]

# Second, independent VGG13 instance for the RGB branch, so no weights are shared.
vgg13_rgb = torchvision.models.vgg13(pretrained=False)
self.downsample_2_rgb = vgg13_rgb.features[4:9]
self.downsample_4_rgb = vgg13_rgb.features[9:14]
self.downsample_8_rgb = vgg13_rgb.features[14:19]
self.downsample_16_rgb = vgg13_rgb.features[19:24]
```


- According to the description of the network architecture (chapter 2.2 Network Architecture) in the supplementary material, `argmax(conv3_c)` is fed only to the layer `conv0_o`. However, in the Python implementation in [regressor.py](https://github.com/CVLAB-Unibo/neural-disparity-refinement/blob/main/lib/model/regressor.py) it is also used as input to the layers `conv1_o` and `conv2_o`. This results in more input channels, and thus more weights, than the architecture described in the supplementary material would have.

- In the description of the loss you mention that
"The latter term in Equation 2 is minimized only if $D^\ast_s$ results in [-1,1]."
However, in the implementation of the loss in [refiner.py](https://github.com/CVLAB-Unibo/neural-disparity-refinement/blob/main/lib/model/refiner.py), the quantity that is evaluated is $D^\ast - \operatorname{argmax}(\mathrm{MLP}_c(F_{\Delta})) - \mathrm{MLP}_o(F_{\Delta})$, not $D^\ast_s = D^\ast - \operatorname{argmax}(\mathrm{MLP}_c(F_{\Delta}))$ as you write in the paper.
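
For the first point, the gap between the two parameter counts can be checked by hand. The sketch below assumes the standard VGG13 layout as used by torchvision (3x3 convolutions with bias, channel widths 64, 64, M, 128, 128, M, 256, 256, M, 512, 512, M, 512, 512, M) and counts the convolution parameters inside `features[4:24]`, i.e. the part that a second, unshared backbone would duplicate. The helper name `conv_params` is made up for illustration.

```python
# Channel widths of the convolutions inside vgg13.features[4:24],
# assuming the standard torchvision VGG13 layout (3x3 kernels, with bias).
CONV_CHANNELS = [
    (64, 128), (128, 128),    # features[4:9]   -> downsample_2
    (128, 256), (256, 256),   # features[9:14]  -> downsample_4
    (256, 512), (512, 512),   # features[14:19] -> downsample_8
    (512, 512), (512, 512),   # features[19:24] -> downsample_16
]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one 2D convolution (hypothetical helper)."""
    return c_in * c_out * kernel * kernel + c_out


extra_params = sum(conv_params(c_in, c_out) for c_in, c_out in CONV_CHANNELS)
print(f"{extra_params:,} additional parameters")  # 9,366,272
```

That is roughly 9.4 million parameters, which matches the difference between the 32.3 million and 22.9 million figures mentioned in the first point.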

I would appreciate it if you could resolve these inconsistencies.

AIGR-sw commented 1 year ago

Any update on this issue?

fabiotosi92 commented 1 year ago

We're sorry for the late response. We've been dealing with some recent deadlines that have kept us busy. Thank you for your interest in our work, and we apologize for any confusion caused by the issues you raised.

1) Thank you for bringing this to our attention. You are correct: while the two stem blocks in our model (`self.stem_block_depth` and `self.stem_block_rgb`) do not share weights, the remaining part of the network does, as you observed in our Python implementation of the backbone in vgg.py. We have addressed this in a recent extension of our work (currently under submission), in which we adopted different backbones and ensured that they do not share weights.

2) Thank you for your comment regarding the network architecture described in our supplementary material. After reviewing our implementation, we acknowledge that the description in the supplementary material is inaccurate: we misreported the inputs to the layers `conv1_o` and `conv2_o`.

3) We have verified that our implementation follows this definition, and we are confident in the correctness of our results. In particular, we define the disparity as the sum of the classification score and the offset (lines 66-71 in refiner.py), and if the difference between this value and the target ground-truth disparity lies within [-1, 1], we include it in the final loss (line 115 in refiner.py).
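
To make the distinction raised in the third point concrete, here is a minimal sketch in plain Python. It is not the code from refiner.py; the function names and sample values are made up for illustration. It compares the range check on $D^\ast_s$ (ground truth minus predicted class) as worded in the paper with the check on the full residual (ground truth minus class minus offset) as described in the reply above.

```python
def in_range_paper(d_gt, d_class, bound=1.0):
    """Paper wording: D*_s = D* - argmax must lie in [-bound, bound]."""
    return abs(d_gt - d_class) <= bound


def in_range_impl(d_gt, d_class, offset, bound=1.0):
    """Described implementation: D* - (argmax + offset) must lie in [-bound, bound]."""
    return abs(d_gt - (d_class + offset)) <= bound


# (ground-truth disparity, predicted integer class, predicted offset); made-up values
samples = [(10.4, 10, 0.3), (11.3, 10, 0.5), (13.0, 10, 0.2)]

paper_mask = [in_range_paper(gt, c) for gt, c, _ in samples]
impl_mask = [in_range_impl(gt, c, o) for gt, c, o in samples]
print(paper_mask)  # [True, False, False]
print(impl_mask)   # [True, True, False]
```

The second sample is where the two checks disagree: the predicted class is off by more than one, but the offset pulls the combined prediction back within range of the ground truth.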

If you have any further concerns or questions, please do not hesitate to let us know.

CVLAB-Unibo / neural-disparity-refinement

Inconsistencies between paper and Pytorch implementation #4