Closed holynski closed 3 years ago
@holynski Hello, Actually, the default parameter is UnetDecoder64 so ResNetDecoder is not used. But I think you are right, if you decide to use ResNetDecoder then channels_in should be 64.
Thanks!
Yes, you're right that the default parameter for get_decoder()
is UNetDecoder64
, but in train.sh
the parameters specifically override this default:
Also, I've been following that codepath, because from my understanding of the paper, the ResNet architecture is the one used in the proposed method. From the paper:
Our spatial feature network and refinement networks are composed of ResNet blocks. (Appendix B)
It also mentions:
We experimented with using a UNet architecture instead of a sequence of ResNet blocks for the spatial feature network and refinement network. This led to much worse results and was more challenging to train. (Appendix E)
Before I dug into the code, I assumed that the default parameters (when directly calling train.py
, or importing a ZBufferModel
in another script) would have reproduced the architecture and parameters described in the paper, but it seems that this might not be the case.
@holynski Thanks to your comment, I realize that authors has hard-coded this in their code https://github.com/facebookresearch/synsin/blame/82ff948f91a779188c467922c8f5144018b40ac8/models/networks/configs.py#L78 So actually the in_channels and out_channels doesn't matter if you use the 'resnet_256W8UpDown64' model ^^. Well you can change it easily now if you want to hehe.
Good catch, thanks!
It's probably worth making a pull request at some point to change all the default parameters to those used in the paper...
Hey -- thanks for releasing the code, it's been very useful.
I've been reading through to try to make I understand everything, and there's one part that has me a bit confused:
In the top-level training script,
train.py
, an instance of the main training modelZbufferModelPts
is instantiated, which in turn instantiates all the sub-models, which include the depth estimator, the feature encoder, and the feature decoder.In particular, the feature decoder is instantiated as: https://github.com/facebookresearch/synsin/blob/82ff948f91a779188c467922c8f5144018b40ac8/models/z_buffermodel.py#L35
which, if we've chosen to use the ResNet blocks (as is default in the parameters shown in
train.sh
), will return: https://github.com/facebookresearch/synsin/blob/82ff948f91a779188c467922c8f5144018b40ac8/models/networks/utilities.py#L31Shouldn't the
channels_in
here be 64, since it's taking as input the re-projected features? Or am I missing something?