facebookresearch / synsin

View synthesis for the public.
http://www.robots.ox.ac.uk/~ow/synsin.html

ResNetDecoder default params #15

Closed: holynski closed this issue 3 years ago

holynski commented 4 years ago

Hey -- thanks for releasing the code, it's been very useful.

I've been reading through the code to make sure I understand everything, and there's one part that has me a bit confused:

In the top-level training script, train.py, the main training model ZbufferModelPts is instantiated, which in turn instantiates all the sub-models, including the depth estimator, the feature encoder, and the feature decoder.
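To make the structure concrete, here is a minimal sketch of the composition I'm describing (simplified stand-in modules, not the repo's actual layers):

```python
import torch.nn as nn

class ZbufferModelPtsSketch(nn.Module):
    """Hypothetical, simplified stand-in for ZbufferModelPts, just to show
    the composition: all sub-models are built at construction time."""

    def __init__(self, feat_channels: int = 64):
        super().__init__()
        # Depth estimator: image -> per-pixel depth (stand-in layer).
        self.pts_regressor = nn.Conv2d(3, 1, kernel_size=1)
        # Feature encoder: image -> feature map with feat_channels channels.
        self.encoder = nn.Conv2d(3, feat_channels, kernel_size=1)
        # Feature decoder: consumes the re-projected features, so its input
        # width should match the encoder's output width (64 here).
        self.decoder = nn.Conv2d(feat_channels, 3, kernel_size=1)
```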

In particular, the feature decoder is instantiated as: https://github.com/facebookresearch/synsin/blob/82ff948f91a779188c467922c8f5144018b40ac8/models/z_buffermodel.py#L35

which, if we've chosen to use the ResNet blocks (as the parameters in train.sh do by default), will return: https://github.com/facebookresearch/synsin/blob/82ff948f91a779188c467922c8f5144018b40ac8/models/networks/utilities.py#L31

Shouldn't the channels_in here be 64, since it's taking as input the re-projected features? Or am I missing something?
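Here's a toy reproduction of what I mean (assumed shapes, not the repo's code): a decoder built with channels_in=3 can't consume a 64-channel re-projected feature map.

```python
import torch
import torch.nn as nn

reprojected_feats = torch.randn(1, 64, 32, 32)  # encoder output: 64 channels

decoder_64ch = nn.Conv2d(64, 3, kernel_size=3, padding=1)  # channels_in = 64
decoder_3ch = nn.Conv2d(3, 3, kernel_size=3, padding=1)    # channels_in = 3

out = decoder_64ch(reprojected_feats)  # works: output shape (1, 3, 32, 32)
try:
    decoder_3ch(reprojected_feats)     # channel mismatch
except RuntimeError as e:
    print(e)  # complains it expected 3 input channels but got 64
```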

phongnhhn92 commented 4 years ago

@holynski Hello. Actually, the default parameter is UnetDecoder64, so ResNetDecoder is not used. But I think you are right: if you decide to use ResNetDecoder, then channels_in should be 64.

holynski commented 4 years ago

Thanks!

Yes, you're right that the default parameter for get_decoder() is UNetDecoder64, but in train.sh the parameters specifically override this default:

https://github.com/facebookresearch/synsin/blob/82ff948f91a779188c467922c8f5144018b40ac8/train.sh#L11
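In other words, the function-level default never applies once the launch script passes an explicit value. A minimal illustration of the mechanism (the flag name refine_model_type is my reading of the options, and I'm assuming train.sh passes something like the 'resnet_256W8UpDown64' string mentioned below):

```python
import argparse

# Hypothetical flag name; the point is just that command-line values take
# precedence over any default baked into get_decoder().
parser = argparse.ArgumentParser()
parser.add_argument("--refine_model_type", default="unet")

# Simulating the override coming from train.sh:
opt = parser.parse_args(["--refine_model_type", "resnet_256W8UpDown64"])

if "unet" in opt.refine_model_type:
    print("building UNetDecoder64")
elif "resnet" in opt.refine_model_type:
    print("building ResNetDecoder")  # this is the branch train.sh hits
```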

Also, I've been following that code path because, from my understanding of the paper, the ResNet architecture is the one used in the proposed method. From the paper:

Our spatial feature network and refinement networks are composed of ResNet blocks. (Appendix B)

It also mentions:

We experimented with using a UNet architecture instead of a sequence of ResNet blocks for the spatial feature network and refinement network. This led to much worse results and was more challenging to train. (Appendix E)

Before I dug into the code, I assumed that the default parameters (when calling train.py directly, or importing a ZBufferModel in another script) would reproduce the architecture and parameters described in the paper, but it seems that this might not be the case.

phongnhhn92 commented 3 years ago

@holynski Thanks to your comment, I realize that the authors have hard-coded this in their code: https://github.com/facebookresearch/synsin/blame/82ff948f91a779188c467922c8f5144018b40ac8/models/networks/configs.py#L78 So the in_channels and out_channels arguments actually don't matter if you use the 'resnet_256W8UpDown64' model. You can change it easily now if you want to.
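Something like this (a paraphrase of the idea, with made-up widths, not the actual configs.py code):

```python
# Paraphrasing the idea: the architecture spec for 'resnet_256W8UpDown64'
# pins its own layer widths, so whatever channels_in the caller passed
# never reaches the network. The numbers below are illustrative only.
def get_arch_sketch(model_type, channels_in=3):
    if model_type == "resnet_256W8UpDown64":
        # Hard-coded: input width is fixed at 64 regardless of channels_in.
        return {"input_nc": 64, "widths": [64, 256, 256, 256, 64]}
    # Other model types might honor the caller's channels_in.
    return {"input_nc": channels_in, "widths": [channels_in, 256, 256]}

print(get_arch_sketch("resnet_256W8UpDown64", channels_in=3)["input_nc"])  # 64
```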

holynski commented 3 years ago

Good catch, thanks!

It's probably worth making a pull request at some point to change all the default parameters to those used in the paper...