Is the encoder architecture (`conv_encoder`) from here the one used in the experiments in the paper?
There seems to be a discrepancy in the size of the conv kernels used in the 3rd and 4th layers: the code uses 2x2 convolutions there, while, according to Table 2 in the Appendix of the paper, 4x4 convolutions are used throughout.
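For context, here is a minimal sketch of how the kernel choice changes the spatial sizes through the encoder. The stride of 2, zero padding, and 64x64 input are my assumptions for illustration, not taken from the repo:

```python
# Hypothetical illustration of the kernel-size discrepancy.
# Assumes a 4-layer conv encoder with stride 2 and no padding;
# the actual strides/padding in the repo may differ.

def conv_out(size, kernel, stride=2, padding=0):
    """Spatial output size of a single conv layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_sizes(input_size, kernels):
    """Spatial sizes after each layer, starting from the input."""
    sizes = [input_size]
    for k in kernels:
        sizes.append(conv_out(sizes[-1], k))
    return sizes

# Kernels as they appear in the code (2x2 in layers 3 and 4)
# versus Table 2 of the paper (4x4 throughout):
print(encoder_sizes(64, [4, 4, 2, 2]))  # [64, 31, 14, 7, 3]
print(encoder_sizes(64, [4, 4, 4, 4]))  # [64, 31, 14, 6, 2]
```

Under these assumptions the two variants end with different feature-map sizes (3x3 vs. 2x2), so the flattened embedding dimension would differ as well.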