Dear author, the paper states that "The encoder consists of three building blocks – the second and third blocks halve the size of the feature maps with stride 2". However, the parameter "n_blks = [4, 4, 4]" in class MASA seems to indicate that the second and third blocks each apply stride-2 downsampling four times, i.e. a 2^4 reduction per block, which is inconsistent with the paper. Could you explain this discrepancy?
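To make the question concrete, here is a small sketch of the two readings of "n_blks" that I can think of. It does not use the repository's code: the helper "feature_map_size", the two stride lists, the 256-pixel input, and the assumption of 3x3 convolutions with padding 1 are all my own, purely for illustration.

```python
def feature_map_size(input_size, strides):
    """Spatial size after a chain of 3x3, padding-1 convolutions.

    For such a convolution with stride s, the output size is ceil(n / s).
    """
    size = input_size
    for s in strides:
        size = (size + s - 1) // s  # integer ceil division
    return size


n_blks = [4, 4, 4]   # value taken from class MASA
input_size = 256     # hypothetical input resolution

# Reading A (my reading of the code): every one of the 4 layers in
# stages 2 and 3 uses stride 2, so each stage downsamples by 2^4.
strides_a = [1] * n_blks[0] + [2] * n_blks[1] + [2] * n_blks[2]

# Reading B (what the paper seems to describe): n_blks only counts
# stride-1 blocks, and stages 2 and 3 each start with ONE stride-2 layer.
strides_b = [1] * n_blks[0] + [2] + [1] * n_blks[1] + [2] + [1] * n_blks[2]

size_a = feature_map_size(input_size, strides_a)
size_b = feature_map_size(input_size, strides_b)
print("Reading A:", size_a)
print("Reading B:", size_b)
```

Under reading A the two stages together reduce the resolution by a factor of 256, while under reading B the factor is 4, which matches the paper. If "n_blks" is only the number of stride-1 blocks per stage, as in reading B, then there is no inconsistency and I have misread the code.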