When training with a 512x64x64 feature map, the convolution weights over h and w are tied to a spatial size of 64. But during multi-scale validation the input image is resized, so the feature map's spatial size is no longer 64x64.
Is this a mistake? Is some operation applied implicitly that I'm not seeing? Or am I misunderstanding the experiments because I'm new to segmentation?
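To make the mismatch I mean concrete, here is a minimal numpy sketch (hypothetical, not the repo's actual code): if a weight matrix is sized for a fixed 64x64 spatial extent, it can no longer be applied once multi-scale testing changes the feature-map size (97 below is just an arbitrary example of a rescaled size).

```python
import numpy as np

# Weights tied to H = W = 64, projecting each channel's flattened
# 64*64 spatial map down to 8 values (8 is arbitrary, for illustration).
W64 = np.random.randn(64 * 64, 8)

feat_train = np.random.randn(512, 64, 64)   # training-time feature map
out = feat_train.reshape(512, -1) @ W64     # (512, 4096) @ (4096, 8) -> works
print(out.shape)                            # (512, 8)

feat_val = np.random.randn(512, 97, 97)     # a multi-scale validation size
try:
    feat_val.reshape(512, -1) @ W64         # (512, 9409) @ (4096, 8) -> fails
except ValueError:
    print("shape mismatch at validation scale")
```

This is what I expect to break when the spatial size changes, so I don't see how the fixed-to-64 weights are used at other scales.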