Thanks for sharing. But it seems that downsampling should be performed by the stride-2 convolution in the 3×3 layer of the first block of each stage. A stride-2 1×1 convolution reads only a quarter of its input positions, so moving the stride to the 3×3 layer lets every activation contribute to the output.
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, add

def resnet(input_shape, n_classes):
    def conv_bn_rl(x, f, k=1, s=1, p='same'):
        # Conv -> BatchNorm -> ReLU building block
        x = Conv2D(f, k, strides=s, padding=p)(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)
        return x

    def conv_block(tensor, f1, f2, s):
        # original version: downsampling in the first 1x1 convolution
        # x = conv_bn_rl(tensor, f1, s=s)
        # x = conv_bn_rl(x, f1, 3)
        x = conv_bn_rl(tensor, f1)
        x = conv_bn_rl(x, f1, 3, s=s)  # stride-2 moved to the 3x3 layer
        x = Conv2D(f2, 1)(x)
        x = BatchNormalization()(x)

        # projection shortcut to match shape for the residual addition
        shortcut = Conv2D(f2, 1, strides=s, padding='same')(tensor)
        shortcut = BatchNormalization()(shortcut)

        x = add([shortcut, x])
        output = ReLU()(x)
        return output
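To make the motivation concrete, here is a small stand-alone sketch (my own illustration, not part of the snippet above; the helper name and the 8×8 size are arbitrary) that counts how many input positions a strided convolution actually reads:

```python
def positions_read(h, w, k, s):
    """Set of input (row, col) positions touched by a k x k convolution
    with stride s, valid coverage (no padding, to keep the count simple)."""
    read = set()
    for oy in range(0, h - k + 1, s):
        for ox in range(0, w - k + 1, s):
            for dy in range(k):
                for dx in range(k):
                    read.add((oy + dy, ox + dx))
    return read

h = w = 8
frac_1x1 = len(positions_read(h, w, 1, 2)) / (h * w)  # stride-2 1x1 conv
frac_3x3 = len(positions_read(h, w, 3, 2)) / (h * w)  # stride-2 3x3 conv
print(frac_1x1)  # 0.25 -> three quarters of the activations are never used
print(frac_3x3)  # 0.765625 (with 'same' padding it would be 1.0)
```

Either way the parameter count of the block is unchanged, since the number of weights in a convolution does not depend on its stride.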
This follows "Aggregated Residual Transformations for Deep Neural Networks" (the ResNeXt paper).