TidalPaladin / tiny-imagenet-demo

Classification network for Tiny Imagenet in Tensorflow 2.0

Overhaul parameterization (#4) #12

Closed · TidalPaladin closed 5 years ago

TidalPaladin commented 5 years ago

Initial implementation for #4. Only the following parameterizations remain:

TidalPaladin commented 5 years ago

Working model is as follows:

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
tail (Tail)                  (32, 64, 64, 32)          864
_________________________________________________________________
bottleneck (Bottleneck)      (32, 64, 64, 32)          776
_________________________________________________________________
bottleneck_1 (Bottleneck)    (32, 64, 64, 32)          776
_________________________________________________________________
bottleneck_2 (Bottleneck)    (32, 64, 64, 32)          776
_________________________________________________________________
bottleneck_3 (Bottleneck)    (32, 64, 64, 32)          776
_________________________________________________________________
downsample (Downsample)      (32, 31, 31, 64)          20496
_________________________________________________________________
bottleneck_4 (Bottleneck)    (32, 31, 31, 64)          2576
_________________________________________________________________
bottleneck_5 (Bottleneck)    (32, 31, 31, 64)          2576
_________________________________________________________________
bottleneck_6 (Bottleneck)    (32, 31, 31, 64)          2576
_________________________________________________________________
bottleneck_7 (Bottleneck)    (32, 31, 31, 64)          2576
_________________________________________________________________
bottleneck_8 (Bottleneck)    (32, 31, 31, 64)          2576
_________________________________________________________________
bottleneck_9 (Bottleneck)    (32, 31, 31, 64)          2576
_________________________________________________________________
downsample_1 (Downsample)    (32, 15, 15, 128)         80928
_________________________________________________________________
bottleneck_10 (Bottleneck)   (32, 15, 15, 128)         9248
_________________________________________________________________
bottleneck_11 (Bottleneck)   (32, 15, 15, 128)         9248
_________________________________________________________________
downsample_2 (Downsample)    (32, 7, 7, 256)           321600
_________________________________________________________________
final_bn (BatchNormalization (32, 7, 7, 256)           1024
_________________________________________________________________
final_relu (ReLU)            (32, 7, 7, 256)           0
_________________________________________________________________
head (Head)                  (32, 100)                 25700
=================================================================
Total params: 487,668
Trainable params: 483,508
Non-trainable params: 4,160
_________________________________________________________________
```
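
For context, the Tail and Head parameter counts above are consistent with simple reconstructions; a sketch, assuming a bias-free 3x3 stem convolution and a global-average-pool plus dense classifier (the actual layers may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical Tail: one bias-free 3x3 conv, 3 -> 32 channels.
# Params: 3 * 3 * 3 * 32 = 864, matching the summary above.
tail = layers.Conv2D(32, 3, padding='same', use_bias=False, name='tail')

# Hypothetical Head: global average pooling into a dense classifier.
# Params: 256 * 100 + 100 (bias) = 25,700, matching the summary above.
head = tf.keras.Sequential([
    layers.GlobalAveragePooling2D(),
    layers.Dense(100),
], name='head')
```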
TidalPaladin commented 5 years ago

Also note that this PR implements the bottleneck as 1x1 -> 3x3 -> 1x1, with the spatial 3x3 convolution made depthwise separable. Previously it was a standard Conv2D.
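
A minimal sketch of such a block, assuming pre-activation batch normalization, bias-free convolutions, and a bottleneck factor of 4. This combination happens to reproduce the per-block counts in the summaries (776, 2576, and 9248 params at 32, 64, and 128 channels), but it is a reconstruction, not necessarily the exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

class Bottleneck(tf.keras.layers.Layer):
    """1x1 -> depthwise 3x3 -> 1x1 residual bottleneck (sketch)."""

    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        squeezed = filters // 4  # assumed bottleneck factor
        self.bn1 = layers.BatchNormalization()
        self.pw1 = layers.Conv2D(squeezed, 1, use_bias=False)
        self.bn2 = layers.BatchNormalization()
        # Spatial 3x3 is depthwise; together with the surrounding pointwise
        # convs it forms a depthwise-separable conv (was a full Conv2D)
        self.dw = layers.DepthwiseConv2D(3, padding='same', use_bias=False)
        self.bn3 = layers.BatchNormalization()
        self.pw2 = layers.Conv2D(filters, 1, use_bias=False)

    def call(self, inputs, training=None):
        x = tf.nn.relu(self.bn1(inputs, training=training))
        x = self.pw1(x)
        x = tf.nn.relu(self.bn2(x, training=training))
        x = self.dw(x)
        x = tf.nn.relu(self.bn3(x, training=training))
        x = self.pw2(x)
        return inputs + x  # identity residual; channel count is preserved
```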

TidalPaladin commented 5 years ago

Per suggestion, the model now maintains an even output shape after downsampling to facilitate alignment in encoder/decoder architectures. This was fixed by setting padding='same' on the spatial convolutions in the Downsample block.
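
A quick check of the shape arithmetic behind that change (illustrative only, not the actual Downsample internals):

```python
import tensorflow as tf

x = tf.zeros((32, 64, 64, 32))

# padding='valid': 64 -> floor((64 - 3) / 2) + 1 = 31 (odd, misaligns decoders)
# padding='same':  64 -> ceil(64 / 2) = 32 (even, halves cleanly)
conv = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same')
print(conv(x).shape)  # (32, 32, 32, 64)
```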

New model:

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
tail (Tail)                  (32, 64, 64, 32)          864
_________________________________________________________________
bottleneck (Bottleneck)      (32, 64, 64, 32)          776
_________________________________________________________________
bottleneck_1 (Bottleneck)    (32, 64, 64, 32)          776
_________________________________________________________________
bottleneck_2 (Bottleneck)    (32, 64, 64, 32)          776
_________________________________________________________________
bottleneck_3 (Bottleneck)    (32, 64, 64, 32)          776
_________________________________________________________________
downsample (Downsample)      (32, 32, 32, 64)          20496
_________________________________________________________________
bottleneck_4 (Bottleneck)    (32, 32, 32, 64)          2576
_________________________________________________________________
bottleneck_5 (Bottleneck)    (32, 32, 32, 64)          2576
_________________________________________________________________
bottleneck_6 (Bottleneck)    (32, 32, 32, 64)          2576
_________________________________________________________________
bottleneck_7 (Bottleneck)    (32, 32, 32, 64)          2576
_________________________________________________________________
bottleneck_8 (Bottleneck)    (32, 32, 32, 64)          2576
_________________________________________________________________
bottleneck_9 (Bottleneck)    (32, 32, 32, 64)          2576
_________________________________________________________________
downsample_1 (Downsample)    (32, 16, 16, 128)         80928
_________________________________________________________________
bottleneck_10 (Bottleneck)   (32, 16, 16, 128)         9248
_________________________________________________________________
bottleneck_11 (Bottleneck)   (32, 16, 16, 128)         9248
_________________________________________________________________
downsample_2 (Downsample)    (32, 8, 8, 256)           321600
_________________________________________________________________
final_bn (BatchNormalization (32, 8, 8, 256)           1024
_________________________________________________________________
final_relu (ReLU)            (32, 8, 8, 256)           0
_________________________________________________________________
head (Head)                  (32, 100)                 25700
=================================================================
Total params: 487,668
Trainable params: 483,508
Non-trainable params: 4,160
_________________________________________________________________
```
lgtm-com[bot] commented 5 years ago

This pull request fixes 2 alerts when merging e22ac961ff2888a56db14ec4790fe7d0fad6a57a into 19623880640c9bcf9b3afd3ad7778efacfe63c67 - view on LGTM.com

fixed alerts:

TidalPaladin commented 5 years ago

The model is now suitable to begin implementing a training pipeline (#7). Further work will be handled under that issue.