velocirabbit closed this PR 5 years ago
I like the direction of this PR so far. For the future, should we cite and follow the TF API for arg and module naming conventions for extensions like this (as in `tf.train.exponential_decay`)?
For the `BatchNormBlock`, should there be a post-activation version as well? As in `PreBatchNormBlock` and `PostBatchNormBlock`, or maybe a flag? For instance, let's say we extend this convention to `LayerNormBlock` or `GroupNormBlock` and want to test the effect of a special activation that requires the opposite convention? Just spit-balling here.
tbh I don't love TensorFlow's existing learning rate API, and I'm not sure how it plays with the callable API, since Eager has different requirements than graph mode. What I mean is: if we match the existing TF learning rate API, the usage will differ between the two modes, and I suspect that might be confusing.
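For context, the Eager-friendly style is a plain callable that maps a step to a learning rate, whereas graph-mode `tf.train.exponential_decay` returns a tensor op. A minimal pure-Python sketch of the callable style (names are illustrative, not a proposed API):

```python
# Sketch of an Eager-friendly exponential decay schedule as a plain callable.
# Mirrors the math of tf.train.exponential_decay, but as a closure over
# its hyperparameters rather than a graph op.

def exponential_decay(initial_lr, decay_steps, decay_rate, staircase=False):
    """Return a callable mapping a global step to a learning rate."""
    def schedule(step):
        exponent = step / decay_steps
        if staircase:
            exponent = step // decay_steps  # decay in discrete jumps
        return initial_lr * (decay_rate ** exponent)
    return schedule

lr = exponential_decay(initial_lr=0.1, decay_steps=100, decay_rate=0.96)
print(round(lr(0), 6))    # 0.1
print(round(lr(100), 6))  # 0.096
```

In Eager, an optimizer can just call `lr(step)` each iteration; in graph mode the same formula has to be built into the graph up front, which is where the usage would diverge.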
Post-activation BN is the default: you simply add BN after a layer that has an activation. On reflection, I don't know that a wrapper is necessary, since Keras already supports `Activation`. Consider:
Post-activation:

```python
from tensorflow import keras
from tensorflow.keras.layers import Activation, BatchNormalization, Dense

model = keras.Sequential([
    Dense(units=128, activation='relu'),
    BatchNormalization(),
])
```

Pre-activation:

```python
model = keras.Sequential([
    Dense(units=128),
    BatchNormalization(),
    Activation('relu'),
])
```
^ That's a good point (re: using `Sequential`). The only advantage to using this custom block, then, would be that it's easier to understand.
Cool, let's leave it in for now.
Woo, let's merge it in!
Cyclic LR is implemented via both a class (which makes more sense from an OOP POV) and a function that returns a function handle (which is how TF implements their schedulers).
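To illustrate the two styles side by side, here's a hedged sketch of both, using the triangular cyclical LR policy from Smith (2017). The names `CyclicLR` and `cyclic_lr` are illustrative and not necessarily pyoneer's actual API:

```python
import math

# Sketch of the two interfaces described above: a class with __call__
# (OOP style) and a function returning a function handle (TF style).
# Both compute the triangular cyclical learning rate.

class CyclicLR:
    def __init__(self, base_lr, max_lr, step_size):
        self.base_lr = base_lr
        self.max_lr = max_lr
        self.step_size = step_size

    def __call__(self, step):
        cycle = math.floor(1 + step / (2 * self.step_size))
        x = abs(step / self.step_size - 2 * cycle + 1)
        return self.base_lr + (self.max_lr - self.base_lr) * max(0.0, 1 - x)

def cyclic_lr(base_lr, max_lr, step_size):
    """TF style: return a schedule function instead of an object."""
    def schedule(step):
        cycle = math.floor(1 + step / (2 * step_size))
        x = abs(step / step_size - 2 * cycle + 1)
        return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
    return schedule

sched = CyclicLR(base_lr=0.001, max_lr=0.006, step_size=100)
# LR climbs from base_lr at step 0 to max_lr at step 100, then falls back.
print(sched(0), sched(100), sched(200))
```

The two are interchangeable from the optimizer's point of view (both are callables of `step`); the class version just keeps its hyperparameters inspectable as attributes.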
Related issue: https://github.com/fomorians/pyoneer/issues/11