The maximum kernel/column norm hyperparameters in the YAML files we are using are currently just default values taken from the MNIST pylearn2 tutorial. A better way to set them would be to follow the advice in this post: set each one to 0.8 times the mean value that the corresponding kernel/column norms settle to when the model is run without any constraints.
We therefore need to run the model with no norm constraints set, for long enough that the kernel and column norms settle, and record the values for each layer by reading them off plots of the relevant monitor channels.
Depending on how much these values turn out to depend on layer depth, width and type, we may be able to do this once and reuse the results as estimates for all future runs, or we may need to repeat these test runs whenever we significantly change the architecture.
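
As a rough illustration of the procedure above, here is a minimal NumPy sketch of how the norms are computed and how a constraint value could be derived from recorded channel values. The function names, the `tail` window and the assumption that axis 0 of the convolutional weight tensor indexes kernels are all hypothetical choices made for this example; they are not taken from pylearn2 or from our YAML files.

```python
import numpy as np


def column_norms(W):
    """L2 norm of each column of a dense weight matrix of shape (n_in, n_out)."""
    return np.sqrt((W ** 2).sum(axis=0))


def kernel_norms(K):
    """L2 norm of each kernel in a conv weight tensor.

    Assumes axis 0 indexes the kernels; adjust if the layout differs.
    """
    flat = K.reshape(K.shape[0], -1)
    return np.sqrt((flat ** 2).sum(axis=1))


def suggest_max_norm(channel_values, tail=10, fraction=0.8):
    """Return fraction * mean of the last `tail` recorded norm values.

    `channel_values` is the per-epoch mean norm of one layer from an
    unconstrained run; `tail` should only cover epochs after it has settled.
    """
    values = np.asarray(channel_values, dtype=float)
    return fraction * values[-tail:].mean()
```

For example, if the mean column norm of a layer has flattened out at around 2.0 over the last few epochs, `suggest_max_norm` would give a maximum norm of about 1.6 for that layer. Averaging over a tail window rather than taking the final value is only one way of smoothing out epoch-to-epoch noise; reading the plateau directly off the plot works just as well.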