activation histograms over [-1, +1] per node or per unit (during training or evaluation): the distributions should not be highly skewed if training succeeds
weight/bias statistics (e.g. the L2 norm) per node or per unit
save the convolution filters per node (using image_grid_t)
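For the activation histograms, a minimal sketch follows. It assumes activations are bounded in [-1, +1] (e.g. a tanh-like output) and uses equal-width bins. It uses the Fisher-Pearson coefficient g1 = m3 / m2^1.5 as one possible measure of "highly skewed". The bin count, the `threshold` of 1.0 and all function names are illustrative choices, not part of the original notes.

```python
from typing import List, Sequence


def activation_histogram(values: Sequence[float], bins: int = 10) -> List[int]:
    """Count activations in `bins` equal-width bins covering [-1, +1].

    Out-of-range values are clamped into the first or last bin, so +1.0
    falls into the last bin rather than past the end.
    """
    counts = [0] * bins
    for v in values:
        idx = int((v + 1.0) / 2.0 * bins)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    return counts


def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson coefficient of skewness: g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0.0:
        return 0.0  # constant input: no spread, treat as unskewed
    return m3 / m2 ** 1.5


def is_highly_skewed(values: Sequence[float], threshold: float = 1.0) -> bool:
    """Hypothetical check: flag a unit whose |g1| exceeds `threshold`."""
    return abs(skewness(values)) > threshold
```

In practice the activations of one node or unit would be collected over a batch or an epoch and passed to both functions. A unit whose outputs pile up near one end of the range (e.g. saturated at +1) then shows a large |g1|.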
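The weight/bias statistics could be computed along these lines. This is a sketch that assumes a node's weights are stored as one row of incoming weights per output unit, with one bias per unit. That layout and the names `ParamStats`, `per_unit_stats` and `per_node_stats` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ParamStats:
    l2: float       # Euclidean (L2) norm of the parameters
    mean: float
    minimum: float
    maximum: float


def param_stats(values: Sequence[float]) -> ParamStats:
    """Summary statistics for a flat sequence of parameters."""
    return ParamStats(
        l2=math.sqrt(sum(v * v for v in values)),
        mean=sum(values) / len(values),
        minimum=min(values),
        maximum=max(values),
    )


def per_unit_stats(weights: Sequence[Sequence[float]]) -> List[ParamStats]:
    """One ParamStats per output unit (one row of incoming weights each)."""
    return [param_stats(row) for row in weights]


def per_node_stats(weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> Tuple[ParamStats, ParamStats]:
    """Aggregate statistics over all weights and all biases of a node."""
    flat = [w for row in weights for w in row]
    return param_stats(flat), param_stats(biases)
```

Tracking these numbers per epoch makes it easy to spot units whose weights shrink to zero or grow without bound.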
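The interface of `image_grid_t` is not given in these notes, so the sketch below does not use it. It shows the general idea of tiling a node's convolution filters into one grayscale grid image. Each filter is assumed to be a single-channel 2D kernel of the same size. Each filter is min-max normalized to [0, 255] independently, the filters are separated by a border, and the grid is written as a plain PGM (P2) image. All of these are illustrative choices.

```python
import math
from typing import List, Sequence

Filter = Sequence[Sequence[float]]


def normalize(f: Filter) -> List[List[int]]:
    """Min-max scale one filter to 0..255 (constant filters become mid-gray)."""
    flat = [v for row in f for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[128 for _ in row] for row in f]
    return [[round(255 * (v - lo) / span) for v in row] for row in f]


def tile_filters(filters: Sequence[Filter], border: int = 1,
                 background: int = 0) -> List[List[int]]:
    """Arrange same-sized filters in a near-square grid with borders."""
    n = len(filters)
    fh, fw = len(filters[0]), len(filters[0][0])
    grid_cols = math.ceil(math.sqrt(n))
    grid_rows = math.ceil(n / grid_cols)
    height = grid_rows * fh + (grid_rows + 1) * border
    width = grid_cols * fw + (grid_cols + 1) * border
    image = [[background] * width for _ in range(height)]
    for k, f in enumerate(filters):
        gr, gc = divmod(k, grid_cols)  # row-major placement in the grid
        top = border + gr * (fh + border)
        left = border + gc * (fw + border)
        for i, row in enumerate(normalize(f)):
            image[top + i][left:left + fw] = row
    return image


def to_pgm(image: List[List[int]]) -> str:
    """Serialize a grayscale image as a plain-text PGM (P2) file."""
    lines = ["P2", f"{len(image[0])} {len(image)}", "255"]
    lines += [" ".join(str(p) for p in row) for row in image]
    return "\n".join(lines) + "\n"
```

Normalizing each filter on its own makes the spatial pattern of every kernel visible but hides differences in magnitude between filters. A single shared scale would preserve those differences instead.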