[x] Reference [13] is not really an ML paper, so it is somewhat odd that you cite it so often.
One possible workaround is to create your own categorisation and cite [13] only as the source of inspiration.
[x] A reference should allow readers to look up the cited work (in most cases).
Reference [15] does not meet this requirement.
[x] Missing a reference to (Blumenfeld et al., 2020).
[x] How did you go from equations (2.8)/(2.12) to (2.9)/(2.13)? There seem to be a few steps (or references) missing
[x] How did you solve the system of equations given by (2.15) and (2.16)?
PS: equation (2.16) should have the square outside of the ELU, and you can use \begin{cases}\end{cases} to typeset systems of equations.
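For illustration, a system of equations can be typeset with the amsmath cases environment like so (the equations below are generic placeholders, not the actual (2.15)/(2.16) from the thesis):

```latex
% requires \usepackage{amsmath} in the preamble
\begin{equation}
  \begin{cases}
    x + y = 1, \\
    x - y = 0.
  \end{cases}
\end{equation}
```

The cases environment left-aligns the equations behind a single brace, which reads better than stacking two separately numbered equations.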
[x] Sections 2 and 3 both seem to list various initialization methods, and it is not entirely clear (to me) how you divided the material between them.
Try to make a clear distinction between trivial/established methods and your own contributions.
PS: the title "background" implies that a section presents well-established concepts (no new contributions)
[x] There is a paper introducing PyTorch that you can/should cite: (Paszke et al., 2019).
[x] Note that MNIST is much older than reference [11] would suggest: (Bottou et al., 1994).
[x] In section 4.2: why are smaller variations of the signals advantageous?
[x] What is the deterministic initialization approach in Figure 5.5 exactly?
[x] Some of the loss curve plots (e.g. in figures 5.7 to 5.9) are hard to interpret.
Visibility often improves if you plot the loss on a logarithmic scale.
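As a sketch of what I mean, here is a minimal matplotlib example with a synthetic loss curve (the data is made up, since I obviously don't have yours); the key call is `set_yscale("log")`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical loss curve: exponential decay towards a small noise floor.
steps = np.arange(1, 1001)
loss = 2.0 * np.exp(-steps / 150.0) + 1e-3

fig, ax = plt.subplots()
ax.plot(steps, loss, label="training loss")
ax.set_yscale("log")  # log scale separates curves that crowd near zero
ax.set_xlabel("step")
ax.set_ylabel("loss (log scale)")
ax.legend()
fig.savefig("loss_log.png")
```

On a linear axis, the last 80% of such a curve is an indistinguishable flat line; the log axis spreads exactly that region out.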
[x] typo in section 5.2: gab -> gap
[x] do not force page breaks in scientific documents. Let LaTeX do its thing.
Bottou, L., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Jackel, L. D., LeCun, Y., Muller, U. A., Sackinger, E., Simard, P., & Vapnik, V. (1994).
Comparison of classifier methods: a case study in handwritten digit recognition.
Proceedings of the 12th IAPR International Conference on Pattern Recognition, 2, 77–82.
https://doi.org/10.1109/ICPR.1994.576879
Blumenfeld, Y., Gilboa, D., & Soudry, D. (2020).
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Proceedings of the 37th International Conference on Machine Learning, 119, 960–969.
http://proceedings.mlr.press/v119/blumenfeld20a.html