Closed (patbaa closed this issue 4 years ago)
Let's start at 14:00 again, if that works for everyone.
If there is interest, I'd be happy to present this: https://arxiv.org/pdf/1905.01164v2.pdf
I'm interested.
I have always been curious how one could visualize the "energy surface". I don't know whether there is anything better or newer than this, but it is a fairly straightforward method, and it shows quite well why certain networks work better than others. If time allows, I'll present it:
Li, H., Xu, Z., Taylor, G., Studer, C. and Goldstein, T., 2018. Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems (pp. 6389-6399).
Abstract
Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well-known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
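To make the "filter normalization" idea concrete, here is a minimal PyTorch sketch of a 1D filter-normalized slice of the loss surface (the paper plots 2D contours; this is just the simplest variant). The names `model`, `criterion`, and `loader` are placeholders I assume exist, and this is not the authors' code:

```python
# Minimal sketch, assuming an already-trained PyTorch `model` (in eval mode),
# a loss `criterion`, and a `loader` of (x, y) batches.
import torch

def random_filter_normalized_direction(model):
    """Draw a random direction d and rescale it filter-wise so that
    ||d_filter|| == ||w_filter|| for every filter (Li et al., 2018)."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() >= 2:                      # conv / linear weights
            d_flat = d.view(d.size(0), -1)    # dim 0 = "filter" dimension
            w_flat = p.view(p.size(0), -1)
            scale = w_flat.norm(dim=1, keepdim=True) / (d_flat.norm(dim=1, keepdim=True) + 1e-10)
            d = (d_flat * scale).view_as(p)
        else:                                 # biases, BN params: leave unperturbed
            d = torch.zeros_like(p)
        direction.append(d)
    return direction

@torch.no_grad()
def loss_along_direction(model, criterion, loader, direction, alphas):
    """Evaluate the loss at w + alpha * d for each alpha (a 1D slice)."""
    base = [p.detach().clone() for p in model.parameters()]
    losses = []
    for alpha in alphas:
        for p, w, d in zip(model.parameters(), base, direction):
            p.copy_(w + alpha * d)
        total, n = 0.0, 0
        for x, y in loader:
            out = model(x)
            total += criterion(out, y).item() * x.size(0)
            n += x.size(0)
        losses.append(total / n)
    for p, w in zip(model.parameters(), base):   # restore original weights
        p.copy_(w)
    return losses
```

The 2D contour plots in the paper do the same thing with two independent filter-normalized directions instead of one.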
Hi everyone!
Thanks for including me! I'll come on Tuesday as well, but I won't prepare anything for the first session. By the way, right afterwards, at 16:15, there is this talk at BME: http://math.bme.hu/~gnagy/mmsz/JelasityMark2019.htm. In case you haven't heard about it yet; maybe someone besides me is interested.
Regarding the loss landscape paper, Alex pointed out that there is a very cool website with lots of spectacular visualizations, videos, and further recent, instructive papers.
Similar to the EEG paper that Józsi presented, there is "thought reconstruction" from fMRI data. People have been trying this for years; there are also versions where implanted electrodes provide the signal.
Shen, G., Dwivedi, K., Majima, K., Horikawa, T. and Kamitani, Y., 2019. End-to-end deep image reconstruction from human brain activity. Frontiers in Computational Neuroscience, 13. link
There was also this older one that I still remember:
Nishimoto, S., Vu, A.T., Naselaris, T., Benjamini, Y., Yu, B. and Gallant, J.L., 2011. Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), pp.1641-1646.
There is a video for it as well.
Hi everyone!
On Tuesday I will talk about Mahoney's paper (actually two, but they are quite similar): https://arxiv.org/abs/1901.08278
Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, and trained on the same dataset, but trained with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mechanistic Universality (HT-MU), meaning that the correlations in the layer weight matrices can be fit to a power law with exponents that lie in common Universality classes from Heavy-Tailed Random Matrix Theory (HT-RMT). From this, we develop a Universal capacity control metric that is a weighted average of these PL exponents. Rather than considering small toy NNs, we examine over 50 different, large-scale pre-trained DNNs, ranging over 15 different architectures, trained on ImageNet, each of which has been reported to have different test accuracies. We show that this new capacity metric correlates very well with the reported test accuracies of these DNNs, looking across each architecture (VGG16/.../VGG19, ResNet10/.../ResNet152, etc.). We also show how to approximate the metric by the more familiar Product Norm capacity measure, as the average of the log Frobenius norm of the layer weight matrices. Our approach requires no changes to the underlying DNN or its loss function, it does not require us to train a model (although it could be used to monitor training), and it does not even require access to the ImageNet data.
The second one: https://arxiv.org/abs/1901.08276
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size.
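And a minimal sketch of the ESD comparison this second abstract is about: histogram the eigenvalues of (1/N) W^T W for one layer and overlay the Marchenko-Pastur density that a purely random matrix of the same shape and variance would follow (the "random-like" baseline phase). `W` can be any 2D weight matrix, e.g. one of the flattened layer matrices from the previous sketch:

```python
# Sketch only: empirical spectral density of one layer vs. the Marchenko-Pastur law.
import numpy as np
import matplotlib.pyplot as plt

def plot_esd_vs_marchenko_pastur(W, bins=100):
    N, M = W.shape
    if N < M:                      # use the N >= M convention
        W = W.T
        N, M = W.shape
    q = M / N
    sigma2 = np.var(W)
    evs = np.linalg.svd(W, compute_uv=False) ** 2 / N   # eigenvalues of (1/N) W^T W

    lam_minus = sigma2 * (1 - np.sqrt(q)) ** 2
    lam_plus = sigma2 * (1 + np.sqrt(q)) ** 2
    lam = np.linspace(lam_minus + 1e-9, lam_plus, 500)
    # Marchenko-Pastur density for aspect ratio q and entry variance sigma2
    mp = np.sqrt(np.maximum((lam_plus - lam) * (lam - lam_minus), 0.0)) / (2 * np.pi * sigma2 * q * lam)

    plt.hist(evs, bins=bins, density=True, alpha=0.5, label="empirical ESD")
    plt.plot(lam, mp, label="Marchenko-Pastur")
    plt.xlabel("eigenvalue")
    plt.ylabel("density")
    plt.legend()
    plt.show()
```

For well-trained modern layers the paper reports heavy tails extending far beyond the Marchenko-Pastur edge, which is what the 5+1 phases classify.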