Convergent Learning: Do different neural networks learn the same representations?

Recent successes in training large, deep neural networks (DNNs) have prompted active investigation into the underlying representations learned on their intermediate layers. Such research is difficult because it requires making sense of non-linear computations performed by millions of learned parameters. However, despite the difficulty, such research is valuable because it increases our ability to understand current models and training algorithms and thus create improved versions of them. We argue for the value of investigating whether neural networks exhibit what we call convergent learning, which is when separately trained DNNs learn features that converge to span similar spaces. We further begin research into this question by introducing two techniques to approximately align neurons from two networks: a bipartite matching approach that makes one-to-one assignments between neurons and a spectral clustering approach that finds many-to-many mappings. Our initial approach to answering this question reveals many interesting, previously unknown properties of neural networks, and we argue that future research into the question of convergent learning will yield many more. The insights described here include (1) that some features are learned reliably in multiple networks, yet other features are not consistently learned; (2) that units learn to span low-dimensional subspaces and, while these subspaces are common to multiple networks, the specific basis vectors learned are not; and (3) that the average activation values of neurons vary considerably within a network, yet the mean activation values across different networks converge to an almost identical distribution.
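To make the one-to-one alignment idea concrete, here is a minimal sketch of matching units between two networks by the correlation of their activations. It is an illustration under assumptions, not the authors' implementation: the abstract does not say how neuron similarity is scored, so Pearson correlation is an assumed choice, and the function names (`pearson`, `correlation_matrix`, `match_units`) and the toy data are hypothetical. The matching is found by exhaustive search over permutations, which only works for a handful of units; a real layer would need a polynomial-time assignment solver such as the Hungarian algorithm.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(acts_a, acts_b):
    """corr[i][j] = correlation of unit i in net A with unit j in net B.

    acts_a[i] and acts_b[j] each hold one unit's activations, recorded
    over the same sequence of inputs.
    """
    return [[pearson(u, v) for v in acts_b] for u in acts_a]


def match_units(corr):
    """Return a tuple perm pairing unit i of net A with unit perm[i] of net B.

    Picks the one-to-one assignment with the largest total correlation by
    brute force over all permutations (feasible only for very few units).
    """
    n = len(corr)
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(corr[i][perm[i]] for i in range(n)),
    )


# Toy data (hypothetical): net B holds the same 4 "features" as net A,
# stored in a different unit order and with a little noise added.
rng = random.Random(0)
acts_a = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(4)]
hidden_order = [2, 0, 3, 1]  # unit j of B is a noisy copy of unit hidden_order[j] of A
acts_b = [[a + 0.1 * rng.gauss(0, 1) for a in acts_a[src]] for src in hidden_order]

corr = correlation_matrix(acts_a, acts_b)
matching = match_units(corr)
for i, j in enumerate(matching):
    print(f"A unit {i} <-> B unit {j} (correlation {corr[i][j]:.3f})")
```

In this toy setup the matching should recover the hidden reordering. With independently trained networks, low correlations among the matched pairs would be one sign of features learned by one network but not the other.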

Bibtex:

@InProceedings{pmlr-v44-li15convergent,
  title = {Convergent Learning: Do different neural networks learn the same representations?},
  author = {Yixuan Li and Jason Yosinski and Jeff Clune and Hod Lipson and John Hopcroft},
  booktitle = {Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015},
  pages = {196--212},
  year = {2015},
  editor = {Dmitry Storcheus and Afshin Rostamizadeh and Sanjiv Kumar},
  volume = {44},
  series = {Proceedings of Machine Learning Research},
  address = {Montreal, Canada},
  month = {11 Dec},
  publisher = {PMLR}
}

From previous review: Another approach to understanding deep networks was developed by Li et al. (2015), who focused on whether different networks learn similar features (convergent learning). Their method involves first training many networks, then analyzing the representations learned by each network at the level of individual neurons or groups of neurons. They found that representations could be learned both by individual neurons and by groups of neurons, and that, while multiple networks reliably learn certain features, other features are distinct to individual networks. This work reveals that, while deep networks may show similar levels of performance, they can differ in what they learn from the training data.
