distillpub / post--feature-visualization

Feature Visualization
https://distill.pub/2017/feature-visualization/
Creative Commons Attribution 4.0 International

Anonymous review 3 #6

Closed: goodfeli closed this issue 6 years ago

goodfeli commented 6 years ago

The following peer review was solicited as part of the Distill review process. Some points in this review were clarified by an editor after consulting the reviewer.

The reviewer chose to remain anonymous. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to review this article.

Conflicts of Interest: Reviewer disclosed no conflicts of interest.

Review Result for Feature Visualization in distill.pub

Feature visualization is an important technique for understanding what neural networks have learned from image datasets. This paper focuses on optimization methods, discusses the major issues, and explores common approaches to solving them.

This paper is well organized, and its logic is clear and easy to follow. It starts with why optimization should be used to visualize features in neural networks rather than finding examples in the dataset. It then discusses how to achieve diversity with optimization, which overcomes the diversity problem to some extent, and goes on to examine the interaction between neurons, exploring how combinations of neurons work together to represent images. Finally, it discusses how to improve the optimization process by adding different regularizations. I appreciate the authors' effort in giving readers a comprehensive overview of feature visualization, mainly focusing on optimization methods. It would be good to add more description of the technical parts, such as the preconditioning and parameterization section. Also, I feel the title is a little broad, because there are many other feature visualization methods, such as input modification methods and deconvolutional methods.

I agree that optimization can isolate the causes of behavior from mere correlations and is more flexible than finding real examples in the dataset. However, besides the diversity issue mentioned in the paper, real examples from the dataset often seem more interpretable than examples generated by optimization; generated examples can sometimes be very hard for people to interpret. I think the authors have noticed this, which is why they put dataset examples in the last column of the table in the spectrum-of-regularization part. I suggest the authors provide more arguments for the advantages of using optimization methods to visualize the features learned by neural networks.

Some other comments:

  1. This paper misses some experimental details: for example, what model and dataset are used in this paper? How do you generate the images in this paper? Also, what do you actually mean by "diversity term"? These questions need to be answered in the paper.

  2. This paper focuses on optimization methods for feature visualization. There are, however, other methods for feature visualization; see the paper "A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks". For example, the deconvolution method is used in "Visualizing and Understanding Convolutional Networks", and the method of training a CNN to reconstruct images from their activations is used in "Inverting Visual Representations with Convolutional Networks". It would be better to discuss and compare with them.

  3. The paper mentions that "we’ll discuss it in more depth later" and "we’ll talk about them in much more depth later", but I do not see any explanation later in this paper.

  4. Some sentences are not formal, such as "What do we want examples of?" and "If neurons are not the right way to understand neural nets, what is?"

  5. Some typos, such as:

     - "because it separates the things causing behavior from from things that merely correlate with the causes." ==> remove one "from"
     - "some approach to regularization will be one of their main points." ==> "approaches"

I personally feel that the writing of this paper is not as formal as an academic paper; it reads more like a blog post. Overall, it conducts a comprehensive survey of optimization methods for feature visualization, but does not propose new methods for feature visualization.

For an academic paper, I suggest systematically summarizing the approaches and further improving this draft.

According to the criteria of distill.pub, and compared to other papers published on distill.pub, I think it can be accepted with some revisions.

ludwigschubert commented 6 years ago

Thank you for your high-quality feedback! We went through every sentence and have made numerous changes to the article based upon the review you provided. These can collectively be found in the pull request #8.

> It would be good to add more description of the technical parts, such as the preconditioning and parameterization section.

On reviewing the section on preconditioning, we agree that we were trying to be very general, potentially at the expense of concreteness and approachability. We rewrote the section in 29e8b19 to be more explicit about how this technique works when applied to images. We also added footnotes going into more detail on the derivation of these techniques.
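For concreteness, here is a minimal sketch of one such preconditioner in the spirit of the rewritten section: parameterize the image by its Fourier coefficients, scaled inversely to spatial frequency, so that plain gradient descent on the parameters acts like preconditioned descent on the pixels. The function name and details are illustrative assumptions, not the article's exact code.

```python
import numpy as np
import torch

def fourier_param(h, w, std=0.01):
    """Preconditioned image parameterization: optimize scaled Fourier
    coefficients instead of raw pixels (illustrative sketch)."""
    # Spatial frequency of each rFFT coefficient.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freqs = np.sqrt(fx ** 2 + fy ** 2)
    # Inverse-frequency scaling: descent on `spectrum` then behaves like
    # preconditioned descent on the Fourier coefficients, damping the
    # high-frequency noise that dominates naive pixel-space ascent.
    scale = torch.tensor(1.0 / np.maximum(freqs, 1.0 / max(h, w)),
                         dtype=torch.float32)
    # Random low-magnitude starting point; last dim holds (real, imag).
    spectrum = (std * torch.randn(3, h, w // 2 + 1, 2)).requires_grad_(True)

    def to_image():
        coeffs = torch.view_as_complex(spectrum) * scale
        img = torch.fft.irfft2(coeffs, s=(h, w))  # back to pixel space
        return torch.sigmoid(img)                 # keep values in [0, 1]

    return spectrum, to_image
```

In an optimization loop, one would optimize `spectrum` while feeding `to_image()` into the network.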

> Also, I feel the title is a little broad, because there are many other feature visualization methods, such as input modification methods and deconvolutional methods.

[…]

> There are, however, other methods for feature visualization; see the paper "A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks". For example, the deconvolution method is used in "Visualizing and Understanding Convolutional Networks", and the method of training a CNN to reconstruct images from their activations is used in "Inverting Visual Representations with Convolutional Networks". It would be better to discuss and compare with them.

We have expanded the introduction section to more clearly position our work in relation to these terms and papers, which we believe to be part of a second thread of research the community is beginning to call "attribution" or "saliency maps".

> I suggest the authors provide more arguments for the advantages of using optimization methods to visualize the features learned by neural networks.

We expanded our discussion of the value of optimization-based techniques over dataset-based techniques for understanding neural network behavior. We see optimization-based techniques as having a significant advantage when the input data distribution may change.

> This paper misses some experimental details: for example, what model and dataset are used in this paper?

We use GoogLeNet trained on ImageNet.

We added a caption to the hero diagram mentioning both the model and the dataset it was trained on.

> How do you generate the images in this paper?

We have tried to make this clearer in multiple places.
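For readers of this thread, the overall procedure is activation maximization: start from noise and ascend the gradient of a chosen unit's activation. Below is a minimal sketch, assuming torchvision's GoogLeNet port as a stand-in for our model; the layer, channel, and hyperparameters are illustrative, this is not the exact code behind the article's figures, and the parameterization and regularization techniques discussed in the article are layered on top of this loop.

```python
import torch
import torchvision.models as models

# Stand-in model: torchvision's GoogLeNet trained on ImageNet
# (the weights API requires torchvision >= 0.13).
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1).eval()
for p in model.parameters():
    p.requires_grad_(False)  # we optimize the input, not the weights

# Capture one layer's activations with a forward hook.
acts = {}
model.inception4a.register_forward_hook(
    lambda module, inputs, output: acts.update(feat=output))

img = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
opt = torch.optim.Adam([img], lr=0.05)

for _ in range(256):                    # step count is illustrative
    opt.zero_grad()
    model(img)                          # input normalization omitted for brevity
    loss = -acts["feat"][0, 42].mean()  # maximize channel 42 (arbitrary choice)
    loss.backward()
    opt.step()
# `img` now roughly shows what channel 42 of inception4a responds to.
```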

> [What] do you actually mean by "diversity term"?

We have added a footnote with the mathematical definition of our diversity term: cosine dissimilarity between the flattened Gram matrices.
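In code, that definition corresponds to something like the following sketch (function name and reduction details are assumptions; `acts` are the activations of a batch of visualizations at the layer being visualized):

```python
import torch
import torch.nn.functional as F

def diversity_penalty(acts):
    """Mean pairwise cosine similarity between flattened Gram matrices;
    adding this to the loss pushes batch elements apart (illustrative sketch).
    """
    b, c, h, w = acts.shape                           # requires batch size > 1
    flat = acts.reshape(b, c, h * w)
    grams = flat @ flat.transpose(1, 2)               # (b, c, c) channel Grams
    grams = F.normalize(grams.reshape(b, -1), dim=1)  # flatten, L2-normalize
    sim = grams @ grams.T                             # pairwise cosine similarity
    off_diag = sim - torch.eye(b, device=acts.device) # drop self-similarity
    return off_diag.sum() / (b * (b - 1))
```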

> The paper mentions that "we’ll discuss it in more depth later" and "we’ll talk about them in much more depth later", but I do not see any explanation later in this paper.

We have reworded all such occurrences and added hyperlinks to the concrete sections.

> Some sentences are not formal, such as "What do we want examples of?" and "If neurons are not the right way to understand neural nets, what is?"

That is correct. We do not believe these phrasings hurt understanding and have decided to keep them.

You're absolutely correct that the style of this article, and Distill more broadly, is different from traditional academic writing. We believe there is room to improve academic communication and value in experimenting with it. We have discussed this idea in more depth at https://distill.pub/2017/research-debt/.

> Some typos, such as: […]

Thanks for pointing them out, we have fixed those and proofread the article another time! :-)