distillpub / post--activation-atlas

Using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of features the network has learned, which can reveal how the network typically represents some concepts.
https://distill.pub/2019/activation-atlas/

Review #3 (issue #4)

Closed: phillipi closed this issue 5 years ago

phillipi commented 5 years ago

The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.

The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer, David Bau, for taking the time to write such a thorough review.


What type of contributions does this article make?

New visualization method providing new results

How significant are these contributions?

4/5

Communication:

Is the article well-organized, focused and structured?

3/5

Is the article well-written? Does it avoid needless jargon?

3/5

Diagram & Interface Style

5/5

Impact of diagrams / interfaces / tools for thought?

5/5

How readable is the paper, accounting for the difficulty of the topic?

3/5

Comments on communication:

The most clarifying examples and interesting results are not hinted at until the very end of the article. The baseball + great white shark example (and the wok/frying pan example) are the strongest results, and the article should lead with them as motivating questions.

Before reading the paper, if you asked me, "Why is it that when we paste an image of a baseball into the foreground, the network switches its prediction to a shark? How can we fool a network into classifying a wok as a frying pan, or vice-versa?" I would have no idea the answer.

That's because these questions cannot be answered without your visualization! So if you started with these questions, and the assertion that the answers are ready to be revealed inside this beautiful atlas, it would help motivate exploration, and it would also help guide the reader to pay attention to the right details while reading the article.

Beyond the writing style question, I also think the article would be more interesting and stronger if these examples and experiments were fleshed out further. More comments below.

Scientific correctness & integrity:

Are experiments in the article well designed, and interpreted fairly?

3/5

Does the article critically evaluate its limitations? How easily would a lay person understand them?

3/5

How easy would it be to replicate (or falsify) the results?

5/5

Does the article cite relevant work?

4/5

Considering all factors, does the article exhibit strong intellectual honesty and scientific hygiene?

3/5

Comments on scientific correctness & integrity:

The paper is carefully written and does not overclaim.

However, it also does not go to great lengths to try to triangulate or strengthen its most interesting contributions. To stand out, I wish the paper had gone further in the following directions.

First, I would love to see the method applied on a second or third dataset. ImageNet is used frequently to illustrate visualizations, but different datasets can behave very differently. The paper would be stronger if it were able to demonstrate that the visualization provides insight not only about ImageNet, but about classification of other types of imagery (such as scene classification, action classification, or fine-grained classification).

Second, I want to understand how the method behaves on other architectures. GoogLeNet-style architectures are great, but ResNet is also super powerful and very commonly used, and there are differences in architecture that are potentially serious enough to cause a visualization method to behave very differently. Does the technique work just as well in that setting? Similarly, after reading, I want to know if I should expect the technique to be applicable to a network used for captioning or segmentation.

Third, and most importantly, I think the "Focusing on a Single Classification and Further Isolating Classes" results are the most striking in the paper, and I wish they were fleshed out to do them justice. The examples that one might call "manual adversarial examples" are awesome, but there is no sense given if the examples are typical or unusual. Providing enough information to quantify the results would make the results much more interesting.

In particular, I wish the paper answered two questions.

  1. How bad can the examples get? Examples are shown of creating fooling examples to push a network to related neighbor classes. But in words it's asserted that a fireboat is a window plus a crane plus water. Can a completely non-fireboat-related image be constructed with those three components in a way that causes the network to say "fireboat"?

  2. How easily can this be done? If a user is given the visualization of a pair of classes and maybe some way to choose and paste clip art that strongly activates units, how often would they be able to construct a fooling example to move an image from one class to another? Can they do it more successfully with the visualization than they can without? Or if users aren't in the loop, can a very simple algorithm (such as scoring pieces of clipart to activate features and creating a collage) be described, shown, and quantified to provide insight on how foolable a SOTA classifier can be?
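
To make the suggestion in item 2 concrete, here is a minimal sketch of what such an automated collage attack could look like. It is not from the article: it assumes a torchvision GoogLeNet as a stand-in for the article's InceptionV1, 224x224 RGB tensors in [0, 1], and a user-supplied set of clip-art patches, and it scores each patch by how much pasting it raises the classifier's probability for the target class (a simpler proxy for the feature-activation scoring the reviewer suggests).

```python
# Hypothetical sketch, not the article's code: rank clip-art patches by how
# strongly pasting them pushes a classifier toward a target class.
import torch
import torch.nn.functional as F
from torchvision import models, transforms

model = models.googlenet(weights="DEFAULT").eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def paste(image, patch, top=0, left=0):
    """Overwrite a region of `image` (3, H, W) with `patch` (3, h, w)."""
    out = image.clone()
    _, h, w = patch.shape
    out[:, top:top + h, left:left + w] = patch
    return out

def target_prob(image, target_class):
    """Probability the model assigns to `target_class` for one image."""
    with torch.no_grad():
        logits = model(normalize(image).unsqueeze(0))
    return F.softmax(logits, dim=1)[0, target_class].item()

def best_patch(image, patches, target_class):
    """Score every candidate patch and return the most 'fooling' one."""
    scored = [(target_prob(paste(image, p), target_class), p) for p in patches]
    return max(scored, key=lambda sp: sp[0])
```

Measuring how often the winning patch actually flips the top-1 prediction, over many images and class pairs, would give the kind of foolability number the reviewer is asking for.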

The paper's beautiful visualizations uncover several terrifically interesting ideas. It would be much stronger if the most striking claims were broader and deeper.

General comments:

The atlas is a beautifully done visualization contribution. By artfully merging several techniques, it uncovers genuinely new insights about what a state-of-the-art image classifier is doing inside. The paper describes the visualization method clearly and applies it to reveal the internal structure of several layers of an Inception network trained to classify ImageNet. Some latent directions are demonstrated, and some evidence of compositional effects is shown. Finally, and most strikingly, it is demonstrated that (some) individual classifications can be understood as a mere sum of parts.
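
For readers without the article open, here is a rough sketch of the layout step being praised, under assumptions not stated in this review: activation vectors collected from one layer are projected to 2D with UMAP, binned into a grid, and averaged per cell; each averaged vector is then rendered with feature inversion to produce one atlas tile (that rendering step is omitted here). The function and parameter names are illustrative only.

```python
# Rough sketch of the atlas layout (not the authors' code): project activation
# vectors to 2D with UMAP, bin them into a grid, and average per cell.
import numpy as np
import umap  # pip install umap-learn

def atlas_cells(activations, grid_size=20):
    """`activations`: (N, D) array of spatial activation vectors from one layer."""
    coords = umap.UMAP(n_components=2).fit_transform(activations)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    bins = np.floor((coords - lo) / (hi - lo + 1e-9) * grid_size).astype(int)
    cells = {}
    for gx, gy in {tuple(b) for b in bins}:
        mask = (bins[:, 0] == gx) & (bins[:, 1] == gy)
        cells[(gx, gy)] = activations[mask].mean(axis=0)
    return cells  # {(grid_x, grid_y): averaged activation, one tile per key}
```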

The demonstrations of this last, sum-of-parts finding suggest a significant weakness in the behavior of modern SOTA object classifiers. For example, no human who recognizes the difference between a pan and a wok would call a pan a wok just because it's next to noodles. Nor would a human expert confuse a gray whale fin with a white shark, just because a baseball floated by.

The demonstration of this type of weakness is the most interesting contribution of this article, and I wish the methods were pushed a little further to strengthen the finding.

Do the visualizations reveal similar phenomena in the classification of other data sets? Could this kind of weakness trigger mistakes when detecting cancer, or when recognizing a small child chasing a ball into a busy highway? Although a single paper couldn't be expected to test all data sets, exploration of a second dataset feels needed. Scene classification has some very different characteristics and seems like it would be straightforward to try.

Do other image classification architectures behave similarly? I wonder specifically about ResNet, since it is so effective and so commonly used; and I also wonder if some of the effects go away or change significantly when the architecture is applied to closely related domains like image segmentation.

Can this visualization tool make it easy for people to understand weaknesses well enough to construct fooling examples? Or are these examples rare, and do they remain hard to create? I think this is the most interesting question, and I strongly wish the article took some steps to try to quantify these results.

I think the article is an interesting read and a valuable contribution as-is. However, this new visualization reveals hints of insights that, if fully fleshed out, would make the paper really stand out.

shancarter commented 5 years ago

Thanks for the detailed feedback. We have added a section to the introduction hinting at the shark/baseball adversarial patches to give those results more prominence. We have also added some numerical analysis to the end of the section “Further Isolating Classes”, where we run the technique on thousands of images in ImageNet.

The atlas is a beautifully done visualization contribution. By artfully merging several techniques, it uncovers genuinely new insights about what a state-of-the-art image classifier is doing inside.

We’re glad you enjoyed it!

The most clarifying examples and interesting results are not hinted at until the very end of the article. The baseball + great white shark example (and the wok/frying pan example) are the strongest results, and the article should lead with them as motivating questions.

Before reading the paper, if you asked me, "Why is it that when we paste an image of a baseball into the foreground, the network switches its prediction to a shark? How can we fool a network into classifying a wok as a frying pan, or vice-versa?" I would have no idea the answer.

We’re glad you found them interesting! Following your suggestion, we added (in c6a6a85baa7) a teaser to the introduction that points to the relevant section with a small sample of the results you found most compelling, to motivate the reader up front.

First, I would love to see the method applied on a second or third dataset. ... Second, want to understand how the method behaves on other architectures.

We’d love to see both of these as well. Unfortunately, we think we’ll need to leave this to future work, but we think there’s a tremendous amount to explore.

Our early experiments in looking at other models seem quite promising, but we think it will take more work to really do justice to expanding this type of visualization. In particular, we think there may be powerful ways to use Activation Atlases to compare neural network architectures, but we think it will require additional techniques to do properly, and this paper was already running fairly long.

Third, and most importantly, I think the "Focusing on a Single Classification and Further Isolating Classes" results are the most striking in the paper, and I wish they were fleshed out to do them justice. The examples that one might call "manual adversarial examples" are awesome, but there is no sense given if the examples are typical or unusual. Providing enough information to quantify the results would make the results much more interesting.

We’re glad you found them interesting! :)

We’ve added a quantitative evaluation: for a given attack, what fraction of images of that class does the attack work on? This shows that a given attack doesn’t just work on cherry picked examples.

Something that’s harder to get at is how common the attacks themselves are: how many such attacks exist, and how hard are they to find? We explore this a little by showing some attacks we thought might work, and how successful each was. You certainly do have to try a couple to get one that works well!
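
To make the added evaluation concrete, here is a minimal sketch of the success-rate measurement described above, under assumptions not taken from the article's code: a fixed patch attack, a list of source-class images as (3, 224, 224) tensors in [0, 1], and a torchvision GoogLeNet as a stand-in for the article's InceptionV1.

```python
# Hypothetical sketch of the evaluation: what fraction of source-class images
# switch their top-1 prediction to the target class after pasting one patch?
import torch
from torchvision import models, transforms

model = models.googlenet(weights="DEFAULT").eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def attack_success_rate(images, patch, target_class, top=0, left=0):
    flips = 0
    _, h, w = patch.shape
    for img in images:
        attacked = img.clone()
        attacked[:, top:top + h, left:left + w] = patch  # paste the patch
        with torch.no_grad():
            pred = model(normalize(attacked).unsqueeze(0)).argmax(dim=1).item()
        flips += int(pred == target_class)
    return flips / max(len(images), 1)
```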

colah commented 5 years ago

Numerical scores contextualized with scoring rubric entries.

How significant are these contributions?

4/5 - Subject-matter experts would learn a lot from reading this paper. // Significant improvement or new angle over previous explanations.

Communication:

Is the article well-organized, focused and structured?

3/5 - Article is organized and on point.

Is the article well-written? Does it avoid needless jargon?

3/5 - Text is fairly readable but could be improved.

Diagram & Interface Style

5/5 - Diagrams minimize visual noise and focus the reader's attention on what's important. They make effective use of best practices (including gestalt principles and alignment, appropriate captioning and labeling, effective use of color, etc.)

Impact of diagrams / interfaces / tools for thought?

5/5 - Diagrams have a transformative impact. They make concepts much easier to understand, deeply engage with, and surface insights. https://distill.pub/2017/momentum is an exemplar.

How readable is the paper, accounting for the difficulty of the topic?

3/5 - Given the difficulty of the topic, the writing can be understood with reasonable effort.

Scientific correctness & integrity:

Are experiments in the article well designed, and interpreted fairly?

3/5 - Claims in paper are reasonably supported, as appropriate based on the paper's framing of them. Major caveats are noted.

Does the article critically evaluate its limitations? How easily would a lay person understand them?

3/5 - The article acknowledges limitations, but may not be accessible beyond a research audience.

How easy would it be to replicate (or falsify) the results?

5/5 - "Active reproducibility." Results are easy to reproduce and build on. For example, authors may provide hosted notebooks that allow re-running their experiments without even setting up infrastructure.

Does the article cite relevant work?

4/5 - Article strikes a good balance between keeping the article tight and orienting the reader in related work / being academically generous. May use an appendix or footnotes in balancing these needs.

Considering all factors, does the article exhibit strong intellectual honesty and scientific hygiene?

3/5 - Meets normal academic expectations of the field.

phillipi commented 5 years ago

Thanks for these revisions, I think the new quantitative results are especially interesting and strengthen the article. I feel the reviewer's concerns have been adequately addressed.