distillpub / post--differentiable-parameterizations

A powerful, under-explored tool for neural network visualizations and art.
https://distill.pub/2018/differentiable-parameterizations
Creative Commons Attribution 4.0 International

Review #1 #50

Open jmgilmer opened 6 years ago

jmgilmer commented 6 years ago

The following peer review was solicited as part of the Distill review process. The review was formatted by the editor to help with readability.

The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer, Pang Wei Koh, for taking the time to write such a thorough review.


High-level

I found the article interesting and thought-provoking, and the visualizations were eye-catching and very helpful. Thanks to the authors for putting in the effort to write this article and make all of the associated notebooks and visualizations!

There are two main ways that I think the article could be improved: 1) Providing more overall context and motivation beyond "let's use different parameterizations", and 2) Taking care to explain all of the concepts invoked (especially since this is a pedagogical article).

Here are more details on these.

1)

I think the biggest missing thing is a big-picture view about why different parameterizations might lead to different results, and why we might prefer one type of parameterization over another. For example, after reading the intro, I was still not sure about the motivation for the work. The argument went something like, we should use different parameterizations because we can. But what are examples of different parameterizations and why would we expect them to work better/differently?

The most persuasive argument (to me) was the one advanced in the CPPN section: that parameterizations impose constraints upon the optimized image that fit more with the kinds of pictures we'd like to see. This could be: pictures that are more realistic (CPPN); pictures that obey some sort of 3D smoothness (style transfer 3D section); etc. A variant of this argument can also be applied to the shared parameterization section. So perhaps this intuition could be given at the start of the article, together with more signposting of the kinds of parameterizations that the rest of the article would consider.

2)

I found it hard to follow some parts of the article. The argument roughly makes sense, but it was difficult for me to precisely understand what the authors were trying to convey. For example, take the first paragraph of the second section (Aligned Neuron Interpolation):

"Sometimes we’d like to visualize how two neurons interact." This is the first sentence after the intro, and I didn't understand how it was related to what we'd just read in the intro. For example, in the intro, the goal seems to be to "describe the properties we want an image to have (e.g. style), and then optimize the input image to have those properties." Where did neurons come from and why do we care about how they interact? What does it even mean to visualize how they interact?

"We can do this by optimizing convex combinations of two neurons." Why does optimizing convex combinations of two neurons allow us to visualize how two neurons interact? The link wasn't obvious to me.

"If we do this naively, different frames of the resulting visualization will be unaligned — visual landmarks such as eyes appear in different locations in each frame." At this point I was quite confused: What's a frame? Where did frames come from?

"This is because the optimization process that creates the visualization is stochastic: even optimizing for the same objective will lead to the visualization being laid out different each time." This is the first mention of stochasticity, and it's not clear how that's related to parameterizations, since different parameterizations would presumably have equally stochastic optimizations. Why is this a problem for RGB parameterizations and not others? At this point, I thought that the article was going to be mainly about how the default parameterization is non-convex, and perhaps that different parameterizations could lead to convexity.

"Unfortunately, this randomness can make it harder to compare slightly different objectives." What different objectives are we considering? I'm guessing it's different convex combinations? Why would I want to compare them?
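One possible reading of the quoted paragraph is that each "frame" corresponds to one convex combination of the two neurons' activations, and that each combination is a separate objective to optimize. The sketch below illustrates that reading only. The function name `interpolated_objectives` and the scalar activations `act_a` and `act_b` are hypothetical and do not come from the article or its notebooks.

```python
import numpy as np

def interpolated_objectives(act_a, act_b, n_frames=5):
    """Return one objective value per frame, blending two neuron activations.

    act_a, act_b: hypothetical scalar activations of the two neurons.
    Frame i uses weight t_i in [0, 1]; t=0 is pure neuron A, t=1 is pure neuron B.
    """
    weights = np.linspace(0.0, 1.0, n_frames)
    # Convex combination: the two coefficients are non-negative and sum to 1.
    return [(1.0 - t) * act_a + t * act_b for t in weights]

objectives = interpolated_objectives(2.0, 4.0, n_frames=3)
```

Under this reading, "comparing slightly different objectives" means comparing the images obtained for neighboring values of the blending weight.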

Similarly, the article talks about a "decorrelated parameterization" that somehow works better, but doesn't explain why (except by a brief reference to checkerboard gradients, which I'm guessing a decorrelated parameterization doesn't suffer from, but I'm not sure why that would be the case).
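For context, here is a minimal sketch of what a "decorrelated parameterization" could look like, assuming it refers to optimizing coefficients in a frequency-scaled Fourier basis rather than raw pixels. That assumption, the function name `fourier_image`, and the exact scaling rule are illustrative and are not confirmed anywhere in this thread.

```python
import numpy as np

def fourier_image(spectrum, h, w):
    """Map Fourier-space parameters to a real-valued (h, w) image.

    spectrum: complex array of shape (h, w // 2 + 1), the parameters
    that would be optimized in place of pixel values (an assumption).
    """
    fy = np.fft.fftfreq(h)[:, None]    # vertical frequencies, shape (h, 1)
    fx = np.fft.rfftfreq(w)[None, :]   # horizontal frequencies, shape (1, w//2+1)
    freq = np.sqrt(fx ** 2 + fy ** 2)
    # Damp high frequencies; the floor avoids dividing by zero at the DC term.
    scale = 1.0 / np.maximum(freq, 1.0 / max(h, w))
    return np.fft.irfft2(spectrum * scale, s=(h, w))

rng = np.random.default_rng(0)
params = rng.normal(size=(8, 5)) + 1j * rng.normal(size=(8, 5))
image = fourier_image(params, 8, 8)
```

In a setup like this, a gradient step on `params` changes many pixels at once, which is one plausible reason such a parameterization could behave differently from pixel-space optimization.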

I'd suggest going through the article carefully and making sure that every sentence clearly follows from the previous one, especially for someone with only the minimum level of background knowledge.

Comments on figures

First figure: I was initially a bit confused by why the RGB representation was in the middle of the figure, instead of on the left. (I realized later that you're using neural networks that still operate on the RGB representation; so perhaps it's worth clarifying that you're only considering different parameterizations for the visualization, instead of the training.)

Second/third figures: These were broken for me (see attached screenshot). I only saw grey blocks.

Fourth figure: For some choices of style/content, including the first/default one, the decorrelated space picture looked exactly the same as the image space picture (and both looked bad; see attached screenshot). Is this a bug?

CPPN figure: I can't see the last figure of this section (there's just a big blank space). I'm also not sure what objective you're optimizing for in this section -- how are the pictures being generated?

Typos

"to fuel a small artistic movement based neural art." -> "to fuel a small artistic movement based on neural art."
"each frame is parameterized as a combination of it’s own unique parameterization, and" -> "each frame is parameterized as a combination of its own unique parameterization and"
"despite it’s remarkable success" -> "despite its remarkable success"
"By iteratively adjustign the weights, our immaginary prisms" -> "By iteratively adjusting the weights, our imaginary prisms"
"as a mean to generate" -> "as a means to generate"
"But it’s certain possible to" -> "But it’s certainly possible to"
"This kind more general use" -> "This kind of more general use"

znah commented 6 years ago

Thank you for the thoughtful review! We agree with your feedback, and it helped us focus on improving the weaknesses of our article. In particular:

  1. We’ve added an explanation (PR #53) of why the choice of parameterization affects the output.

  2. We’ve iterated a lot on the text to write in a way that assumes less background knowledge. We’ve also added well-commented notebooks for all our examples, to give highly detailed

Detailed Reply

I think the biggest missing thing is a big-picture view about why different parameterizations might lead to different results, and why we might prefer one type of parameterization over another. For example, after reading the intro, I was still not sure about the motivation for the work. The argument went something like, we should use different parameterizations because we can. But what are examples of different parameterizations and why would we expect them to work better/differently? The most persuasive argument (to me) was the one advanced in the CPPN section: that parameterizations impose constraints upon the optimized image that fit more with the kinds of pictures we'd like to see. This could be: pictures that are more realistic (CPPN); pictures that obey some sort of 3D smoothness (style transfer 3D section); etc. A variant of this argument can also be applied to the shared parameterization section. So perhaps this intuition could be given at the start of the article, together with more signposting of the kinds of parameterizations that the rest of the article would consider.

We now describe four different reasons why different parameterizations can be used. We then provide examples for all of them and, for each section, we mention which of these four categories the developed parameterization falls into. (PR #53; Commits: fb9fcb6, 6bb3446, 60d2642)

I found it hard to follow some parts of the article. The argument roughly makes sense, but it was difficult for me to precisely understand what the authors were trying to convey. For example, take the first paragraph of the second section (Aligned Neuron Interpolation): …

This was excellent feedback. This section in particular seems to have assumed a lot of prior knowledge about visualizing neuron interactions. We rewrote the section to address this in PR #64. We now also explicitly link to the much longer discussion of these ideas in the Feature Visualization article.

Similarly, the article talks about a "decorrelated parameterization" that somehow works better, but doesn't explain why (except by a brief reference to checkerboard gradients, which I'm guessing a decorrelated parameterization doesn't suffer from, but I'm not sure why that would be the case).

Great point! We revised the section to clarify this in #67. :)

We added a footnote to better explain the reference to the work of Olah et al., and we improved the description of the benefits relative to pixel-space optimization.

First figure: I was initially a bit confused by why the RGB representation was in the middle of the figure, instead of on the left. (I realized later that you're using neural networks that still operate on the RGB representation; so perhaps it's worth clarifying that you're only considering different parameterizations for the visualization, instead of the training.)

We improved the caption to highlight that the parameterization goes beyond the RGB space on which the network is trained. We also highlighted that, by using parameterizations that can be “plugged” into the RGB parameterization, we have more flexibility in optimizing for existing network architectures.

Second/third figures: These were broken for me (see attached screenshot). I only saw grey blocks. Fourth figure: For some choices of style/content, including the first/default one, the decorrelated space picture looked exactly the same as the image space picture (and both looked bad; see attached screenshot). Is this a bug? CPPN figure: I can't see the last figure of this section (there's just a big blank space). I'm also not sure what objective you're optimizing for in this section -- how are the pictures being generated?

These problems should be fixed now.

Typos

Thanks for catching these! They’re now fixed (e.g. commits e0276f2, a934e73).