distillpub / post--building-blocks

The Building Blocks of Interpretability
https://distill.pub/2018/building-blocks
Creative Commons Attribution 4.0 International

Question regarding "we use dimensionality reduction to produce a multi-directional saliency map" #27

Closed. rhezab closed this issue 6 years ago.

rhezab commented 6 years ago

Can you please elaborate a bit on this process (perhaps via a short reply to this issue)?

I'm guessing you reduce the thousand-dimensional vector to some point on a line through color space between blue and orange in the example given, where blue and orange correspond to the two top competing types of classes (e.g. orange: dog and blue: cat).

ludwigschubert commented 6 years ago

@rhezab thanks for asking clarifying questions! Unfortunately it looks like the dimensionality reduction part didn't end up in the published notebook. I'll see if I can find that code; if it's easy to port over, I may simply extend the notebook in the future.

Your guess is already pretty close: we take the thousand-dimensional vector and reduce it to two dimensions (we probably simply used PCA). The mapping from these two dimensions to colors is slightly more complicated, as we encode two things: the hue (blue vs. orange) indicates which of the two reduced directions dominates at a given position, and the brightness indicates the magnitude of the attribution there.

(In contrast, the notebook only contains attribution of single classes.)
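As a rough sketch of what such a mapping could look like (this is not the original code; attr_reduced is a hypothetical (H, W, 2) array of non-negative per-position strengths for the two reduced directions, and the two RGB colors are placeholders, not the article's exact palette):

import numpy as np

# Placeholder colors for the two reduced directions.
ORANGE = np.array([1.0, 0.5, 0.0])
BLUE = np.array([0.0, 0.5, 1.0])

def to_saliency_rgb(attr_reduced):
  # attr_reduced: (H, W, 2) non-negative strengths for the two directions.
  total = attr_reduced.sum(axis=-1, keepdims=True)           # overall magnitude per position
  weights = attr_reduced / np.maximum(total, 1e-9)           # which direction dominates
  hue = weights[..., :1] * ORANGE + weights[..., 1:] * BLUE  # blend between the two colors
  scale = total / max(float(total.max()), 1e-9)              # magnitude -> brightness
  return hue * scale                                         # (H, W, 3) RGB saliency map

The actual interface presumably does something more careful, but this is the gist of encoding direction as hue and magnitude as brightness.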

rhezab commented 6 years ago

@ludwigschubert thanks for answering my question! In particular, I found it very helpful that you cleared up the difference between the notebook example and the example in the article. That was something I was confused about, since in the notebook you do attribution given a label hint (so kind of like manual dimensionality reduction), whereas the article suggests actual dimensionality reduction (like PCA).

To be honest, I did not realise that the magnitude of the attribution is encoded as brightness - it's difficult for me to disambiguate between variations in brightness and variations in blueness/orangeness. I did get that information from the bar charts above though, so the interface definitely still gets the job done!

Also, if the code is not easy to port over, I'd be satisfied with learning what dimensionality reduction technique was used :)

colah commented 6 years ago

I'm pretty sure we did non-negative matrix factorization (NMF). I was very excited about NMF when working on Building Blocks. :P

As I recall, there's a second trick to this: it really matters which layer you take the attributions from before applying dimensionality reduction. If you take them from a high-level layer (e.g. mixed5a), you'll get factors corresponding to different large components in the image (e.g. cat vs. dog). If you take them from a lower layer (e.g. mixed4a), you'll get more factors corresponding to parts of an object pushing for different related classes (e.g. which type of dog). And at very low layers, there's lots of noise.

colah commented 6 years ago

Collect attributions for all classes:

import numpy as np

# raw_class_spatial_attr and make_MaxSmoothPoolGrad are helpers from the
# Building Blocks notebooks; img is the input image and labels is the list
# of class names.
attrs = []
for n, label in enumerate(labels):
  attr = raw_class_spatial_attr(img, "mixed5a", label,
                                override={"MaxPool": make_MaxSmoothPoolGrad()})
  if n % 20 == 0: print(".", end="")
  attrs.append(attr)

# Stack into a (H, W, num_classes) array: spatial positions first, classes last.
attrs_arr = np.asarray(attrs).transpose(1, 2, 0)

Factorize and print out the factors:

from lucid.misc.channel_reducer import ChannelReducer

# Reduce the class dimension (the last axis) to 2 non-negative factors.
reducer = ChannelReducer(2, "NMF")
attr_reduced = reducer.fit_transform(np.maximum(0, attrs_arr))

# For each factor, print the five classes it loads on most strongly.
for component in reducer._reducer.components_:
  print("")
  for n in np.argsort(-component)[:5]:
    print(labels[n], component[n])
Labrador retriever 0.87
golden retriever 0.61
beagle 0.61
kuvasz 0.51
redbone 0.46

tiger 0.40
tiger cat 0.36
lynx 0.31
collie 0.28
brambling 0.27
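As a rough follow-up sketch (not from the original thread), one way to probe the layer effect described above would be to repeat the same procedure at an earlier layer and ask for more factors, reusing the notebook helpers and imports from the snippet above and assuming they accept other layer names:

# Same procedure as above, but at an earlier layer and with more NMF factors.
attrs_4a = []
for label in labels:
  attrs_4a.append(raw_class_spatial_attr(img, "mixed4a", label,
                                         override={"MaxPool": make_MaxSmoothPoolGrad()}))
attrs_4a_arr = np.asarray(attrs_4a).transpose(1, 2, 0)

reducer_4a = ChannelReducer(4, "NMF")  # e.g. four factors instead of two
attr_4a_reduced = reducer_4a.fit_transform(np.maximum(0, attrs_4a_arr))

# Inspect which classes each factor loads on most strongly.
for component in reducer_4a._reducer.components_:
  print("")
  for n in np.argsort(-component)[:5]:
    print(labels[n], component[n])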
rhezab commented 6 years ago

@colah thanks! Your code + Wikipedia really helped me get the gist of how Building Blocks did the dimensionality reduction. Right now the missing part of my understanding is doing all this with 3D tensors rather than 2D matrices, so I'll read up on that...

After reading up on NMF, it does have a simple elegance to it! Out of curiosity, what was it about NMF that particularly excited you?

I also find your comment about attributions at different layers really interesting. Are you suggesting that there are more meaningful factors in earlier layers? For instance, in the code example you asked for 2 factors at layer mixed5a; would you expect there to be more than 2 meaningful factors at an earlier layer, say, mixed4a? I suppose it kinda makes sense that later layers focus more on larger components, given all the convolving and pooling happening in between...

ludwigschubert commented 6 years ago

@rhezab note that we treat activations as a 2D matrix when factoring them. May be easier to understand than you thought! :-)

IIRC part of why we preferred NMF over PCA was that we're trying to factorize activations here, and since the network we looked at used ReLU non-linearities, all activations are non-negative. Thus it seemed natural to insist the activations factor into purely non-negative "chunks", too.
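For illustration, here's a minimal sketch of that flatten-then-factorize step using scikit-learn's NMF directly (this mirrors my reading of what ChannelReducer does, but the exact internals may differ):

import numpy as np
from sklearn.decomposition import NMF

def factor_spatial_attr(attrs_arr, n_factors=2):
  # attrs_arr: (H, W, C) attributions, C = number of classes.
  H, W, C = attrs_arr.shape
  flat = np.maximum(0, attrs_arr).reshape(-1, C)   # treat each spatial position as one row
  nmf = NMF(n_components=n_factors, init="random", random_state=0)
  spatial = nmf.fit_transform(flat)                # (H*W, n_factors), non-negative
  classes = nmf.components_                        # (n_factors, C), non-negative
  return spatial.reshape(H, W, n_factors), classes

The second return value is what the earlier printout sorts to show the top classes per factor; the first is the spatial map that gets turned into the colored overlay.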

rhezab commented 6 years ago

@ludwigschubert ah, I see! Thanks for clarifying, honestly blown away by how helpful y'all are.

On a tangential note, I'm curious about the statement that GoogLeNet's neurons are "unusually semantically meaningful." The footnote says that the reason for this is an active area of investigation - any ideas?

I'm also curious about interpreting non-vision models. Have you done any work on interpreting non-vision models (e.g. RNNs for generating music)?

More generally, any suggestions for starter projects in interpretability? Some ideas (heavily inspired by Distill articles):

ludwigschubert commented 6 years ago

@rhezab you've hit on the precise reason we're trying to be helpful: we'd love for more people to get involved in this area of research! :-) Please contact me at the email in my GitHub profile and we'll take it from there.

(This also applies to anyone who may find this conversation in the issue log in the future and feels similarly curious!)