distillpub / post--building-blocks

The Building Blocks of Interpretability
https://distill.pub/2018/building-blocks
Creative Commons Attribution 4.0 International

Design Space Diagram #1

Closed: arvind closed this 6 years ago

colah commented 6 years ago

[image]

ludwigschubert commented 6 years ago

Diagrams like these always invite speculation about other combinations of these factors. I believe we've already done Activations + Factorization + Attribution, no? (Color-highlighting the principal components of an image's activations, attributed to input image pixels.)

Are any other combinations interesting?

arvind commented 6 years ago

Prototyping an idea of showing that some axes (namely how we slice the cube) have more granular degrees of freedom than others. This approach allows us to more carefully distinguish points in the design space.

[image]

Far from perfect though. Some issues:

colah commented 6 years ago

I think most combinations are possible. To give a few examples of some weirder things:

[image]

colah commented 6 years ago

Arvind, I love your diagram. However, I wonder whether the choice of neuron/spatial/channel/group is really a separate choice for activations and attribution. It feels like you get wedded to whatever choice of atoms you use.

arvind commented 6 years ago

I agree; I don't think we've figured out the right decoupling of the axes yet. The reason I thought cube slicing might be an orthogonal axis was to handle diagrams that include multiple things (e.g., spatial attribution + NMF attribution, or channel attribution + an activation heatmap).

Here's a much simpler view:

[image]

What's missing in this diagram is the nuance of how activations are used in each one (particularly if there are multiple ticks in the activations column). Feature vis and activations seem inherently coupled, but activations are also useful beyond feature vis.

colah commented 6 years ago

(It feels like there might be an important primary/secondary distinction. Our channel visualizations have channels as the primary atoms, but display spatial heatmaps secondarily.)

colah commented 6 years ago

[image]

Or a bit tighter:

[image]

colah commented 6 years ago

It may be worth noting that many of our interfaces aren't as "pure" as they could be, because there are often lots of opportunities to supplement the primary message with other things and make it more meaningful.

arvind commented 6 years ago

Nice!! This is my favorite one yet. I like the decoupling introduced by the Atoms/Layers/Content framing.

colah commented 6 years ago

More things to think about:

Number of examples:

(This is often implicitly what "activation" is getting at -- e.g. applying feature vis to a particular example -- although in some cases we also mean activation magnitudes.)

Further up the ladder of abstraction:

Other things:

colah commented 6 years ago

Experimenting with another approach. It still doesn't quite reify everything I want, but I feel like my brain has pretty rapidly adopted it for thinking about some things, which seems like a good sign.

[five images]

colah commented 6 years ago

[image: interfacespace-01]

colah commented 6 years ago

Attempting to add dimensions for how many input examples we're showing and for how we organize things.

[image]

shancarter commented 6 years ago

[image]

shancarter commented 6 years ago

[image]

colah commented 6 years ago

On Friday, @arvind and I tried to formalize the space of interfaces into a grammar. I wanted to expand it a little and get it into this thread:

type Id = Int

-- How we slice a layer, and which layer we're looking at.
data Atom  = Neuron | Spatial | Channel | Group | Whole
data Layer = Input | Hidden Int | Out

-- What a piece of content is measured over.
data Target = NetworkTarget Id Atom Layer
            | Dataset Id
            | Parameters Id

data Content = Activation Target
             | Attribution Target Target

-- How content is presented.
data Element =
      NumericalContentPresentation Content
    | FeatureVisualizationPresentation Content
    | Filter Atom Content

type Interface = [Element]
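
For concreteness, here's a rough, unvetted sketch of one existing interface written as a value of these types: a semantic-dictionary style view, i.e. feature visualizations of channel activations at a hidden layer. The name semanticDictionary, the network id 0, and the layer index 4 are arbitrary placeholders:

-- Feature visualizations of channel activations at one hidden layer.
-- Ids and layer index are arbitrary placeholders.
semanticDictionary :: Interface
semanticDictionary =
  [ FeatureVisualizationPresentation
      (Activation (NetworkTarget 0 Channel (Hidden 4))) ]
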
colah commented 6 years ago

Questions I'm now thinking about:

What atoms can we break datasets up into?

What atoms can we break parameters up into?

How do t-SNE plots of representations fit into here?

What are the basic interfaces for ...

What about interfaces that allow one to take actions, instead of just inspecting?

For interfaces involving multiple models, it seems like something about "aligning features" or "canonicalizing representations" is really essential. How does that fit into the story?

ludwigschubert commented 6 years ago

Small point re: the grammar: would a Direction (a linear combination of Neurons) be just a Group? In the same vein, is a Group just a set of Atoms, or can it be a linear combination of Atoms?
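
As a rough sketch of the two readings (the type names and the Double weights are arbitrary choices on my part):

-- (a) a Group as just a set of neurons
newtype GroupAsSet = GroupAsSet [Id]

-- (b) a Group as a linear combination of neurons,
--     which would also subsume Directions
newtype GroupAsDirection = GroupAsDirection [(Id, Double)]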

arvind commented 6 years ago

Further brainstorming on the grammar:

type Id = Int

data Atom  = Neuron | Spatial | Channel | Group | Whole
data Layer = Input | Hidden Int | Out

-- "Target" generalized to anything content can be computed over.
data Substrate = Network Id Atom Layer
               | Dataset Id
               | Parameters Id

data Content = Activation Substrate
             | Attribution Substrate Substrate
             | Transform Content (Maybe Substrate) -- substrate is optional

data Element = InfoVis Content | FeatureVis Content

type Interface = [Element]

This structure more closely mimics a traditional visualization pipeline of input data -> data transformations -> visual encodings.
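
As a rough example of that pipeline shape (ids and layer indices are arbitrary, and I'm reading the trailing ? on Substrate as Maybe):

-- Attribute a hidden layer's spatial positions to the output,
-- then apply a transform (e.g. a factorization) to that content.
-- Nothing means the transform needs no extra substrate.
pipelineExample :: Content
pipelineExample =
  Transform
    (Attribution (Network 0 Spatial (Hidden 4)) (Network 0 Whole Out))
    Nothing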

Types of transformations we've thought of so far:

arvind commented 6 years ago

Does FeatureVis actually operate over Content or a Substrate? Perhaps Content can also just be a Substrate?
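
Roughly, the alternative would look like:

-- if FeatureVis is a lens on a substrate rather than on derived content
data Element = InfoVis Content | FeatureVis Substrate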

arvind commented 6 years ago

A prototype of the design space diagram that uses color to encode the different symbols of the grammar:

[image]

arvind commented 6 years ago

To differentiate between showing a single hidden layer vs. multiple:

[image]

@ludwigschubert makes a good point that calling out the number of hidden layers seems odd given that we don't do it elsewhere. Layer-to-layer operations could instead be signaled just with self-loop arrows. Perhaps that's enough?