Design Space Diagram - Githubissues

colah commented 6 years ago

ludwigschubert commented 6 years ago

Diagrams like these always invite speculation about other combinations of these factors. Activations + Factorization + Attribution I believe we have done, no? (Color highlighting the principal components of an image's activations, attributed to input image pixels.)

Are any other combinations interesting?

arvind commented 6 years ago

Prototyping an idea of showing that some axes (namely how we slice the cube) have more granular degrees of freedom than others. This approach allows us to more carefully distinguish points in the design space.

Far from perfect though. Some issues:

It's unclear how exactly to read this diagram (i.e., the "Activation Grids + PCA Heatmap" line is that feature visualization is applied to the spatial activations and we additional compute a PCA heatmap).
Can we do feature visualization of anything but activations -- e.g., could I have a circle in the feature vis column and then a line all the way to attribution?
An additional axis: layer (input, output, hidden, multiple?)

colah commented 6 years ago

I think most combinations are possible. To give a few examples of some weirder things:

colah commented 6 years ago

Arvind, I love your diagram. However, I wonder a little bit if the choice of neuron/spatial/channel/group is really a separate choice for activations and attribution? It feels kind of like one gets wedded to whatever choice of atoms you use.

arvind commented 6 years ago

I agree, I don't think we've figured out the right decoupling of the axes yet. The reason I thought cube slicing might be an orthogonal axis was for diagrams that included multiple things (e.g., spatial attribution + NMF attribution; or channel attribution + activation heatmap etc.).

Here's a much simpler view:

What's missing in this diagram are the nuance of how activations are used in each diagram (particularly if there are multiple ticks in the activations column). Feature vis and activations seem inherently coupled. But activations are also useful beyond feature vis.

colah commented 6 years ago

(It feels like there might be an important primary/secondary distinction. Our channel visualizations have channels as the primary atoms, but display spatial heatmaps secondarily.)

colah commented 6 years ago

Or a bit tighter:

colah commented 6 years ago

It may be worth noting that many of our interfaces aren't as "pure" as they could be because there's often lots of opportunities to supplement the primary message with other things and make it more meaningful.

arvind commented 6 years ago

Nice!! This is my favorite one yet. I like the decoupling introduced by the Atoms/Layers/Content framing.

-Arvind

On Wed, Jan 10, 2018 at 4:13 PM Christopher Olah notifications@github.com wrote:

[image: image] https://user-images.githubusercontent.com/61658/34795737-18122474-f608-11e7-860b-a19f5b03f0dc.png

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/distillpub/post--interpretability-pieces/issues/1#issuecomment-356738531, or mute the thread https://github.com/notifications/unsubscribe-auth/AAClFvQL_sdaJ87FxEoL21kSTodwAs7_ks5tJSgOgaJpZM4RYctl .

--

Arvind Satyanarayan http://arvindsatya.com Sent from my iPhone

colah commented 6 years ago

More things to think about:

Number of examples:

No examples
Specific example
Many examples (eg. t-SNE visualizations of representation)

(This is often implicitly what "activation" is getting at -- eg. applying feature vis to a particular example -- although in some cases we also mean activation magnitudes.)

Further up the ladder of abstraction:

Comparing across training
Comparing across models (meta-SNE, feature vis of fine tuning, etc)

Other things:

Influence of dataset examples
Feedback pathways (eg. the training on human feedback of vis by reinforce idea)

colah commented 6 years ago

Experimenting with another approach. It still doesn't quite reify everything I want, but I feel like my brain has pretty rapidly adopted it for thinking about some things, which seems like a good sign.

colah commented 6 years ago

interfacespace-01

colah commented 6 years ago

Attempting to add dimensions of how many input examples we're showing and how we organize things.

shancarter commented 6 years ago

colah commented 6 years ago

On Friday, @arvind and I tried to formalize the space of interfaces into a grammar. I wanted to expand it a little and get it into this thread:

data Id = Int

data Atom = Neuron | Spatial | Channel | Group | Whole
data Layer = Input | Hidden Int | Out

data Target = NetworkTarget Id Atom Layer
            | Dataset Id
            | Parameters Id

data Content = Activation Target
             | Attribution Target Target

data Element = 
      NumericalContentPresentation Content
    | FeatureVisualizationPresentation Content
    | Filter Atom Content

data Interface = [ Element ]

colah commented 6 years ago

Questions I'm now thinking about:

What atoms can we break datasets up into?

Groups of examples?
What about subdividing parts of examples?

What atoms can we break parameters up into?

By layer?

How do t-SNE plots of representations fit into here?

Should "atom organization" be a part of the grammar?

What are the basic interfaces for ...

Comparing multiple models
Understanding a model over a training
Understanding how data effects final model inference
Understanding how data influences model training trajectory
Understanding the relationship between a model and a fine-tuned transferred version of the model.

What about interfaces that allow one to take actions, instead of just inspecting?

Modify training set
Create adversarial attacks of different kinds. (Test adversarial examples, training adversarial examples, transparency?? adversarial examples)
Directly give human feedback or constraints on the model and influence parameters

For interfaces involving multiple models, it seems like something about "aligning features" or "canonicalizing representations" is really essential. How does that fit into the story?

ludwigschubert commented 6 years ago

small re grammar: Would a Direction (linear combination of Neurons) be just a Group? In the same vein, Is a Group just a set of Atoms, or can it be a linear combination of Atoms?

arvind commented 6 years ago

Further brainstorming on the grammar:

data Id = Int

data Atom  = Neuron | Spatial | Channel | Group | Whole
data Layer = Input | Hidden Int | Out

data Substrate = Network Id Atom Layer
               | Dataset Id
               | Parameters Id

data Content = Activation Substrate
             | Attribution Substrate Substrate
             | Transform Content Substrate?

data Element = InfoVis Content | FeatureVis Content

data Interface = [ Element ]

This structure more closely mimics a traditional visualization pipeline of input data -> data transformations -> visual encodings.

Types of transformations we've thought of so far:

Filter
Project -- to move a piece of content to a new substrate.

arvind commented 6 years ago

Does FeatureVis actually operate over Content or a Substrate? Perhaps Content can also just be a Substrate?

arvind commented 6 years ago

A prototype of the design space diagram that uses color to encode the different symbols of the grammar:

arvind commented 6 years ago

To differentiate between showing a single hidden layer vs. multiple:

@ludwigschubert makes a good point that calling out number of hidden layers seems odd given that we don't do it elsewhere. Layer-to-layer operations could instead be signaled just with self loop arrows. Perhaps that's enough?

distillpub / post--building-blocks

Design Space Diagram #1

--