CNCLgithub / mot

Model implementation for "Adaptive computation as a new mechanism of human attention"

Our Attention Framework and Probe Detection #14

Closed eivinasbutkus closed 4 years ago

eivinasbutkus commented 4 years ago

I was reading Alvarez & Scholl 2005. It seems there are two main (visual) attention frameworks (the drawing attached should not be taken too seriously):

[Attached image: a rough sketch of the two attention frameworks]

  1. Spatial attention (spotlight kind of attention). This is the idea of processing some areas of the visual space more than others (even if there are no objects, there are partial objects, or there are multiple objects under the spotlight). I don't think I fully understand this: why a certain region is attended to (bottom-up cues?) or what it means to process an area more.
  2. Object-based attention. I understand this as still spatial - it's still a kind of spotlight, but this time it's more based on the beliefs about objects (i.e. their locations, extensions, etc.). Maybe this is like looking at certain areas of the visual space to resolve uncertainty about objects or hypotheses. But again, what does it mean to process more - is it like sharpening the bottom up signal?

Note that in both (1) and (2) attention seems to be a bit like selecting regions in the visual/pixel space and processing them more (whatever that means).

I think we are proposing something slightly different. I don't have a good name for it, but we can call it:

  3. Belief-based attention. The idea here is to fix the observation (e.g. the whole image or masks) and then selectively refine parts of the belief space based on the task.

But then it seems that our framework cannot be directly tested using probe detection, because we're not really saying anything about which regions of the image are processed more. If that is the case, I still think we can make the argument that object-based and belief-based attention are correlated? Or maybe you think that's not the case?

Maybe this comes back to @iyildirim 's point about having a computational model of probe detection. I don't think we necessarily need it, but I don't know if we have a mechanistic theory of what is happening exactly yet, i.e. how probe detection relates to our attention framework.

I'd be very interested to hear your thoughts!

iyildirim commented 4 years ago

Thanks, @eivinasbutkus. These are very good points. I highly recommend reading this paper by Steven Yantis http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.424.2899&rep=rep1&type=pdf everything in that paper but especially its Discussion section.

That paper draws the distinction between object-based and spatial attention well. Notice that what you call belief-based is really object-based -- this should be more obvious if you think about non-rigid polygons or articulated biological motion.

Probe detection with independent objects will allow us to test our theories of attention, in particular what the objective is (target designations vs. states and assignments).

Experiments with polygons and biological motion will allow us to tell apart object-based vs. spatial forms of attention -- we will show how a simple (at least conceptually) extension of our model can attend to objects that only exist in the mind (e.g., polygons and biological motion arising from point-light stimuli).

eivinasbutkus commented 4 years ago

Great, thanks, I'll read that!

I guess I should read Yantis first, but I really want to ask this: what is our computational theory of attention?

Like I understand that our theory is definitely object-based in the sense that we're refining beliefs about specific objects based on the task. And I understand that probe detection will allow us to test what that task/objective is.

But it seems that our model does not, at least as it is currently implemented, say anything about probe detection because the belief refinement is done with respect to the whole observation set. Like we never process certain parts of the image more than others. Are we saying that attention is both selectively refining beliefs about certain objects (current implementation), but also selectively processing visual space regions where those important objects or object parts are?

belledon commented 4 years ago

this is a great question.

let me lay out a few main assumptions we have been operating by:

  1. There is some impenetrable veil of perception where cognition or attention has no influence. In other words, in the gorilla experiment there are, at least, retinal neurons responding to the physical stimulus even though the conceptual realization behind that stimulus is not captured in the percept. (Note that I often describe this as a sedimentary layer at the bottom of feature representation, but the veil need not be "flat".)

  2. This leads to a definition of finite signal resolution. There is some upper bound on what our perceptual modalities can ever process at the level of the physical stimulus.

  3. While there are arguably alternative descriptions, going forward we take the previous two assumptions to mean that the sets of signals (the masks) are not in and of themselves "manipulated" by high-level perception or attention.

To consolidate this into our current definition: our "masks" represent a pragmatic proxy for the impenetrable veil. Early visual cortex, in its pan-species ecological function, returns a chemical encoding of an optical stimulus, where that chemical encoding can in turn be seen as a stimulus to higher-level vision, the part of vision that leads to the phenomena of perception.

Assumption 3 is primarily why I was against "object-centric" masks in the first place, as objecthood is an epistemic notion that is often ill-posed at the level of visual features. For pragmatic reasons (especially since our stimuli are visually simple) it makes sense to create masks at the "dot" level, even more so when there are additional visual features (such as leading edges) that suggest congruency between graphical "objects" and epistemic/physical objects (physical in the sense that the object exists in space/time).

In a more general model, where we had a more expressive causal model over graphics, attending to a particular physical object could lead to refinements over the intermediate representation of a collection of graphical elements. Here, we enforce a pragmatic parity which reduces attention to specific mask-to-object associations.

Now let me address particular statements with this framework in mind:

> But it seems that our model does not, at least as it is currently implemented, say anything about probe detection because the belief refinement is done with respect to the whole observation set.

This is not true: the whole observation set is mechanically considered as part of the RFS likelihood, but object-based attention only considers causal explanations that vary along the object axis.
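A minimal sketch of that distinction (a toy numpy stand-in, not the repository's actual implementation; the "RFS likelihood" here is a caricature using nearest-object association): every mask enters every likelihood evaluation, while rejuvenation proposals perturb only the attended objects' states.

```python
import numpy as np

rng = np.random.default_rng(0)

def rfs_log_likelihood(masks, object_states, sigma=1.0):
    # Toy stand-in for the RFS likelihood: every mask is scored against
    # the full set of hypothesized objects (nearest-object association).
    ll = 0.0
    for m in masks:
        dists = np.linalg.norm(object_states - m, axis=1)
        ll += -0.5 * (dists.min() / sigma) ** 2
    return ll

def attend_and_rejuvenate(masks, object_states, attended, n_moves=5, step=0.1):
    """Metropolis-Hastings moves that vary *only* the attended objects;
    the whole observation set still enters every likelihood evaluation."""
    states = object_states.copy()
    ll = rfs_log_likelihood(masks, states)
    for _ in range(n_moves):
        for i in attended:  # only the attended slice of the object axis varies
            proposal = states.copy()
            proposal[i] += rng.normal(0.0, step, size=2)
            ll_new = rfs_log_likelihood(masks, proposal)
            if np.log(rng.random()) < ll_new - ll:
                states, ll = proposal, ll_new
    return states, ll
```

Note that unattended objects' states are never proposed on, so their explanations stay frozen even though they still contribute to the score of every hypothesis.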

> Like we never process certain parts of the image more than others.

We do: the mask elements themselves are spatially distinct by construction. Thus, by paying attention to a subset of objects, we process only the parts of the image where those masks are extracted. According to assumptions 1 & 2, the masks themselves represent this intrinsic output of early vision (note that masks need not be a monolithic representation of early visual processing, as we have discussed pixel-ensemble and higher-dimensional masks in the past).

> Are we saying that attention is both selectively refining beliefs about certain objects (current implementation), but also selectively processing visual space regions where those important objects or object parts are?

This is where I want to be careful with what we mean by attention, which is why I often use the term "online perceptual attention". The historical connotation of "spatial attention" is highly contested, to the point where Brian feels that most studies showing "spatial attention" really report visual saliency (which our current framework can describe)...

But before I address the second statement above, I want to clarify a potential misunderstanding in the first section. You raise a distinction between beliefs over objects and processing visual space. While I agree that there is some distinction (hence assumption 1), I find this to be a false dichotomy. We are studying visual perception, namely how the mind deals with induction through vision. In some sense, once we have designated some impenetrable veil (assumption 1), then all that follows along the path of inference is visual processing, which includes a notion of "space".

Now a quick return to the distinction between online perceptual attention and spatial attention / visual saliency. Visual saliency, in our current framework, describes how an optical stimulus is parsed into masks, or quanta of evidence. It's possible that visual saliency, which is mostly a function of the optical properties of a stimulus rather than the context of a task, generates quanta that are not well explained by anything but the more veritable beliefs. On a related note, it is also possible that visual saliency serves as bottom-up proposals that push the prior to "focus explanations" on those quanta of evidence over others.
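One way to make the "saliency as bottom-up proposals" reading concrete (a sketch with invented numbers, not the model's actual proposal mechanism): treat each mask's purely optical saliency as a categorical proposal distribution over which quantum of evidence receives explanatory effort.

```python
import numpy as np

rng = np.random.default_rng(2)

def saliency_weighted_proposals(saliency, n_samples=1000):
    """Sample which mask to 'focus explanations' on, in proportion to its
    purely optical saliency (bottom-up and task-independent)."""
    p = np.asarray(saliency, dtype=float)
    p = p / p.sum()  # normalize saliencies into a categorical distribution
    return rng.choice(len(p), size=n_samples, p=p)
```

With saliencies `[1, 1, 8]`, the third mask attracts roughly 80% of the proposals, so the optically loudest quantum of evidence gets explained first even before any task-driven attention kicks in.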

In essence, spatial attention (or spotlight attention) doesn't exist.

iyildirim commented 4 years ago

Thanks, @belledon, for clearly articulating our assumptions. A lot of what attention does is modulate activity in V4, especially in the context of object-like stimuli (see the 2015 Science paper from the Desimone lab), and it is not crazy to say that V4 represents something a lot like 2.5D properties.

However, note that there is nothing in these assumptions that precludes extending the model to actually perform the task of probe detection. Perhaps it would be a good exercise to consider how we can do so.

eivinasbutkus commented 4 years ago

Thanks @belledon, this has been helpful. It does seem like spatial attention is basically visual saliency in the literature.

So is this how probe detection happens, according to our framework:

  1. We selectively refine beliefs about objects based on the task (as well as beliefs about graphical objects in the more general implementation).
  2. In the process of that, certain regions of the impenetrable veil will be processed more than others, in the sense that we'll be validating how the (graphical) object would generate that region of the veil. (Hmm, maybe the idea here could also be that if you're doing Bayesian inference, you can compute the likelihood over a restricted region of the veil, saving resources, given that you only care about certain objects?)
  3. Finally, because of visual saliency, the probe is more readily detected in those regions. But how does visual saliency work exactly? Clearly, in testing object-based attention theories using probe detection, visual saliency is assumed to be mediated by object-based attention (or "online perceptual attention"). What is that mechanism? One idea could be that literally only a certain region of the veil is used in computing the likelihood, and the brain always has the option of automatically detecting salient objects wherever it is computing the likelihood.
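The restricted-region idea in (2) could be sketched as follows (purely illustrative: a pixel-wise Gaussian likelihood is an assumption here, whereas the real model scores masks with an RFS likelihood).

```python
import numpy as np

def restricted_log_likelihood(image, rendered, attended_masks, sigma=0.1):
    """Score only the pixels inside the union of the attended masks;
    the rest of the veil is simply never touched, saving compute."""
    region = np.zeros(image.shape, dtype=bool)
    for m in attended_masks:
        region |= m  # union of attended mask regions
    diff = image[region] - rendered[region]
    return -0.5 * float(np.sum((diff / sigma) ** 2))
```

Under this reading, a mismatch outside every attended mask costs the hypothesis nothing, which is one concrete sense of "processing certain regions of the image more than others".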

Thanks @iyildirim, I'll check out that paper.

belledon commented 4 years ago

so I'm with you about 85% here.

I would argue that visual saliency affects the form of the veil more than anything else. In other words, saliency is somewhat orthogonal to our assumptions about attention and probe detection. For us to have a valid experiment, all probes (in space-time) have to be matched in saliency. This is an important control and means that no particular probe is preferentially detected in the absence of attention. This is done by ensuring that all probes are "visually" identical. (Here I am referencing the perceptual grouping literature.)

Mechanistically, increased attention to an object leads to increased processing of some graphical intermediates around a subset of the visual signal. Thus if probe detection is uniform and low under the prior, then subjects should agree most (and hopefully have highest accuracy) around attended objects. We have discussed in the past how we are not literally modeling this posterior but instead generating it with our different models.

eivinasbutkus commented 4 years ago

That's cool, I think it makes sense.

> Mechanistically, increased attention to an object leads to increased processing of some graphical intermediates around a subset of the visual signal.

I guess my final question would be regarding increased processing. In the literature everyone is talking about attention leading to increased processing. But what is increased processing exactly? In our model, increased processing is basically refining the beliefs more. Is the idea that increased processing in graphical intermediates will mean that you're more likely to spawn a new object for the probe in your beliefs (as you're updating your beliefs more, i.e. searching through combinations of objects that explain these graphical intermediates)?

I don't think we need an actual model of probe detection. But as a thought exercise, if we were to model probe detection, a simple model would be to do rejuvenation moves over beliefs about objects that result in the particular mask (with low prior probability of there being a probe). The more rejuvenation moves you do, the more the posterior shifts to seeing a probe.
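As a toy instantiation of this thought experiment (every number here is made up: a low prior on "probe present", a fixed log-likelihood advantage once the mask actually contains probe evidence, and Metropolis-Hastings flips standing in for rejuvenation moves):

```python
import numpy as np

rng = np.random.default_rng(1)

P_PROBE = 0.05        # hypothetical low prior probability of a probe
LOG_LIK_RATIO = 4.0   # hypothetical log-likelihood advantage of the probe
                      # hypothesis when the mask contains probe evidence

def posterior_probe_fraction(n_particles=500, n_moves=3):
    """Particles start at the prior (mostly 'no probe'); each rejuvenation
    sweep proposes flipping the 'probe present' latent via Metropolis-Hastings."""
    probe = rng.random(n_particles) < P_PROBE
    log_odds = np.log(P_PROBE / (1 - P_PROBE)) + LOG_LIK_RATIO
    for _ in range(n_moves):
        for i in range(n_particles):
            # acceptance ratio for flipping the binary latent
            delta = log_odds if not probe[i] else -log_odds
            if np.log(rng.random()) < delta:
                probe[i] = not probe[i]
    return probe.mean()
```

With zero moves the fraction of "probe" particles sits near the prior; a few sweeps push it toward the posterior implied by the likelihood ratio, which is the sense in which more rejuvenation moves shift the belief toward seeing the probe.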

iyildirim commented 4 years ago

It'll be great to get a pilot off the ground for the probe detection experiment. Excitedly yours!

belledon commented 4 years ago

> I don't think we need an actual model of probe detection. But as a thought exercise, if we were to model probe detection, a simple model would be to do rejuvenation moves over beliefs about objects that result in the particular mask (with low prior probability of there being a probe). The more rejuvenation moves you do, the more the posterior shifts to seeing a probe.

Precisely!

> But what is increased processing exactly? In our model, increased processing is basically refining the beliefs more. Is the idea that increased processing in graphical intermediates will mean that you're more likely to spawn a new object for the probe in your beliefs (as you're updating your beliefs more, i.e. searching through combinations of objects that explain these graphical intermediates)?

Basically. In a more general model of perception, adding new graphical elements (i.e. a probe) to the scene would be expressed in the posterior.

eivinasbutkus commented 4 years ago

Nice, this thread has been really helpful in getting the theoretical foundations! Let's figure out the practicalities now.