CNCLgithub / mot

Model implementation for "Adaptive computation as a new mechanism of human attention"

what our framework offers in comparison to predictive processing, Vul et al. and Srivastava & Vul #46

Closed eivinasbutkus closed 3 years ago

eivinasbutkus commented 4 years ago

@iyildirim, you were asking what it is that we're offering that other attention accounts are not. Maybe this will be helpful; again, I would love to hear your thoughts! What are other accounts of attention besides the ones mentioned here?

Predictive processing: attention as precision optimization

One computational theory of attention comes from the predictive processing (PP) camp. They say that attention modulates the precision parameters within a generative model. So, say you are walking in the mist -- then attention tunes your visual precision parameter down; if you're walking on a clear day, attention tunes it up. This could be called "attention as precision optimization".
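For concreteness, here is a minimal sketch (my own toy example, not any published PP model) of a one-dimensional Gaussian belief update in which "attention" is just a multiplicative gain on the sensory precision:

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, sensory_precision,
                              attention_gain=1.0):
    """One conjugate Gaussian update. On the PP story, attention is a
    multiplicative gain on the sensory precision (gain > 1 = more attention)."""
    effective_precision = attention_gain * sensory_precision
    posterior_precision = prior_precision + effective_precision
    posterior_mean = (prior_precision * prior_mean
                      + effective_precision * observation) / posterior_precision
    return posterior_mean, posterior_precision

# Misty day: low sensory precision, and PP says the attention gain should
# be turned down too, so the observation barely moves the belief.
print(precision_weighted_update(0.0, 1.0, observation=2.0,
                                sensory_precision=0.1, attention_gain=0.5))
# Clear day: high precision and high gain; the belief shifts toward the data.
print(precision_weighted_update(0.0, 1.0, observation=2.0,
                                sensory_precision=5.0, attention_gain=2.0))
```

On this picture, a misty day means both low sensory precision and a low attention gain. The problems below all start from cases where those two should come apart.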

There are several problems with this:

  1. There are cases of low precision but high attention. E.g., walking in a misty environment requires a lot of attention even though the precision of the input signal is low. Similarly, looking at occluded faces. It seems that PP has it backwards in saying that those are cases of low attention.
  2. It's difficult to incorporate reward/punishment and task-relevance within this theory of attention. The paper below makes the case that "affect-biased attention" cannot be accommodated within PP. An example of affect-biased attention is paying a lot of attention to something that you're afraid of. As in (1), you can have a case of low precision (say, you know that your perceptual system is unreliable) but still pay attention to, say, the backyard where you have seen a scary dog before. Overall, it seems that precision parameters and attention should be conceptually decoupled.
  3. It doesn't explain why attention is effortful.

What we're saying instead is that attention operates over generative models with fixed parameters. The rational thing to do is to tune the parameters of the generative model as close as possible to the ground truth (e.g., low precision in a misty environment), and then have attention allocate computational resources to refine the approximate posterior in a task-driven manner. So in a misty environment we pay a lot of attention, even though the precision parameters are tuned down, because we do not want to run into something.
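To make the contrast concrete, here is a minimal sketch (a toy illustration I made up, not the actual model code in this repo) in which the generative model's parameters stay fixed at their calibrated values and attention only decides how many inference steps each object's approximate posterior receives:

```python
import numpy as np

OBS_NOISE = 2.0  # fixed generative-model parameter (e.g., calibrated to the
                 # mist); on our account attention never touches this.

def allocate_computation(task_relevance, total_budget):
    """Split a fixed budget of inference steps across objects in
    proportion to task-driven relevance (the attention signal)."""
    weights = np.asarray(task_relevance, dtype=float)
    weights /= weights.sum()
    return np.round(weights * total_budget).astype(int)

def refine_posterior(approx_error, n_steps):
    """Toy stand-in for MCMC/rejuvenation: each step shrinks the
    approximation error of one object's posterior."""
    return approx_error * 0.9 ** n_steps

# Four objects; the task makes objects 0 and 2 matter most right now.
steps = allocate_computation([5.0, 1.0, 4.0, 1.0], total_budget=100)
errors = [refine_posterior(1.0, s) for s in steps]
print(steps)   # [45  9 36  9] -- attention "pays" computation, not precision
print(errors)  # attended objects end up with much tighter posteriors
```

The point is just the division of labor: the precision-like parameter (`OBS_NOISE` here) never changes; what attention changes is where the computation goes.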

We can also explain affect-biased attention: the approximation of the posterior in those cases is driven by reward/punishment. If you're scared of something, attending to it will come out of reward sensitivity calculations.

You can read about PP's account of attention (section 1.3), as well as the challenge from affect-biased attention, in the paper below. Affect-based attention paper

Vul et al. 2009, Srivastava & Vul 2016

Vul et al. and Srivastava & Vul go a bit further and say that attention modulates the fidelity of measurements. I think this makes more sense than the PP theory of optimizing precisions. The problem with their account is that they don't really give a mechanistic explanation of how that happens, i.e. how the fidelity is increased. It's kind of strange to think that you can get a cleaner sensory signal just by wishing for it. We instead give a mechanistic account.

Also, their account is very specific to MOT (e.g., how they define the confusability heuristic). We give a domain-general account of attention.

@belledon, as you were saying, we didn't bake strange assumptions into our account, and now we're reaping the fruits of that. But to be fair to Vul, Srivastava, and others, their work really gave us a solid starting point for our own account.

Bonus of our account

As a more minor point: in natural language we say that we "pay attention" to something. In other accounts, it is not really clear what is being paid. In our account, we literally pay computational resources (the brain's analogue of floating-point operations).

iyildirim commented 4 years ago

Thanks Eivinas. We need to think about this question for writing and for making sense of what we are accomplishing.

I didn't know about PP; it sounds interesting. Beyond that, there are a lot of models that go under the name "models of visual attention". I am not yet sure how, and to what extent, we should engage with this modeling literature in our writing. I found this entry and this special issue.

And it's very true -- the Vul et al. paper definitely did inspire our work. We are also inspired by the Drugowitsch et al. paper.

belledon commented 4 years ago

I think I missed a conversation about floating-point numbers, but I agree with what has been said.

I don't have anything personal against Vul, but to be precise, their direct contribution is (although I do think we came to some of this rather independently):

attention is a metacognitive process that allocates resources to perceptual processing in some task-driven way

We've gone over this before, and I'm not convinced that they really offered anything beyond a rather cursory consideration of the problem of attention. It is not really that novel to consider attention as metacognitive, as Scholl (2001) already describes attention as the "glue" between perception and cognition. And describing attention as rational doesn't have much of a kick either (it's kind of like saying that evolution is global optimization). Of course it is; what else could it be?

belledon commented 4 years ago

I'm not saying that you are underestimating it, but to be precise:

The fact that our model is general is everything. It comes from the theoretical pillars of our work (in my eyes):

  1. What you know about objects / how you represent them (the generative model) describes the landscape of the posterior.
  2. What you are trying to do (the task) then defines, in an abstract way, what could matter.
  3. Because perception is approximate (and not analytical), you get the marriage of the previous two: at any moment, some subset of approximate beliefs matters (objectively and relatively). In other words, approximation provides an interface for attention to affect perception.

The fact that Vul et al. chose a Kalman filter at all tells me that they were thinking of attention in a fundamentally different way.

iyildirim commented 4 years ago

You are reading way too much into their use of the Kalman filter...

Note that Vul et al. (2009) is not a model of attention. They are pretty clear about that, and they are doing identity tracking. As you are saying, they do a rational analysis of the problem and end up saying that you can understand some of the empirical findings without appealing to any capacity limitations. It is inspiring in that they thought of modeling MOT using sequential Monte Carlo. I remember when Josh presented that at Rochester as part of a MURI meeting; Robbie was so impressed, and I was too. That's 2010. (Note that theirs is a collapsed, or Rao-Blackwellized, setting; using a Kalman filter update for state estimation doesn't mean your entire inference is exact or analytical.)
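For reference, a minimal sketch of that collapsed setting (my own toy reconstruction, not their code): each particle samples a discrete identity assignment, while the continuous object states are updated analytically with a Kalman filter, so the inference as a whole is still approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, R = 0.1, 0.5  # process and observation noise (toy values)

def kalman_update(mean, var, obs):
    """Exact linear-Gaussian update for one object's 1-D state
    (random-walk dynamics): this part is analytical."""
    pred_mean, pred_var = mean, var + Q      # predict
    gain = pred_var / (pred_var + R)         # Kalman gain
    return pred_mean + gain * (obs - pred_mean), (1.0 - gain) * pred_var

def rbpf_step(particles, obs):
    """One SMC step: each particle samples a discrete identity
    assignment (the sampled part), then updates each track
    analytically. The marginal over assignments stays approximate."""
    weights = []
    for p in particles:
        assignment = rng.permutation(len(obs))   # toy proposal: uniform
        w = 1.0
        for track_i, obs_j in enumerate(assignment):
            mean, var = p["tracks"][track_i]
            # unnormalized predictive likelihood of the assigned observation
            w *= np.exp(-0.5 * (obs[obs_j] - mean) ** 2 / (var + Q + R))
            p["tracks"][track_i] = kalman_update(mean, var, obs[obs_j])
        weights.append(w)
    weights = np.array(weights) / np.sum(weights)
    keep = rng.choice(len(particles), size=len(particles), p=weights)
    return [{"tracks": list(particles[i]["tracks"])} for i in keep]

# Two objects near 0.0 and 5.0, tracked with three particles.
particles = [{"tracks": [(0.0, 1.0), (5.0, 1.0)]} for _ in range(3)]
particles = rbpf_step(particles, obs=np.array([0.2, 4.9]))
print(particles[0]["tracks"])
```

Sampling the assignment uniformly here is just to keep the sketch short; the point is only that an exact Kalman update sits inside an otherwise approximate sampler.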

I couldn't parse the sentence that started with "the fact...", but if you are saying something like our model is everything, I have no f*ing idea what that might mean. haha

In any case, a productive way of illustrating the model is to apply it over an ever-growing set of domains and datasets. In this regard, I very much look forward to modeling events. I am also increasingly excited about Galileo and emergent objects (we should talk about that sometime soon) and the individual differences project.

belledon commented 4 years ago

do you kiss your children good night with such a potty mouth?

and even I'm not that nutty =D

iyildirim commented 4 years ago

I am sorry -- I should have found a better word for emitting my amusement.