CNCLgithub / mot

Model implementation for "Adaptive computation as a new mechanism of human attention"

structure quantification in hierarchical MOT with polygons #55

Closed eivinasbutkus closed 3 years ago

eivinasbutkus commented 3 years ago

I'm making progress on structure quantification for the hierarchical polygon MOT. Last week, Mario and I concluded that we need to somehow quantify the "relevant structure", namely the hierarchy that is relevant to the tracking task. Here are some ideas for "structure":

  1. Negative log likelihood under the prior. The idea is to use the generative model to determine the likelihood of sampling a given structure. I see two problems with this approach. First, we do not know what the human prior is (e.g. whether one polygon is more likely than three independent dots, and by how much). Second, even if we did find out the human prior for structure, there is the further problem of equating structure with likelihood under the prior. Something could be relatively unlikely under the human prior but still have a lot of structure. For example, the prior for independent dots may be quite high, with the structure discovered only later through the dynamics.
  2. Another idea is to use correlation of motion: get velocity vectors by taking positions[t+1] - positions[t], compute the correlation matrix, and take the average. This is what I tried just now, but I realized that it doesn't capture angular velocity, which is another way in which structure correlates the movement of objects in our scenario. We could record angular velocity when generating the dataset, but I think the next option is much simpler.
  3. In the end I decided to just try tying structure to the number of objects, namely structure = 1/num_objects, where num_objects is the number of polygons plus the number of individual dots. E.g. if there are two polygons of sizes 4 and 3, and 1 individual dot, then structure = 1/3. This is super simple, but I think it captures the basic intuition that with fewer objects, there is more structure.
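To make options 2 and 3 concrete, here is a minimal Python sketch. The function names, the `(T, N, 2)` array layout, the choice to concatenate x and y velocity components per dot, and the choice to average only the off-diagonal correlations are all assumptions for illustration, not what the repo actually does.

```python
import numpy as np

def mean_motion_correlation(positions):
    """Option 2 (sketch): average pairwise correlation of dot velocities.

    positions: array of shape (T, N, 2), i.e. T frames, N dots, (x, y).
    This layout is an assumption, not the repo's data format.
    """
    positions = np.asarray(positions, dtype=float)
    # velocity[t] = positions[t+1] - positions[t], shape (T-1, N, 2)
    velocities = np.diff(positions, axis=0)
    n_dots = velocities.shape[1]
    # One row per dot: its x and y velocity components over all time steps.
    flat = velocities.transpose(1, 0, 2).reshape(n_dots, -1)
    corr = np.corrcoef(flat)  # (N, N) correlation matrix
    # Assumption: drop the diagonal (self-correlation is always 1).
    off_diagonal = corr[~np.eye(n_dots, dtype=bool)]
    return float(off_diagonal.mean())

def structure(polygon_sizes, n_individual_dots):
    """Option 3: structure = 1 / num_objects."""
    num_objects = len(polygon_sizes) + n_individual_dots
    return 1.0 / num_objects

# Two polygons of sizes 4 and 3, plus 1 individual dot: structure = 1/3
print(structure([4, 3], 1))
```

Note that `np.corrcoef` returns NaN for a dot whose velocity never changes, so constant-velocity trials would need special handling. As noted in option 2, this measure also misses structure that shows up only as shared angular velocity.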

Now, to capture the relevance of this structure, we can use Mario's target concentration:

target_concentration = mean(n_targets/n_dots for each polygon that has a target)
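A small Python sketch of this definition, assuming each polygon is summarized as a hypothetical `(n_targets, n_dots)` pair. Two things are not specified above and are assumptions here: individual dots are left out unless the caller passes them as polygons of size 1, and the value is taken to be 0.0 when no polygon contains a target.

```python
def target_concentration(polygons):
    """Mean of n_targets / n_dots over polygons that contain a target.

    polygons: iterable of (n_targets, n_dots) pairs, one per polygon.
    This representation is hypothetical, chosen for illustration.
    """
    ratios = [
        n_targets / n_dots
        for n_targets, n_dots in polygons
        if n_targets > 0  # only polygons that have a target
    ]
    if not ratios:
        # Assumption: no polygon holds a target, so concentration is 0.
        return 0.0
    return sum(ratios) / len(ratios)

# A 4-dot polygon with 2 targets and a 3-dot polygon with no targets:
# only the first polygon counts, giving 2/4 = 0.5
print(target_concentration([(2, 4), (0, 3)]))
```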

The final formulation can then be:

relevant_structure = (alpha * structure + beta * target_concentration) / (alpha + beta)

For now we can set alpha=1 and beta=1.
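As a quick sanity check, the weighted average can be written as a small Python function. The function name and default arguments mirror the formula above; the guard against a zero weight sum is an added assumption.

```python
def relevant_structure(structure, target_concentration, alpha=1.0, beta=1.0):
    """Weighted average of structure and target concentration."""
    total_weight = alpha + beta
    if total_weight == 0:
        # Assumption: weights that sum to zero are treated as invalid.
        raise ValueError("alpha + beta must be nonzero")
    return (alpha * structure + beta * target_concentration) / total_weight

# structure = 1/3 and target_concentration = 1/2 with alpha = beta = 1
# give (1/3 + 1/2) / 2 = 5/12
print(relevant_structure(1 / 3, 0.5))
```

Because structure is at most 1 and target concentration lies in [0, 1], dividing by (alpha + beta) keeps the result in [0, 1] for any nonnegative weights.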

Just wanted to share my thought process, any input is welcome!

belledon commented 3 years ago

3 was the one I was most excited about, as it will translate well to the classical MOT community.

Oh, and don't forget to divide the total by (alpha + beta).

iyildirim commented 3 years ago

Excellent! I like 3 as well.