I'm making progress on structure quantification for the hierarchical polygon MOT. Last week, Mario and I came to the conclusion that we need to somehow quantify the "relevant structure", namely hierarchy that is relevant with respect to the task of tracking. These are some ideas for "structure":
Negative log likelihood from the prior. The idea is to use the generative model to determine the likelihood of sampling some structure. I see two problems with this approach. First, we do not know what the human prior is (e.g. whether one polygon is more likely than three independent dots, and by how much). But even if we did somehow find out the human prior for structure, there's a further problem with equating structure with likelihood under the prior: something could be relatively unlikely under the human prior but still have a lot of structure. For example, the prior for independent dots may be quite high, with the structure discovered only later on through the dynamics.
Another idea is to use correlation of motion: get velocity vectors as positions[t+1] - positions[t], compute the correlation matrix, and take the average. This is what I tried just now, but I realized that it doesn't capture angular velocity, which is another way in which structure correlates the movement of objects in our scenario. We could record that when generating the dataset, but I think the next option is much simpler.
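For reference, a minimal sketch of the correlation idea, assuming positions is a NumPy array of shape (T, N, 2) with at least two dots and non-constant velocities. The function name and the choice to correlate x and y components separately and then average them are my assumptions, not necessarily what the actual analysis did:

```python
import numpy as np

def mean_velocity_correlation(positions):
    """Average pairwise correlation of dot velocities.

    positions: array of shape (T, N, 2), N >= 2 (assumed layout).
    """
    positions = np.asarray(positions, dtype=float)
    # Finite-difference velocities: positions[t+1] - positions[t], shape (T-1, N, 2)
    velocities = np.diff(positions, axis=0)
    n_dots = velocities.shape[1]
    # One N x N correlation matrix per spatial axis (rows = dots, columns = time)
    corr_x = np.corrcoef(velocities[:, :, 0].T)
    corr_y = np.corrcoef(velocities[:, :, 1].T)
    corr = (corr_x + corr_y) / 2.0
    # Average only the off-diagonal entries (the diagonal is trivially 1)
    off_diag = ~np.eye(n_dots, dtype=bool)
    return corr[off_diag].mean()

# Two dots rigidly linked and rotating about their midpoint
theta = np.linspace(0.0, 2.0 * np.pi, 50)
p1 = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
rotating_pair = np.stack([p1, -p1], axis=1)  # shape (50, 2, 2)
print(mean_velocity_correlation(rotating_pair))
```

The rotating pair illustrates the angular velocity problem: the two dots are perfectly coupled, yet their velocities point in opposite directions, so the score should come out close to -1 rather than indicating high structure.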
In the end I decided to just try tying structure to the number of objects, namely to just say structure = 1/num_objects, where num_objects is the number of polygons + the number of individual dots. E.g. if there are two polygons of sizes 4 and 3, and 1 individual dot, then structure = 1/3. This is super simple, but I think it captures the basic intuition that with fewer objects, there is more structure.
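As a quick sketch of this definition (the function name and the input representation, a list of polygon sizes plus a count of individual dots, are just assumptions for illustration):

```python
def structure_score(polygon_sizes, n_individual_dots):
    """structure = 1 / num_objects, where num_objects counts polygons plus individual dots."""
    num_objects = len(polygon_sizes) + n_individual_dots
    if num_objects == 0:
        raise ValueError("scene has no objects")
    return 1.0 / num_objects

# Two polygons of sizes 4 and 3, plus 1 individual dot: 3 objects
print(structure_score([4, 3], 1))
```

Note that the polygon sizes themselves do not enter the score, only how many polygons there are.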
Now to capture the relevance of this structure, we can use Mario's target concentration:
target_concentration = mean(n_targets/n_dots for each polygon that has a target)
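A minimal sketch of target concentration, assuming each polygon is given as an (n_targets, n_dots) pair. That representation, and returning 0.0 when no polygon contains a target, are my assumptions; the definition above does not cover that case:

```python
def target_concentration(polygons):
    """Mean of n_targets / n_dots over polygons that contain at least one target.

    polygons: iterable of (n_targets, n_dots) pairs (assumed representation).
    """
    ratios = [n_targets / n_dots for n_targets, n_dots in polygons if n_targets > 0]
    if not ratios:
        return 0.0  # assumption: no polygon has a target, so treat concentration as 0
    return sum(ratios) / len(ratios)

# Polygon of 4 dots with 2 targets, polygon of 3 dots with no targets
print(target_concentration([(2, 4), (0, 3)]))
```

Polygons without targets are skipped entirely, so they do not pull the mean down.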
The final formulation can then be:
For now we can set alpha=1 and beta=1.
Just wanted to share my thought process, any input is welcome!