MichiganCOG / A2CL-PT

Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization (ECCV 2020)
MIT License

Usage of Beta #3

Closed arnavc1712 closed 2 years ago

arnavc1712 commented 4 years ago

Hi, I did not quite understand how multiplying the original TCAM by a scalar beta represents the background feature. When I debugged it, I found that it assigns approximately 1/num_clips of the attention to each clip of the video.

For example, if the softmax TCAM for a certain class over 10 time steps is [0.0839, 0.1689, 0.1689, 0.0767, 0.0798, 0.0798, 0.1025, 0.0767, 0.0798, 0.0831], then after multiplying by beta=0.01 the softmax TCAM becomes [0.0999, 0.1006, 0.1006, 0.0998, 0.0998, 0.0998, 0.1001, 0.0998, 0.0998, 0.0999].

This basically assigns an equal score to each clip.
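
For reference, a minimal sketch of this effect (the raw TCAM scores below are hypothetical and chosen only to roughly reproduce the softmax values above; the point is the flattening behavior):

```python
import torch
import torch.nn.functional as F

# Hypothetical raw (pre-softmax) TCAM scores for one class over 10 clips
s = torch.tensor([2.10, 2.80, 2.80, 2.01, 2.05, 2.05, 2.30, 2.01, 2.05, 2.09])

attn = F.softmax(s, dim=0)            # original attention, peaked on high-scoring clips
attn_bg = F.softmax(0.01 * s, dim=0)  # after scaling by beta=0.01: nearly uniform

print(attn)     # roughly [0.084, 0.169, 0.169, ...] -- concentrated on a few clips
print(attn_bg)  # ~0.1 everywhere, i.e. about 1/num_clips per clip
```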

kylemin commented 4 years ago

Hi, thank you for your interest. Let's say that beta is 0. Then the new attention (Eq. (6) of the paper) is constant over time, so it is supposed to have lower values for the activity features than the original attention (Eq. (2) of the paper), and correspondingly higher values for the background features. This statement holds for any beta lower than 1.

We found that randomly drawing beta from [0.001, 0.1] for each training sample produces good performance. Of course, a higher beta makes the triplet (Eq. (7)) harder (like hard example mining), which might provide a better training signal... but we did not confirm this. More experiments are needed to validate it!

Kyle
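
A quick sketch of this claim (the scores and the 5-clip example are hypothetical; only the ordering of the attention values matters):

```python
import torch
import torch.nn.functional as F

# Hypothetical raw TCAM scores: clip 0 is a high-scoring "activity" clip,
# the remaining clips are low-scoring "background" clips
s = torch.tensor([5.0, 1.0, 1.2, 0.9, 1.1])

attn = F.softmax(s, dim=0)  # original attention: strongly peaked on clip 0

# Randomly sample beta from [0.001, 0.1] per training sample, as described above
beta = 0.001 + (0.1 - 0.001) * torch.rand(1)
attn_new = F.softmax(beta * s, dim=0)  # scaled attention: much flatter

# With beta < 1 the activity clip gets less attention and the background
# clips get more attention than under the original attention
assert attn_new[0] < attn[0]
assert (attn_new[1:] > attn[1:]).all()
```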