YapengTian / AVVP-ECCV20

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)

What does "second-wise temporal boundaries" mean when you mention the individual audio and visual events in your dataset? #3

Closed (a-flying-crow closed this issue 3 years ago)

a-flying-crow commented 4 years ago

Does it mean that their temporal boundaries might be shorter than the a-v boundary?

YapengTian commented 4 years ago

Sorry for the late response! I did not receive an email notification about this issue.

It just refers to the temporal boundaries of the individual audio events and visual events. Here, the second-wise temporal boundary annotations are the counterpart to the video-level labels: the boundaries are marked per second rather than for the video as a whole.
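As a rough illustration (not taken from the repository), the relationship between second-wise boundaries and video-level labels could be sketched as below. The `(label, start, end)` segment format, the helper names `to_second_wise` and `to_video_level`, the example events, and the 10-second clip length are all assumptions made for this sketch, not the dataset's actual file format.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format: (event label, start second, end second),
# with the end second exclusive. Not the dataset's real schema.
Segment = Tuple[str, int, int]


def to_second_wise(segments: List[Segment], num_seconds: int) -> Dict[str, List[int]]:
    """Expand (label, start, end) boundaries into one 0/1 flag per second."""
    labels: Dict[str, List[int]] = {}
    for name, start, end in segments:
        row = labels.setdefault(name, [0] * num_seconds)
        # Clamp to the clip so out-of-range boundaries are ignored.
        for t in range(max(start, 0), min(end, num_seconds)):
            row[t] = 1
    return labels


def to_video_level(second_wise: Dict[str, List[int]]) -> List[str]:
    """Collapse per-second flags into the set of events present anywhere."""
    return sorted(name for name, row in second_wise.items() if any(row))


# Audio and visual events are annotated separately, so the same event
# can have different boundaries in each modality.
audio_segments = [("Speech", 0, 4), ("Dog", 6, 10)]
visual_segments = [("Dog", 5, 10)]

audio = to_second_wise(audio_segments, num_seconds=10)
visual = to_second_wise(visual_segments, num_seconds=10)

# Video-level labels keep only which events occur, with no timing.
video_labels = sorted(set(to_video_level(audio)) | set(to_video_level(visual)))
```

In this sketch the audio "Dog" event starts one second later than the visual "Dog" event, so the individual audio and visual boundaries need not coincide, while the video-level labels record only that both "Dog" and "Speech" occur.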