Closed by a-flying-crow 3 years ago
Sorry for the late response! I did not receive an email notification about this issue.
It just refers to the temporal boundaries of the individual audio events and visual events. Here, the second-wise temporal boundary annotations correspond to the first-wise video-level labels.
Does it mean that its temporal boundaries might be shorter than the a-v boundary?