YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
BSD 2-Clause "Simplified" License

Question about some irregular videos in AudioSet-20k #9

Closed: mouxingyang closed this issue 1 year ago

mouxingyang commented 1 year ago

Hi,

I tried to follow the fine-tuning protocol on AudioSet-20k and have downloaded ~18k training samples. However, I found that some videos are irregular according to the official '.csv' file, i.e., the segment is either shorter than 10s or its start/end time exceeds the total video length. Could you please tell me how to preprocess these irregular videos?

Some examples are attached below:

Tr7pmnO3eHo, 100.000, 110.000, "/m/03cl9h,/m/04rlf,/m/09x0r,/m/0ytgt" (the video is only 2s long on the web)
d7vfbyFl5kc, 0.000, 3.000, "/m/0c1dj,/t/dd00121" (the specified time span is shorter than 10s)
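A minimal sketch for spotting short segments in bulk, assuming the rows follow the `YTID, start_seconds, end_seconds, "labels"` layout quoted above and that header lines start with `#` (an assumption about the official file). `find_short_segments` is a hypothetical helper, not part of cav-mae; it only flags rows, and whether to drop, pad, or keep them is a separate decision.

```python
import csv
import io

EXPECTED_SECONDS = 10.0  # assumed nominal AudioSet segment length


def find_short_segments(csv_text: str, expected: float = EXPECTED_SECONDS):
    """Return (ytid, start, end) for rows whose time span is shorter than `expected`."""
    # Skip comment/header lines, assumed to start with "#".
    rows = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    short = []
    # skipinitialspace=True lets the quoted label field after ", " parse as one field.
    for record in csv.reader(rows, skipinitialspace=True):
        if not record:  # blank line
            continue
        ytid, start, end = record[0], float(record[1]), float(record[2])
        if end - start < expected:
            short.append((ytid, start, end))
    return short


sample = (
    "# YTID, start_seconds, end_seconds, positive_labels\n"
    'Tr7pmnO3eHo, 100.000, 110.000, "/m/03cl9h,/m/04rlf,/m/09x0r,/m/0ytgt"\n'
    'd7vfbyFl5kc, 0.000, 3.000, "/m/0c1dj,/t/dd00121"\n'
)
print(find_short_segments(sample))
```

Run on the two example rows, only `d7vfbyFl5kc` (0 to 3 s) is flagged; the start/end-beyond-video case cannot be detected from the CSV alone, since it needs the real video length.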

Thank you

YuanGongND commented 1 year ago

This question is better directed to the AudioSet authors.

either less than 10s

Some clips are shorter than 10s; see the official page: http://research.google.com/audioset/download.html.

or the start-end time exceed the total length.

This seems odd, but again, you can check http://research.google.com/audioset/download.html to see whether your metadata is consistent with the official release.

If there are only a few "irregular" samples, they shouldn't impact the performance much.

-Yuan

YuanGongND commented 1 year ago

Btw, there's an easy way to check a YouTube video: the AudioSet video ID is the actual YouTube video ID.

So you can open any YouTube video and replace the video ID in the link with your AudioSet video ID to see the actual video length.
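Since the AudioSet ID is the YouTube video ID, the check can be scripted by building the watch URL directly. A small sketch; `youtube_url` is a hypothetical helper, and the optional `t` parameter (YouTube's start-offset query parameter) is included only as a convenience for jumping to the segment start.

```python
from typing import Optional
from urllib.parse import urlencode


def youtube_url(ytid: str, start: Optional[float] = None) -> str:
    """Build a YouTube watch URL from an AudioSet YTID (identical to the YouTube video ID)."""
    params = {"v": ytid}
    if start is not None:
        # Optional: jump to the segment start, in whole seconds.
        params["t"] = f"{int(start)}s"
    return "https://www.youtube.com/watch?" + urlencode(params)


print(youtube_url("Tr7pmnO3eHo"))
print(youtube_url("Tr7pmnO3eHo", 100.0))
```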

mouxingyang commented 1 year ago

Thanks for your quick reply.

Yeah, I did download 'balanced_train_segments.csv' from the official site. The so-called "irregular" samples are few, but I wondered whether there are any tricks for preprocessing them.

Thanks again for your reply and suggestions.

mouxingyang commented 1 year ago

Thanks for your suggestions. But this means I would have to re-calibrate each video's start/end times to match the labels in the .csv file. I will try omitting these samples first and hope it doesn't hurt the performance too much.
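Omitting out-of-range samples can be sketched as a simple filter, assuming you have already measured the duration of each full downloaded video (for example with ffprobe). `drop_out_of_range` and the `durations` mapping are hypothetical; the example durations are the 1:50 and 4:27 lengths mentioned in this thread.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # (ytid, start_seconds, end_seconds)


def drop_out_of_range(
    segments: List[Segment],
    durations: Dict[str, float],
    tolerance: float = 0.0,
) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into (kept, dropped) using each full video's duration in seconds.

    Videos missing from `durations` (e.g. failed downloads) are dropped as well.
    """
    kept: List[Segment] = []
    dropped: List[Segment] = []
    for ytid, start, end in segments:
        length = durations.get(ytid)
        if length is None or end > length + tolerance:
            dropped.append((ytid, start, end))
        else:
            kept.append((ytid, start, end))
    return kept, dropped


segments = [("Tr7pmnO3eHo", 100.0, 110.0), ("d8WgfWSf1VM", 280.0, 290.0)]
durations = {"Tr7pmnO3eHo": 110.0, "d8WgfWSf1VM": 267.0}
kept, dropped = drop_out_of_range(segments, durations)
print(kept, dropped)
```

A small `tolerance` can absorb rounding in measured durations; anything larger starts admitting segments that are genuinely truncated.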

Many thanks :)

YuanGongND commented 1 year ago

But this means that I should re-calibrate the video's start-end time to meet the labels in the .csv file.

Oh, did you misunderstand the meaning of the timestamps?

The timestamps refer to the "full" YouTube videos, not to segmented clips. If you downloaded the segments from somewhere else, you should not apply the timestamps again; they are only meaningful for the full YouTube videos.
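To make the point concrete: the CSV offsets are applied once, to the full download. The sketch below only builds an ffmpeg argument list using the standard `-ss` (seek), `-i` (input), and `-t` (duration) options; it is not the cav-mae preprocessing script, and the helper name and file paths are hypothetical.

```python
from typing import List


def ffmpeg_trim_cmd(full_video: str, start: float, end: float, out_path: str) -> List[str]:
    """Build an ffmpeg command that cuts [start, end] out of a *full* YouTube download."""
    # AudioSet start/end are offsets into the full video, so this must never be
    # run on a clip that has already been segmented.
    return [
        "ffmpeg", "-y",              # overwrite the output if it exists
        "-ss", f"{start:.3f}",       # seek to the segment start in the input
        "-i", full_video,
        "-t", f"{end - start:.3f}",  # keep only the segment duration
        out_path,
    ]


cmd = ffmpeg_trim_cmd("Tr7pmnO3eHo_full.mp4", 100.0, 110.0, "Tr7pmnO3eHo_100_110.mp4")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```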

More specifically, I checked https://www.youtube.com/watch?v=Tr7pmnO3eHo (the sample you gave): it is actually 1:50 (110s) long, so 100-110s is valid.

-Yuan

mouxingyang commented 1 year ago

Sorry, I copied the wrong example.

The videos were indeed crawled from YouTube. After downloading, I cropped them according to the timestamps in the official file (http://storage.googleapis.com/us_audioset/youtube_corpus/v1/csv/balanced_train_segments.csv).

The correct 'irregular' sample is https://www.youtube.com/watch?v=d8WgfWSf1VM, whose length is 4:27 (267s), while its segment entry is 'd8WgfWSf1VM, 280.000, 290.000, "/m/0dwt5"', i.e., the segment starts after the video ends.
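The arithmetic in both examples (1:50 = 110s and 4:27 = 267s) can be checked with a couple of hypothetical helpers that convert YouTube's displayed length to seconds and test whether a segment fits inside it.

```python
def clock_to_seconds(clock: str) -> int:
    """Convert a displayed length such as '1:50' or '1:02:03' to whole seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def segment_fits(clock: str, start: float, end: float) -> bool:
    """True if [start, end] lies within a video of the given displayed length."""
    return 0 <= start < end <= clock_to_seconds(clock)


print(segment_fits("1:50", 100.0, 110.0))  # Tr7pmnO3eHo
print(segment_fits("4:27", 280.0, 290.0))  # d8WgfWSf1VM
```

The displayed length is shown only in whole seconds, so a segment ending exactly at that boundary may be off by a fraction of a second; a segment starting past the end, as in the second case, cannot be recovered by cropping.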

Thanks for your kind reply

Best regards