YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

Running AST on a downstream task. #6

Closed by saifkhan-m 2 years ago

saifkhan-m commented 2 years ago

Dear Yuan,

Thank you for creating this SOTA model for audio processing.

I want to run AST on an audio dataset. I have prepared the data in the same way as the data for the ESC-50 dataset. I wanted to run the model, but then I noticed that you use a dataset-specific mean and std to normalize the data. Could you please share the method you used to compute these two statistics?

Regards, Saif

junxiant commented 2 years ago

Were you able to run the model on a downstream task? If so could you share it please?

YuanGongND commented 2 years ago

Hi there,

I am quite sure that the AST model should work for downstream tasks, as we tested the AudioSet-pretrained model on the SpeechCommands and ESC-50 tasks in the paper and both achieved SOTA. But there are a few things that need to be taken care of, e.g., normalization, learning rate, learning rate scheduler, SpecAugment parameters, etc.

I am busy working on something else now but hopefully I can provide guidance on how to do that in a few days.

Best, Yuan

YuanGongND commented 2 years ago

Hi there,

I have uploaded sample code showing how to compute the dataset mean and std. Please see ast/src/get_norm_stats.py; it is very simple.
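For readers who want a rough idea of what such a computation involves, here is a minimal sketch. It is not the contents of get_norm_stats.py. It assumes the features (e.g. one log-mel spectrogram per clip) are already available as NumPy arrays, and the names `dataset_mean_std` and `normalize` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def dataset_mean_std(features):
    """Return (mean, std) over every element of every feature array.

    `features` is any iterable of arrays, e.g. one spectrogram per clip
    with shape [frames, mel_bins]; the shapes may differ between clips.
    Only running sums are kept, so the dataset never has to fit in memory.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for feat in features:
        feat = np.asarray(feat, dtype=np.float64)
        count += feat.size
        total += feat.sum()
        total_sq += np.square(feat).sum()
    if count == 0:
        raise ValueError("no features given")
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp tiny negative values from rounding.
    var = max(total_sq / count - mean ** 2, 0.0)
    return float(mean), float(np.sqrt(var))


def normalize(feat, mean, std):
    """Standardize one feature array with the dataset-level statistics."""
    return (np.asarray(feat, dtype=np.float64) - mean) / std


if __name__ == "__main__":
    # Random stand-in data; replace with real spectrograms.
    rng = np.random.default_rng(0)
    clips = [rng.normal(-4.0, 4.5, size=(n, 128)) for n in (100, 80, 120)]
    mean, std = dataset_mean_std(clips)
    print(f"mean={mean:.4f} std={std:.4f}")
```

The statistics should be computed on the training split only and then reused for validation and test data. How the repository scales the features with these values is defined in its own data loading code, so check there before plugging the numbers in.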

I have also added a section to the README on how to use the pretrained model for downstream tasks. Please take a look if you are interested.

-Yuan

saifkhan-m commented 2 years ago

Hi Yuan,

Thank you for the prompt action to process the data for downstream tasks.

I followed the procedure described in the README and I am able to run AST on a small dummy dataset. I will post the results and the final details here once I have trained on the actual dataset.

While computing the normalization stats, I ran into some minor issues. I have ironed them out and created this pull request with the fixes. I hope that with this addition, anyone can use AST for downstream tasks.

Regards, Saif

YuanGongND commented 2 years ago

Hi Saif,

Thanks for fixing the issue. I have merged it into the main branch.

-Yuan

jvel07 commented 2 years ago

Hi @saifkhan-m! Did you prepare the data in the same way as for, e.g., ESC-50? For instance, did you already have a file similar to "esc50.csv" for your custom dataset?

saifkhan-m commented 2 years ago

Hey @jvel07,

Yes, I prepared the data in a similar way to Yuan. I made some changes based on my data. You can look at the relevant changes to the files here. Hope it helps.

Regards, Saif
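For anyone starting without an "esc50.csv"-like file, the sketch below shows one way to generate such a metadata table for a custom dataset. It is an illustration under stated assumptions, not the code from the changes linked above: it assumes one sub-folder of .wav files per class, writes only the `filename`, `fold`, `target`, and `category` columns (the first four columns of ESC-50's metadata file), and assigns folds round-robin within each class. The function name `build_metadata` and the folder layout are hypothetical.

```python
import csv
from pathlib import Path


def build_metadata(audio_root, out_csv, num_folds=5):
    """Write an ESC-50-style metadata CSV for a folder-per-class dataset.

    Assumed layout (hypothetical): audio_root/<category>/<clip>.wav
    Returns the rows that were written, as a list of dicts.
    """
    root = Path(audio_root)
    # Sorted category names give stable integer targets across runs.
    categories = sorted(p.name for p in root.iterdir() if p.is_dir())
    rows = []
    for target, category in enumerate(categories):
        wavs = sorted((root / category).glob("*.wav"))
        for i, wav in enumerate(wavs):
            rows.append({
                "filename": wav.relative_to(root).as_posix(),
                # Round-robin folds (1..num_folds) keep classes balanced.
                "fold": i % num_folds + 1,
                "target": target,
                "category": category,
            })
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["filename", "fold", "target", "category"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_metadata("my_dataset/audio", "my_dataset/meta.csv")` would then give a table that a preparation script modeled on the ESC-50 recipe could read, though the column names and fold scheme may need adapting to whatever that script expects.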