YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

Where is the transformer or attention in ASTModel? #10

Closed: huacilang closed this issue 2 years ago

huacilang commented 2 years ago

Hello, I didn't find the transformer or attention in ASTModel. Can you help me point it out?

YuanGongND commented 2 years ago

Hi there,

The Transformer is created by the timm package. https://github.com/YuanGongND/ast/blob/d338ce48b4861e419ee62c9ecad499cfd548e54b/src/models/ast_models.py#L67

Specifically, the Transformer is in ast_mdl.v.blocks. https://github.com/YuanGongND/ast/blob/d338ce48b4861e419ee62c9ecad499cfd548e54b/src/models/ast_models.py#L176
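For illustration, here is a minimal sketch of where the attention lives (the timm model name below is an assumption for the example; ast_models.py pins the exact model string):

```python
import timm

# Build a ViT/DeiT backbone via timm, the same mechanism AST uses.
# 'deit_base_distilled_patch16_384' is assumed here for illustration.
v = timm.create_model('deit_base_distilled_patch16_384', pretrained=False)

# The Transformer encoder is the stack in v.blocks; each block holds
# multi-head self-attention (.attn) followed by an MLP (.mlp).
print(len(v.blocks))     # 12 encoder blocks for the base model
print(v.blocks[0].attn)  # the self-attention module of the first block
```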

-Yuan

huacilang commented 2 years ago

Thanks, I need to learn more about timm. Another question: sigmoid activation is mentioned in the paper, but I still didn't find it in the code. Can you help me point it out?

YuanGongND commented 2 years ago

We use torch.nn.BCEWithLogitsLoss, which applies the sigmoid internally (fused with the binary cross-entropy for numerical stability), so there is no separate sigmoid layer in the model code. https://github.com/YuanGongND/ast/blob/d338ce48b4861e419ee62c9ecad499cfd548e54b/src/traintest.py#L65
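A minimal sketch of the equivalence (the 35-class shape mirrors the Speech Commands setup; this is illustrative, not the repo's training loop):

```python
import torch

logits = torch.randn(4, 35)                      # raw model outputs, no activation
targets = torch.randint(0, 2, (4, 35)).float()   # binary multi-label targets

# BCEWithLogitsLoss fuses the sigmoid with binary cross-entropy,
# so the model itself never calls torch.sigmoid explicitly.
fused = torch.nn.BCEWithLogitsLoss()(logits, targets)

# Numerically equivalent (but less stable) two-step version:
manual = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

assert torch.allclose(fused, manual, atol=1e-6)
```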

-Yuan

huacilang commented 2 years ago

OK, thanks. I have finished training on Speech Commands and the result is higher than what the paper reports. Can you take a look and tell me whether there is any problem?

Parameters: model_size=base384, epochs=10, lr=2.5e-4, batch-size=128

Results:

---------------evaluate on the validation set---------------
Accuracy: 0.972748
AUC: 0.999579
---------------the evaluation dataloader---------------
now using following mask: 0 freq, 0 time
now using mix-up with rate 0.000000
now process speechcommands
use dataset mean -6.846 and std 5.565 to normalize the input
number of classes is 35
---------------evaluate on the test set---------------
Accuracy: 0.974739
AUC: 0.999688

huacilang commented 2 years ago

So, which result should I pay attention to, AUC or accuracy?

YuanGongND commented 2 years ago

You should look at the accuracy, so the number you got is actually lower than the paper's (~0.981). Many things can impact performance, but the most obvious one I noticed is that your number of epochs (10) is smaller than what we used in the recipe (30). Could you try the exact same hyper-parameters and have another go? Thanks.
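For intuition on the two metrics, here is a small synthetic sketch (random data and sklearn metrics for illustration only, not the repo's evaluation code) showing why AUC saturates near 1.0 while top-1 accuracy remains the more discriminative headline number:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic predictions for a 35-class problem, like Speech Commands.
rng = np.random.default_rng(0)
n, k = 1000, 35
labels = rng.integers(0, k, size=n)
scores = rng.normal(size=(n, k))
scores[np.arange(n), labels] += 3.0   # make the "model" mostly right

pred = scores.argmax(axis=1)
onehot = np.eye(k)[labels]

print('Accuracy:', accuracy_score(labels, pred))  # top-1 accuracy
# Macro one-vs-rest AUC: typically much closer to 1.0 than accuracy,
# because it only measures ranking, not exact top-1 decisions.
print('AUC:', roc_auc_score(onehot, scores))
```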

We have included our training log, so you can easily compare per-epoch performance with ours (the log shows that at the 10th epoch our validation accuracy is also ~0.974, so your setting seems correct; it just needs more training).

Finally, it is normal to get slightly better or worse numbers than those reported in the paper because 1) there is some noise in training, so results differ across random seeds; we report mean/std in the paper, so you might get a number above or below the mean; and 2) I made some minor modifications that could lead to slightly better numbers.

-Yuan

YuanGongND commented 2 years ago

By the way, I would appreciate it if you could open new issues for new questions, since that makes it easier for other people to find them. Thanks!

huacilang commented 2 years ago

OK. Thank you a lot for spending so much time answering my questions; things are much clearer now. Thanks again!

YuanGongND commented 2 years ago

You are welcome. Please let me know whether you can reproduce the accuracy reported in the paper. Thanks!

huacilang commented 2 years ago

Hello, I got Accuracy: 0.981281, haha... Thanks again!

YuanGongND commented 2 years ago

Thanks so much for letting me know. Yes, that is exactly the same as what we got in our log.