RetroCirce / HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
https://arxiv.org/abs/2202.00874
MIT License
344 stars, 62 forks

The mAP following Audioset Recipe is very low #3

Closed. kimsojeong1225 closed this issue 2 years ago

kimsojeong1225 commented 2 years ago

Hi, I downloaded the model checkpoint files you provided through Google Drive and followed the AudioSet evaluation instructions in the README. Test command: CUDA_VISIBLE_DEVICES=1,2,3,4 python main.py test. I expected performance similar to what your paper reports, but I got a very low mAP. The eval set I used has 18,887 samples. Could you tell me the size of your eval set, if possible? I have attached a screenshot of the single-model evaluation results (HTSAT_AudioSet_Saved_1.ckpt). Thanks.

kimsojeong1225 commented 2 years ago

Screenshot from 2022-03-10 21-34-46

RetroCirce commented 2 years ago

Would you mind sending me your config.py file so I can look for the possible error? Below is my result on HTSAT_SAVE1.ckpt; my eval dataset has the same 18,887 samples. [image]

kimsojeong1225 commented 2 years ago

When I tried to email it to you, I got this error: 550 5.7.0 Illegal Attachment - Policy Violation. So I am attaching my config file here. Thanks a lot for your time. config.txt

RetroCirce commented 2 years ago

May I also take a look at your sed_model.py?

kimsojeong1225 commented 2 years ago

sed_model.txt Here! Thanks...!

RetroCirce commented 2 years ago

Thanks. What is the sample rate of your downloaded AudioSet: 16000 or 32000?

kimsojeong1225 commented 2 years ago

The AudioSet sample rate is 32000.

RetroCirce commented 2 years ago

Try this dataset link as a test. This is what I used, following the PANN repo: https://drive.google.com/file/d/1E6vE02OvWSes9Uy91vNzRlCYB8sQVfC4/view?usp=sharing

AudioSet contains audio at various sample rates, including 32000, 16000, and 22000, since the audio is downloaded from YouTube, and the resampling algorithm may introduce differences in data processing. My model was trained on my own processed data, so there may be some differences from yours.
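One way to check the sample-rate mix described above is to tally the rates recorded in the file headers. The following is a minimal sketch, not code from this repo. It assumes the clips are stored as uncompressed PCM .wav files (the thread does not say how either dataset is stored), and the function names and the 32000 default are hypothetical choices for illustration.

```python
import wave
from collections import Counter
from pathlib import Path


def tally_sample_rates(audio_dir):
    """Count how many .wav files under audio_dir use each sample rate."""
    counts = Counter()
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        # The stdlib wave module only reads uncompressed PCM WAV files;
        # other formats would need a different reader (assumption: PCM WAV).
        with wave.open(str(path), "rb") as wav_file:
            counts[wav_file.getframerate()] += 1
    return counts


def files_not_at(audio_dir, expected_rate=32000):
    """List names of .wav files whose header rate differs from expected_rate.

    expected_rate=32000 is an assumed default taken from the rate the
    reporter mentions; change it to match your own preprocessing.
    """
    mismatched = []
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            if wav_file.getframerate() != expected_rate:
                mismatched.append(path.name)
    return mismatched
```

Note that a uniform header rate only shows that the files were converted to one rate. It does not show which resampling algorithm was used, so two datasets that both report 32000 can still differ in their waveforms.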

RetroCirce commented 2 years ago

and this index link: https://drive.google.com/file/d/1PJRun9nVqXC6yRrR77ZzvxzIvJpd5qSm/view?usp=sharing

kimsojeong1225 commented 2 years ago

I tried again with your data and got the same results as in the paper! Thanks for the good research and kind help!

RetroCirce commented 2 years ago

So it must be a difference in the conversion. I noticed that in your last issue you posted a training figure showing an mAP of 0.260 at the beginning of training. I'm sure that if you train on your own data, the model will achieve a result similar to the paper's once it converges.
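For readers comparing mAP numbers like the ones in this thread, here is a minimal pure-Python sketch of the standard metric: average precision per class, averaged over classes. It is not the repo's evaluation code, which may compute the metric differently (for example through a library routine). The function names are hypothetical, and tied scores are ranked in input order here.

```python
def average_precision(labels, scores):
    """AP for one class: mean of precision@k taken at each positive's rank.

    Returns None when the class has no positive example.
    """
    # Rank examples from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else None


def mean_average_precision(label_matrix, score_matrix):
    """mAP over classes; rows are clips, columns are classes.

    Classes without any positive clip are skipped (an assumption; other
    implementations may handle them differently).
    """
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        ap = average_precision(
            [row[c] for row in label_matrix],
            [row[c] for row in score_matrix],
        )
        if ap is not None:
            per_class.append(ap)
    if not per_class:
        raise ValueError("no class has a positive example")
    return sum(per_class) / len(per_class)
```

Because each class's AP depends only on how the scores rank the clips, inputs that were resampled differently from the training data can lower mAP even when the checkpoint and the labels are unchanged.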