RetroCirce / HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
https://arxiv.org/abs/2202.00874
MIT License

Model Checkpoints #42

Closed wonyangcho closed 1 year ago

wonyangcho commented 1 year ago

Hello. I am a student studying various audio transformer papers, and I am impressed by your work. I have a few questions.

Thank you.

wonyangcho commented 1 year ago

I found the answer in the config.py file. Thank you.

Sreyan88 commented 1 year ago

Hi @wonyangcho, could you please share the answer you found in the config.py file? I would be grateful!

wonyangcho commented 1 year ago

@Sreyan88

I referred to the following part.

# trained from a checkpoint, or evaluate a single model
resume_checkpoint = None
# "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_1.ckpt"

esm_model_pathes = [
    "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_1.ckpt",
    "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_2.ckpt",
    "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_3.ckpt",
    "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_4.ckpt",
    "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_5.ckpt",
    "/home/Research/model_backup/AudioSet/HTSAT_AudioSet_Saved_6.ckpt"
]

This part seems to use 6 models in an ensemble.

Therefore, I concluded that any of these models can be used as the pre-trained model without any problem.

Best regards.
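To illustrate what "using 6 models in an ensemble" could mean in practice, here is a minimal sketch that averages per-class probabilities across several models. This is an assumption about how such an ensemble might combine outputs, not a description of the repository's actual evaluation code; the function name `average_ensemble` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Return the mean probability per class across all models.

    Hypothetical helper: a plain mean ensemble, assumed for illustration only.
    """
    if not per_model_probs:
        raise ValueError("need predictions from at least one model")
    num_classes = len(per_model_probs[0])
    if any(len(p) != num_classes for p in per_model_probs):
        raise ValueError("all models must predict the same number of classes")
    num_models = len(per_model_probs)
    # Average class by class over the models.
    return [
        sum(p[c] for p in per_model_probs) / num_models
        for c in range(num_classes)
    ]


# Toy predictions from three hypothetical models over two classes.
toy_probs = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
averaged = average_ensemble(toy_probs)
```

Because each checkpoint would contribute equally to such a mean, any single one of them is a reasonable starting point when only one pre-trained model is needed.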

Sreyan88 commented 1 year ago

Do you have any idea what the experiments in the other folders are for? Thank you!

RetroCirce commented 1 year ago

The default checkpoint is in the AudioSet folder. There are six available checkpoints, and any of them should reach an mAP similar to the one reported in the paper (around 0.465-0.473).

The checkpoints in the ESC and SCV2 folders are fine-tuned on the ESC-50 and SCV2 datasets. Their performance is reported in the paper.

The other setting folder contains some checkpoints we added earlier or later, such as models trained on AudioSet with or without the ImageNet checkpoint, or models trained at a different sampling rate (such as 48000 Hz). You can try them if you find them helpful. They also achieved very good performance.
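As a rough sketch of how the settings quoted earlier in the thread might be filled in once the AudioSet checkpoints are downloaded, the snippet below builds the six paths and picks one as the single model to resume from. The variable names `resume_checkpoint` and `esm_model_pathes` and the file names come from the config.py excerpt quoted in this thread; `CHECKPOINT_DIR` and the helper `audioset_checkpoints` are hypothetical, and the directory should be replaced with your own download location.

```python
from pathlib import PurePosixPath

# Directory taken from the config.py excerpt in this thread; replace it with
# wherever you stored the downloaded AudioSet checkpoints (assumption).
CHECKPOINT_DIR = "/home/Research/model_backup/AudioSet"


def audioset_checkpoints(base_dir: str, count: int = 6) -> list:
    """Hypothetical helper: list HTSAT_AudioSet_Saved_1..count under base_dir."""
    base = PurePosixPath(base_dir)
    return [str(base / f"HTSAT_AudioSet_Saved_{i}.ckpt") for i in range(1, count + 1)]


# All six checkpoints, as used for the ensemble setting.
esm_model_pathes = audioset_checkpoints(CHECKPOINT_DIR)

# Any one of the six can serve as the single pre-trained model.
resume_checkpoint = esm_model_pathes[0]
```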