RicherMans / SAT

Streaming Audiotransformers for online Audio tagging
GNU General Public License v3.0

nan mAP when training #2

Closed. SteveTanggithub closed this issue 1 year ago.

SteveTanggithub commented 1 year ago

Sorry for bothering you again. I have run the MAE pretraining and the SAT training as described in the README, but I get "nan" for the mAP, as in the screenshot below. My balanced training data is about 12,000 audio files and my eval data is about 7,000. Did I miss some important setting, or did something go wrong? For example, should I set the pretrained "pt" checkpoint that I got? (screenshot)

RicherMans commented 1 year ago

Hey, can you provide some more information about the issue, like which config you are using and which model is the teacher?

Did the MAE pretraining work? Did you use your pretrained MAE to initialize the model?

My guess from your log is that the mAP is NaN because of your data, e.g., if some labels are not present at all.

SteveTanggithub commented 1 year ago

I just used mae_tiny.yaml to pretrain the MAE and balanced_sat_2_2s.yaml for the SAT training, and got a pretrained pt file as shown below. (screenshot) Do I need to change the pretrained MAE path below? (screenshot) I have also attached the pretrain log and the train log: SAT_train.log

pretrain.log Regarding the data, here are the label CSVs: eval.csv balanced.csv I noticed that the number of labels in the label CSVs differs from the number of files in the audio folders (12794 balanced audios and 7749 eval audios).

RicherMans commented 1 year ago

Hey, first:

I just used mae_tiny.yaml to pretrain the MAE and balanced_sat_2_2s.yaml for the SAT training, and got a pretrained pt file as shown below.

The loss on these checkpoints seems to be very high. I would guess your data is not at a 16 kHz sampling rate or has some other problem; please check that.

Do I need to change the pretrained MAE path below?

I actually suggest you use my checkpoint, since it was trained on the entire AudioSet.
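Initializing from a downloaded checkpoint boils down to a standard strict=False state-dict load; roughly like this (just a sketch, and the 'model' nesting key is an assumption rather than how my checkpoints are necessarily stored):

```python
import torch
import torch.nn as nn

def init_from_checkpoint(model: nn.Module, path: str) -> nn.Module:
    # Load on CPU; some checkpoints nest the weights under a key like 'model'.
    ckpt = torch.load(path, map_location="cpu")
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    # strict=False tolerates heads that differ between MAE pretraining and SAT.
    missing, unexpected = model.load_state_dict(state, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
    return model
```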

I have also attached the pretrain log and the train log:

Your logs seem alright, but I strongly suspect that your data is incorrect. Please check that all samples are 16-bit PCM with a sampling rate of 16 000 Hz, and further that all samples are 10 s long.
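A quick way to scan a folder for offending files, just as a sketch with the standard library (the directory path and the mono requirement are my assumptions):

```python
import wave
from pathlib import Path

DATA_DIR = Path("data/balanced")  # placeholder, point this at your wav folder

for wav_path in sorted(DATA_DIR.glob("*.wav")):
    with wave.open(str(wav_path), "rb") as w:
        sr = w.getframerate()            # sampling rate in Hz
        width = w.getsampwidth()         # bytes per sample; 2 == int16 PCM
        channels = w.getnchannels()
        duration = w.getnframes() / sr   # clip length in seconds
    if sr != 16000 or width != 2 or channels != 1 or abs(duration - 10.0) > 0.01:
        print(f"{wav_path.name}: sr={sr} width={width} "
              f"channels={channels} duration={duration:.2f}s")
```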

Regards, Heinrich

SteveTanggithub commented 1 year ago

The mAP is still nan even though I have resampled all my wav data to 16 kHz and made the clips 10 s long. The inference .py script runs normally. By the way, I am using the pretrained pt file you provided. Even when I use just 12 training wavs and 4 eval wavs, the mAP stays the same. Could it be a data or label issue, or a problem with the training setup? I am really confused :(

RicherMans commented 1 year ago

Can you check that the data is definitely correct? Otherwise, maybe check out kqq's AudioSet (which you can download directly): https://pan.baidu.com/s/13WnzI1XDSvqXZQTS-Kqujg, password: 0vc2

But just saying, the mAP can be NaN when some of the 527 labels have no samples during training/evaluation. Can you check that in your training/eval set all 527 labels are present at least once?
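Something like this can check the coverage (a sketch; I am assuming tab-separated files with a 'labels' column of ';'-joined label ids, so adjust it to your actual CSV layout):

```python
import pandas as pd

for split in ("balanced.csv", "eval.csv"):
    df = pd.read_csv(split, sep="\t")         # assumed separator
    present = set()
    for row in df["labels"]:                  # assumed column name
        present.update(str(row).split(";"))   # assumed label delimiter
    print(f"{split}: {len(present)} distinct labels (all 527 needed)")

# Why this matters: a class with zero positives has an undefined average
# precision (0/0), and averaging that NaN per-class AP makes the whole mAP NaN.
```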

Kind regards

SteveTanggithub commented 1 year ago

Thank you so much. I will try your advice :)

SteveTanggithub commented 1 year ago

Hi Heinrich, I have downloaded the balanced and eval data from the dataset you provided. However, the audio file names in the CSV files produced by the .sh script differ from the files I just downloaded. Could you provide updated CSV files, including the balanced, eval, and label CSVs?

RicherMans commented 1 year ago

Hey there, nope, that you can do yourself. Each YouTube ID is unique up to the 11th character, so it's just something like line.split('/')[-1][:11]; then match both datasets over this key, as sketched below.
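For example (the file and column names here are placeholders, adapt them to your CSVs):

```python
import pandas as pd

def ytid(filename: str) -> str:
    # The first 11 characters of the base file name are the YouTube ID.
    return filename.split("/")[-1][:11]

# Placeholder file/column names; adjust to your actual label tables.
old = pd.read_csv("old_labels.csv", sep="\t")        # columns: filename, labels
new = pd.read_csv("downloaded_files.csv", sep="\t")  # columns: filename, ...
old["ytid"] = old["filename"].map(ytid)
new["ytid"] = new["filename"].map(ytid)

# Keep only clips present in both sets and carry the labels over.
matched = new.merge(old[["ytid", "labels"]], on="ytid", how="inner")
matched.to_csv("matched_labels.csv", sep="\t", index=False)
```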

SteveTanggithub commented 1 year ago

Hello, I loaded the eval data by changing "dataset.UnlabeledHDF5Dataset" to "dataset.WeakRandomCropHDF5Dataset", and the mAP is shown below. I have two questions now:

  1. Why do you load the eval dataset labels with dataset.UnlabeledHDF5Dataset? The labels will be all zeros in that case. Or am I getting it wrong?
  2. Why does the mAP in the log below keep getting lower during training?

[INFO 2023-07-11 16:31:11] Got 51640 train samples and 18887 validation ones.
[INFO 2023-07-11 16:31:11] Using warmup with 187500 iters
[INFO 2023-07-11 16:46:28] Validation Results - Epoch : 1 mAP 0.4321 LR: 8.33e-06
[INFO 2023-07-11 17:03:19] Validation Results - Epoch : 2 mAP 0.4281 LR: 1.67e-05
[INFO 2023-07-11 17:18:52] Validation Results - Epoch : 3 mAP 0.4252 LR: 2.50e-05
[INFO 2023-07-11 17:34:29] Validation Results - Epoch : 4 mAP 0.4212 LR: 3.33e-05
[INFO 2023-07-11 17:51:10] Validation Results - Epoch : 5 mAP 0.4177 LR: 4.17e-05
[INFO 2023-07-11 18:07:12] Validation Results - Epoch : 6 mAP 0.4172 LR: 5.00e-05
[INFO 2023-07-11 18:22:29] Validation Results - Epoch : 7 mAP 0.4159 LR: 5.83e-05
[INFO 2023-07-11 18:37:44] Validation Results - Epoch : 8 mAP 0.4126 LR: 6.67e-05
[INFO 2023-07-11 18:52:23] Validation Results - Epoch : 9 mAP 0.4100 LR: 7.50e-05
[INFO 2023-07-11 19:08:41] Validation Results - Epoch : 10 mAP 0.4083 LR: 8.33e-05
[INFO 2023-07-11 19:25:13] Validation Results - Epoch : 11 mAP 0.4058 LR: 9.17e-05
[INFO 2023-07-11 19:41:38] Validation Results - Epoch : 12 mAP 0.4038 LR: 1.00e-04
[INFO 2023-07-11 19:57:16] Validation Results - Epoch : 13 mAP 0.4012 LR: 1.08e-04
[INFO 2023-07-11 20:11:52] Validation Results - Epoch : 14 mAP 0.3998 LR: 1.17e-04
[INFO 2023-07-11 20:28:29] Validation Results - Epoch : 15 mAP 0.3960 LR: 1.25e-04
[INFO 2023-07-11 20:44:10] Validation Results - Epoch : 16 mAP 0.3961 LR: 1.33e-04
[INFO 2023-07-11 20:59:45] Validation Results - Epoch : 17 mAP 0.3955 LR: 1.42e-04
[INFO 2023-07-11 21:15:16] Validation Results - Epoch : 18 mAP 0.3931 LR: 1.50e-04
[INFO 2023-07-11 21:30:53] Validation Results - Epoch : 19 mAP 0.3913 LR: 1.58e-04
[INFO 2023-07-11 21:48:55] Validation Results - Epoch : 20 mAP 0.3886 LR: 1.67e-04
[INFO 2023-07-11 22:04:37] Validation Results - Epoch : 21 mAP 0.3869 LR: 1.75e-04
[INFO 2023-07-11 22:20:20] Validation Results - Epoch : 22 mAP 0.3877 LR: 1.83e-04
[INFO 2023-07-11 22:36:16] Validation Results - Epoch : 23 mAP 0.3861 LR: 1.92e-04
[INFO 2023-07-11 22:49:53] Validation Results - Epoch : 24 mAP 0.3858 LR: 2.00e-04
[INFO 2023-07-11 23:06:27] Validation Results - Epoch : 25 mAP 0.3840 LR: 2.08e-04
[INFO 2023-07-11 23:21:50] Validation Results - Epoch : 26 mAP 0.3834 LR: 2.17e-04
[INFO 2023-07-11 23:38:31] Validation Results - Epoch : 27 mAP 0.3827 LR: 2.25e-04
[INFO 2023-07-11 23:56:24] Validation Results - Epoch : 28 mAP 0.3809 LR: 2.33e-04
[INFO 2023-07-12 00:12:08] Validation Results - Epoch : 29 mAP 0.3794 LR: 2.42e-04
[INFO 2023-07-12 00:27:40] Validation Results - Epoch : 30 mAP 0.3808 LR: 2.50e-04
[INFO 2023-07-12 00:42:19] Validation Results - Epoch : 31 mAP 0.3789 LR: 2.58e-04
[INFO 2023-07-12 00:57:50] Validation Results - Epoch : 32 mAP 0.3805 LR: 2.67e-04
[INFO 2023-07-12 01:14:01] Validation Results - Epoch : 33 mAP 0.3796 LR: 2.75e-04
[INFO 2023-07-12 01:32:27] Validation Results - Epoch : 34 mAP 0.3775 LR: 2.83e-04
[INFO 2023-07-12 01:47:32] Validation Results - Epoch : 35 mAP 0.3783 LR: 2.92e-04
[INFO 2023-07-12 02:03:18] Validation Results - Epoch : 36 mAP 0.3767 LR: 3.00e-04
[INFO 2023-07-12 02:18:49] Validation Results - Epoch : 37 mAP 0.3764 LR: 3.08e-04

RicherMans commented 1 year ago

Hey there,

Hello, I loaded the eval data by changing "dataset.UnlabeledHDF5Dataset" to "dataset.WeakRandomCropHDF5Dataset", and the mAP is shown below.

Thanks a lot for noticing; I am also confused about why I wrote that part of the code... I will submit a small commit to change it. The correct dataloader would be WeakHDF5Dataset, since you don't want random crops during evaluation.
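Conceptually the eval loader just has to return the whole clip together with its weak labels; here is a minimal sketch of that idea (not the repo's actual WeakHDF5Dataset, and the HDF5 layout of one dataset per clip id is an assumption):

```python
import h5py
import torch
from torch.utils.data import Dataset

class WeakEvalDataset(Dataset):
    """Sketch: full-clip evaluation with weak labels, no random cropping."""

    def __init__(self, h5_path: str, labels: dict):
        # labels maps clip_id -> multi-hot numpy array of shape (527,)
        self.h5_path, self.labels = h5_path, labels
        self.ids = list(labels)
        self._h5 = None

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        if self._h5 is None:              # lazy open, plays nice with workers
            self._h5 = h5py.File(self.h5_path, "r")
        cid = self.ids[idx]
        wav = self._h5[cid][()]           # the full 10 s waveform, not a crop
        return (torch.from_numpy(wav).float(),
                torch.from_numpy(self.labels[cid]).float())
```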

Why does the mAP in the log below keep getting lower during training?

It seems to me that you used a model already finetuned on AudioSet and continued training it on the balanced dataset.

I provide two checkpoints in this repo:

  1. MAE checkpoints to finetune your model from
  2. Full-AS finetuned checkpoints that were trained on the full ~2 million sample AudioSet.

Your mAP most likely decreases because you only use the balanced subset: the checkpoint you started from was already trained on the full set, so further training on the much smaller balanced subset pulls it away from that optimum. This repo was actually only intended for finetuning on the balanced subset, since I generally don't like to get involved with the full training set in public repos, due to the problems of downloading the data, storing the data, and other preprocessing that I don't want to deal with :D.

But so far it all looks good to me! Thanks for the issue!