XinhaoMei / WavCaps

This repository contains metadata for the WavCaps dataset and code for downstream tasks.

Question about AAC Train #29

Closed zhangron013 closed 1 month ago

zhangron013 commented 2 months ago

Dear Author, hi! Thank you for providing such wonderful work! I have tried using this framework to replace HTSAT with other audio encoders, such as BEATs. As in the original setup, I used the pre-trained BEATs and froze it, and I did not change the training hyperparameters, such as lr=1e-4. However, after ten epochs of training, the outputs were largely identical and meaningless captions. I wonder if this is due to incorrect settings of the learning rate or other hyperparameters?


(screenshot attached)
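For reference, a minimal sketch of the frozen-encoder setup described above, using hypothetical `audio_encoder`/`text_decoder` stand-ins for BEATs and the BART decoder rather than the actual WavCaps training code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the BEATs encoder and the BART decoder;
# the real WavCaps code wires these up differently.
audio_encoder = nn.Linear(128, 768)
text_decoder = nn.Linear(768, 50265)

# Freeze the pretrained encoder: no gradients, and eval mode so that
# BatchNorm / Dropout statistics are not updated during training.
for p in audio_encoder.parameters():
    p.requires_grad = False
audio_encoder.eval()

# Only parameters that still require gradients are optimised
# (lr=1e-4 matches the setting mentioned above).
trainable = [p for p in text_decoder.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```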

XinhaoMei commented 1 month ago

Sorry for my late reply.

Yes, you could try different hyper-parameters and also fine-tune the audio encoder.
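A minimal sketch of what that could look like, again with hypothetical encoder/decoder stand-ins; giving the unfrozen audio encoder a smaller learning rate than the decoder via parameter groups is a common way to keep it stable:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the audio encoder and the BART decoder.
audio_encoder = nn.Linear(128, 768)
text_decoder = nn.Linear(768, 50265)

# Unfreeze the encoder, but train it with a much smaller learning rate
# than the decoder by using separate parameter groups.
optimizer = torch.optim.AdamW(
    [
        {"params": audio_encoder.parameters(), "lr": 1e-5},
        {"params": text_decoder.parameters(), "lr": 5e-5},
    ],
    weight_decay=1e-6,
)
```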

zhangron013 commented 1 month ago

thanks~

zhangron013 commented 1 month ago

By the way, when I unfreeze HTSAT and fine-tune it, I find that as the epochs increase, the CIDEr metric keeps decreasing even though the loss also keeps decreasing. For example, epoch 1: CIDEr=0.2289; epoch 11: CIDEr=0.1166. And by epoch 19, many of the generated captions become meaningless, like:

'a sound is being recorded' or 'an object is playing a sound'

I think this may be overfitting. Have you encountered this in your previous training? My experimental configuration is as follows:

- lr: 5e-5
- train datasets: the train splits of AudioCaps, Clotho and WavCaps
- test dataset: the eval split of Clotho
- loaded models: pretrained HTSAT and pretrained BART
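One minimal sketch of how to guard against this is to select the checkpoint by validation CIDEr rather than by training loss; `train_one_epoch`, `evaluate_cider` and `save_checkpoint` below are hypothetical helpers, not part of the WavCaps repository:

```python
# Hypothetical early-stopping loop: keep the checkpoint with the best
# validation CIDEr and stop once it has not improved for `patience`
# epochs, instead of trusting the training loss.
best_cider, patience, bad_epochs = 0.0, 5, 0

for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)         # hypothetical helper
    cider = evaluate_cider(model, val_loader)    # hypothetical helper
    if cider > best_cider:
        best_cider, bad_epochs = cider, 0
        save_checkpoint(model, "best_cider.pt")  # hypothetical helper
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # CIDEr stopped improving -> likely overfitting
```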


XinhaoMei commented 1 month ago

For pretraining on the whole dataset, I didn't encounter this issue. When fine-tuning on AudioCaps, overfitting usually occurs.

And it looks like the CIDEr score in your experiments is also a bit low.

zhangron013 commented 1 month ago

> For pretraining on the whole dataset, I didn't encounter this issue. When fine-tuning on AudioCaps, overfitting usually occurs.
>
> And it looks like the CIDEr score in your experiments is also a bit low.

ok~ thanks for your reply!