Closed zhangron013 closed 1 month ago
Sorry for my late reply.
Yes, you could try different hyper-parameters and also fine-tune the audio encoder.
thanks~
By the way, when I unfreeze HTSAT for fine-tuning, I find that as the epochs increase the CIDEr metric keeps decreasing, even though the loss also keeps decreasing. For example, epoch 1: CIDEr = 0.2289; epoch 11: CIDEr = 0.1166. And by epoch 19, many of the generated captions become meaningless, like:
'a sound is being recorded' or 'an object is playing a sound'
I think this may be overfitting. Have you encountered this in your previous training? My experimental configuration is as follows:
lr=5e-5
train datasets: the train splits of AudioCaps, Clotho, and WavCaps
test dataset: the evaluation split of Clotho
load model: pretrained HTSAT and pretrained BART
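One common mitigation when unfreezing a pre-trained encoder is to give it a much smaller learning rate than the rest of the model via optimizer parameter groups. A minimal PyTorch sketch; the `encoder`/`decoder` modules and the 5e-6 encoder LR are illustrative placeholders, not the repo's actual structure or a recommended value:

```python
import torch
from torch import nn

# Toy stand-ins for the audio encoder (e.g. HTSAT) and text decoder (e.g. BART).
model = nn.ModuleDict({
    "encoder": nn.Linear(16, 16),
    "decoder": nn.Linear(16, 16),
})

# Two parameter groups: a much smaller LR for the newly unfrozen encoder,
# the usual LR (5e-5, as in the config above) for the decoder.
optimizer = torch.optim.AdamW([
    {"params": model["encoder"].parameters(), "lr": 5e-6},
    {"params": model["decoder"].parameters(), "lr": 5e-5},
])

for group in optimizer.param_groups:
    print(group["lr"])
```

This keeps the pre-trained encoder close to its initialization while the decoder adapts, which often delays the kind of degradation described above.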
For pretraining on the whole dataset, I didn't encounter this issue. When fine-tuning on AudioCaps, overfitting usually occurs.
Also, the CIDEr score in your experiments looks a bit low.
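Since the loss keeps falling while CIDEr degrades, selecting the checkpoint by validation CIDEr rather than by loss (with early stopping) is a standard remedy. A minimal, framework-agnostic sketch; the class and its `patience` default are hypothetical, not part of the repo:

```python
class BestMetricTracker:
    """Track the best validation score (higher is better, e.g. CIDEr)
    and signal early stopping after `patience` epochs without improvement."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def update(self, epoch: int, score: float) -> bool:
        """Record this epoch's score; return True if training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0  # this is also where one would save the checkpoint
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# With CIDEr scores like those reported above, training stops long before epoch 19:
tracker = BestMetricTracker(patience=3)
for epoch, cider in enumerate([0.2289, 0.21, 0.19, 0.15, 0.1166], start=1):
    if tracker.update(epoch, cider):
        break
print(tracker.best_epoch, tracker.best_score)  # best checkpoint: epoch 1
```

The intermediate CIDEr values in the loop are made up for illustration; only epochs 1 and 11 were reported above.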
ok~ thanks for your reply!
Dear author, hi! Thank you for providing such wonderful work! I have tried using this framework with other audio encoders, such as BEATs, in place of HTSAT. As before, I used the pre-trained BEATs and froze it, and I did not change the training hyper-parameters (e.g. lr=1e-4). However, after ten epochs of training, the outputs were many identical, meaningless captions. I wonder whether this is due to an incorrect learning rate or other hyper-parameter settings?
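When swapping in a different encoder, a quick sanity check is to confirm it is actually frozen before training. A sketch with a stand-in module (the `nn.Sequential` here is illustrative, not the actual BEATs class):

```python
from torch import nn

# Stand-in for a pre-trained audio encoder such as BEATs (illustrative only).
audio_encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 64))

# Freeze every encoder parameter so only the decoder receives gradients.
for p in audio_encoder.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in audio_encoder.parameters() if p.requires_grad)
print(trainable)  # 0 trainable encoder parameters
```

If this count is nonzero, or the encoder's parameters were accidentally passed to the optimizer, the frozen-encoder setup described above would silently differ from the intended one.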