XinhaoMei / WavCaps

This repository contains metadata of the WavCaps dataset and code for downstream tasks.
194 stars, 11 forks

issue for automatic audio captioning #20

Closed Liwen4567 closed 10 months ago

Liwen4567 commented 11 months ago

Dear author, hello! I am trying to use your model for automatic audio captioning, but I have some questions. I used train.py to train for 10 epochs from scratch and found that CNN14-BART generates captions normally, while HTSAT-BART only generates repeated captions that have nothing to do with the reference captions. May I ask why?

Liwen4567 commented 11 months ago

The captions generated by HTSAT-BART are basically repetitive and have nothing to do with the actual captions. I didn't change the architecture or hyperparameters of any of the models. [image]
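The symptom described here (the same caption repeated across different clips) can be quantified before changing any settings. The following is a minimal sketch, not part of the WavCaps code: `caption_diversity` is a hypothetical helper, and the example captions are made up for illustration rather than taken from the screenshot.

```python
from collections import Counter


def caption_diversity(captions):
    """Summarize how repetitive a list of generated captions is.

    Returns (distinct_ratio, most_common_caption, most_common_share), where
    distinct_ratio is the number of unique captions divided by the total.
    """
    if not captions:
        raise ValueError("captions must be non-empty")
    # Normalize case and whitespace so trivially different strings match.
    normalized = [" ".join(c.lower().split()) for c in captions]
    counts = Counter(normalized)
    top_caption, top_count = counts.most_common(1)[0]
    total = len(normalized)
    return len(counts) / total, top_caption, top_count / total


# Hypothetical model outputs for ten different audio clips.
collapsed = ["A man is speaking"] * 9 + ["a dog barks"]
ratio, top, share = caption_diversity(collapsed)
print(f"distinct ratio={ratio:.2f}, most common={top!r}, share={share:.2f}")
```

A distinct ratio close to zero on a varied evaluation set suggests the decoder is ignoring the audio features, which matches the behavior reported for the model trained from scratch.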

XinhaoMei commented 11 months ago

Hello,

If you train the model from scratch, I guess you might need to tune the hyperparameters (e.g., the learning rate) carefully.

Liwen4567 commented 11 months ago

Thank you for your reply! I have been able to generate normal captions by loading your pre-trained weights, but when I train from scratch, which I have tried many times, the model never generates normal captions. I only used Clotho. As you said, the hyperparameters need to be carefully adjusted. Does this really make such a difference?

XinhaoMei commented 11 months ago

Of course, training from scratch and fine-tuning usually require different learning rates.
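One way to act on this advice is a small learning-rate sweep with a warmup phase. The sketch below is only an illustration under stated assumptions: `log_spaced_lrs` and `warmup_lr` are hypothetical helpers, and the range 1e-5 to 1e-3 and the 1000 warmup steps are arbitrary example values, not settings from the WavCaps configs or values known to work for HTSAT-BART.

```python
def log_spaced_lrs(low, high, n):
    """Return n learning rates spaced evenly on a log scale from low to high."""
    if n < 2:
        return [low]
    ratio = (high / low) ** (1 / (n - 1))
    return [low * ratio ** i for i in range(n)]


def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup to base_lr over warmup_steps, then constant.

    step is zero-based; warmup is skipped when warmup_steps <= 0.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


# Hypothetical sweep: the range and warmup length are assumptions.
candidates = log_spaced_lrs(1e-5, 1e-3, 5)
for lr in candidates:
    first = warmup_lr(0, lr, warmup_steps=1000)
    print(f"base_lr={lr:.2e}, lr at step 0={first:.2e}")
```

Each candidate would be used for a short training run, and the runs compared on validation loss or caption metrics before committing to a full schedule.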

Liwen4567 commented 11 months ago

Thanks! I will try different learning rates.

zhangron013 commented 1 month ago

> Thanks! I will try different learning rates.

Hi, have you tried other learning rates? What learning rate does HTSAT need to generate captions normally?