XinhaoMei / DCASE2021_task6_v2

Code for CVSSP submission to DCASE 2021 Task 6

About training with Audiocaps #4

Open JNaranjo-Alcazar opened 2 years ago

JNaranjo-Alcazar commented 2 years ago

Thanks again for the excellent work.

It is not clear to me how settings.yaml should be configured to perform the first step you describe in your work. How do you train your framework with AudioCaps?

Thanks in advance

XinhaoMei commented 2 years ago

Hi, do you mean cross-entropy training for the first step? The default setting uses PANNs as the encoder and a two-layer Transformer as the decoder, and trains on Clotho. You can modify the parameters under encoder, decoder and training to change the training settings. For AudioCaps, training is the same as cross-entropy training with Clotho. However, I have temporarily removed the part for training on AudioCaps; I am refactoring the code and will update it soon.
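
To illustrate the idea of changing parameters under encoder, decoder and training, here is a minimal sketch in Python. It does not read the repository's real settings.yaml: only the three section names come from this thread, while every key and value inside them (such as `dataset` or `n_layers`) and the helper `override_settings` are hypothetical. Since the AudioCaps training part is currently removed from the code, treat this as a pattern for overriding nested settings, not as a working recipe.

```python
import copy

# Hypothetical stand-in for the contents of settings.yaml. Only the section
# names (encoder, decoder, training) come from the thread; every key and value
# inside them is an illustrative assumption.
DEFAULT_SETTINGS = {
    "encoder": {"model": "PANNs"},
    "decoder": {"model": "Transformer", "n_layers": 2},
    "training": {"dataset": "Clotho", "loss": "cross_entropy"},
}


def override_settings(base, overrides):
    """Return a copy of `base` with the nested `overrides` merged in."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the same section are kept.
            merged[key] = override_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Switch only the (hypothetical) dataset key; encoder and decoder stay as-is.
audiocaps_settings = override_settings(
    DEFAULT_SETTINGS, {"training": {"dataset": "AudioCaps"}}
)
print(audiocaps_settings["training"])
```

The merge is recursive so that overriding one key leaves the other defaults in a section untouched, and the original defaults are never mutated.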

JNaranjo-Alcazar commented 2 years ago

Thanks for the quick answer! 😄 I was just asking about how to use AudioCaps instead of Clotho. I suppose that will be included in the next update 😃

Thanks again

XinhaoMei commented 2 years ago

You are welcome. By the way, ACT uses AudioCaps, and I uploaded the dataset to that repository. You can have a look at it. Thanks!