Open JNaranjo-Alcazar opened 2 years ago
Hi, do you mean cross-entropy training for the first step? The default setting uses PANNs as the encoder and a two-layer Transformer as the decoder, and trains on Clotho. You can modify the parameters under encoder, decoder and training to change the training settings. For AudioCaps, training is the same as cross-entropy training with Clotho. However, I have temporarily removed the part for training on AudioCaps; I am refactoring the code and will update it soon.
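For readers following along, here is a rough sketch of what overriding one of those sections could look like once the settings file has been parsed into a nested dictionary. Only the three section names (encoder, decoder, training) come from the reply above; the keys inside them (`model`, `num_layers`, `dataset`) and the `override` helper are hypothetical placeholders, not the repository's actual schema, and the AudioCaps value is purely illustrative since that training path is currently removed.

```python
import copy

# Hypothetical stand-in for the parsed settings.yaml. Only the top-level
# section names are taken from the thread; the inner keys are assumptions.
default_settings = {
    "encoder": {"model": "PANNs"},
    "decoder": {"num_layers": 2},
    "training": {"dataset": "Clotho"},
}


def override(settings, changes):
    """Return a copy of `settings` with the nested values in `changes` applied."""
    merged = copy.deepcopy(settings)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the same section are preserved.
            merged[key] = override(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative only: switch the (hypothetical) dataset key, keep everything else.
audiocaps_settings = override(default_settings, {"training": {"dataset": "AudioCaps"}})
print(audiocaps_settings)
```

The helper works on a copy, so the default settings stay untouched and can be reused for the Clotho run.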
Thanks for the quick answer! 😄 I was just asking about how to use AudioCaps instead of Clotho. I suppose that will be included in the next update 😃
Thanks again
Thanks again for the excellent work. It is not clear to me how settings.yaml should be set to perform the first step you indicate in your work. How do you train your framework with AudioCaps? Thanks in advance.