I have a question regarding pitch tokens.
In the paper, I noticed that you used pYAAPT for training the pitch quantizer and FCPE for training the language model.
Besides the inference speed discussed in the paper, is there any other reason for choosing these specific F0 extractors?
I also saw that the public code includes both pYAAPT and FCPE extractors.
In my experience, pYAAPT is slow but robust. Could you share your experience with this as well?
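For reference, this is roughly how I have been calling the two extractors (a minimal sketch using `amfm_decompy` and `torchfcpe`; the file path, sample rate, and parameter values are my own choices, not the ones from your code):

```python
import librosa
import torch
import amfm_decompy.basic_tools as basic
import amfm_decompy.pYAAPT as pYAAPT
from torchfcpe import spawn_bundled_infer_model

wav_path = "sample.wav"  # hypothetical input file
sr = 16000

# pYAAPT: classical DSP-based extractor, robust but CPU-bound and slow.
signal = basic.SignalObj(wav_path)
pitch = pYAAPT.yaapt(signal, frame_length=35.0, frame_space=20.0)
f0_pyaapt = pitch.samp_values  # per-frame F0 in Hz; unvoiced frames are zero

# FCPE: neural extractor, much faster (especially on GPU).
audio, _ = librosa.load(wav_path, sr=sr, mono=True)
audio_t = torch.from_numpy(audio).float().unsqueeze(0).unsqueeze(-1)
model = spawn_bundled_infer_model(device="cpu")
f0_fcpe = model.infer(
    audio_t,
    sr=sr,
    decoder_mode="local_argmax",
    threshold=0.006,
)
```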
Thank you for sharing your nice work! 🔥