facebookresearch / spiritlm

Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Models".

About the pitch extractor #10

Closed: hayeong0 closed this issue 1 day ago

hayeong0 commented 4 days ago

Thank you for sharing your nice work! 🔥

I have a question about pitch tokens. In the paper, I noticed that you used pYAAPT to train the pitch quantizer and FCPE to train the language model. Besides the inference speed discussed in the paper, was there any other reason for choosing these specific F0 extractors? I also saw that the public code includes both the pYAAPT and FCPE extractors. In my experience, pYAAPT is slow but robust. Could you share your experience with this as well?