Why does phonemize run slower and slower when the dataset is large?

Hi eleven, thanks for your guide. I am trying to prepare the VCTK dataset with your phonemize scripts, and I found that the phonemize function gets slower and slower. Can you figure out why? Thanks a lot.

Well, it is working harder and harder the more data there is. One thing you may be running into is that whisperx has trouble creating an accurate .json. Last night I updated the repo with a new script that uses the whisperx .srt file instead. The accuracy is considerably higher, and it may be easier for the phonemizer to handle.

If you're still having issues you can try phonemizing in smaller portions.
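For what it's worth, here is a rough sketch of what "smaller portions" could look like. It is not the script from the repo. `phonemize_in_batches`, `phonemize_fn`, and `batch_size` are made-up names for illustration, and `phonemize_fn` stands in for whatever phonemize call your script already makes on a list of lines.

```python
from typing import Callable, Iterator, List


def batched(lines: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `lines`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


def phonemize_in_batches(
    lines: List[str],
    phonemize_fn: Callable[[List[str]], List[str]],  # hypothetical: your own phonemize call
    batch_size: int = 500,  # assumed default, tune it for your dataset
) -> List[str]:
    """Run `phonemize_fn` on one small batch at a time and keep the input order."""
    results: List[str] = []
    for batch in batched(lines, batch_size):
        # Each call only sees `batch_size` lines, never the whole dataset.
        results.extend(phonemize_fn(batch))
    return results
```

You would call it with your transcript lines and a small wrapper around your existing phonemize call. If one batch is much slower than the others, that also narrows down which lines are causing trouble.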

IIEleven11 / StyleTTS2FineTune
