anandaswarup / phrase_break_prediction

Scripts for training a phrase break prediction system
MIT License

feature request? #5

Open jjsmcneil1113 opened 5 months ago

jjsmcneil1113 commented 5 months ago

Thank you so much for publishing your code. It's been useful in my attempt at Tortoise TTS. You referenced Futamata et al., "Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis". Futamata apparently combines the BiLSTM and BERT features to predict phrase breaks ("Explicit features from BiLSTM and implicit features from BERT are finally concatenated, and then the method determines whether a phrase break should be labeled after each token.")

Any hint on how to do this in your repository? Thanks so much!

anandaswarup commented 5 months ago

I did not implement combining BiLSTM and BERT features for phrase break prediction. I perform phrase break prediction using BiLSTM and BERT features separately and compare the two.

One issue I see in combining BiLSTM and BERT features is that the tokenisation differs between the two. In the BiLSTM case, the learnt features (embeddings) are word embeddings, i.e. the input text is tokenised at the word level (whitespace tokenisation). In the BERT case, I use the standard BERT tokeniser from HuggingFace, which tokenises text at the sub-word level.
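One possible way around the tokenisation mismatch (not something implemented in this repository) is to pool BERT's sub-word embeddings back up to the word level and then concatenate them with the BiLSTM word embeddings. HuggingFace fast tokenizers expose a `word_ids()` mapping from sub-word positions to word indices that makes this straightforward. A minimal sketch with toy numbers; the function name `pool_subwords_to_words` and all embedding values are hypothetical:

```python
import numpy as np

def pool_subwords_to_words(subword_embs, word_ids):
    """Mean-pool sub-word embeddings that belong to the same word.

    word_ids[i] is the word index of sub-word i, or None for special
    tokens such as [CLS]/[SEP] (mirroring HuggingFace's
    BatchEncoding.word_ids()).
    """
    n_words = max(w for w in word_ids if w is not None) + 1
    dim = subword_embs.shape[1]
    word_embs = np.zeros((n_words, dim))
    counts = np.zeros(n_words)
    for emb, w in zip(subword_embs, word_ids):
        if w is None:          # skip special tokens
            continue
        word_embs[w] += emb
        counts[w] += 1
    return word_embs / counts[:, None]

# Toy example: 2 words, where word 0 was split into 2 sub-words,
# plus [CLS] and [SEP] special tokens at the ends.
subword_embs = np.array([[0.0, 0.0],   # [CLS]
                         [1.0, 1.0],   # word 0, sub-word 1
                         [3.0, 3.0],   # word 0, sub-word 2
                         [2.0, 2.0],   # word 1
                         [0.0, 0.0]])  # [SEP]
word_ids = [None, 0, 0, 1, None]

word_level_bert = pool_subwords_to_words(subword_embs, word_ids)  # shape (2, 2)

# Hypothetical BiLSTM word embeddings for the same 2 words.
bilstm_word_embs = np.array([[0.5, 0.5],
                             [0.1, 0.1]])

# Both sequences are now word-aligned, so they can be concatenated
# per word before the final break/no-break classifier.
combined = np.concatenate([bilstm_word_embs, word_level_bert], axis=-1)
```

The same idea works with first-sub-word selection instead of mean pooling; which is better is an empirical question.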

If you are interested, we can collaborate to implement this feature. Please let me know.

jjsmcneil1113 commented 5 months ago

Thank you very much. I would absolutely love to collaborate with you, although I have to be honest that I have a computer science degree from long ago and am a beginner in machine learning. I would be very interested in working with you as best I can. I am a quick learner! ;)