Closed by rbracco 1 year ago
You could probably approximate it by setting the text for the file to "ðə waɪt dɒɡ" and then specifying your own dictionary, as shown here (I feel like automating this wouldn't be too much additional effort if you're already creating the phone string for the utterance text anyway, right?):
ðə ð ə
waɪt w aɪ t (you might want aj here instead of aɪ if you're using english_mfa model)
dɒɡ d ɒ ɡ
etc
or just a dictionary like
ð ð
ə ə
w w
aɪ aɪ
t t
d d
ɒ ɒ
ɡ ɡ
with texts of "ð ə w aɪ t d ɒ ɡ", but then you lose word information that you might want?
That's probably the easiest way. The integration with lexicons is a pretty deep assumption throughout the alignment code. That said, I have been playing around with other ways of generating the utterance graph that don't use a lexicon (instead using an integrated g2p model as the lexicon), so it should be doable, but I'd have to think about the best way to invoke that functionality.
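The word-level dictionary workaround described above could be automated roughly as follows. This is only a minimal sketch: the function names `build_entries` and `write_dictionary` and the nested-list input format are hypothetical, not part of MFA, and the output simply mirrors the "word phone phone ..." lines shown earlier in this reply. It assumes every phone symbol you pass in exists in the acoustic model's phone set (for example, aj rather than aɪ for english_mfa).

```python
from pathlib import Path


def build_entries(utterances):
    """Return (dictionary entries, utterance texts) for phone-string utterances.

    `utterances` is a list of utterances; each utterance is a list of
    pseudo-words, and each pseudo-word is a list of phone symbols.
    """
    entries = set()
    texts = []
    for words in utterances:
        labels = []
        for phones in words:
            label = "".join(phones)  # e.g. ["w", "aɪ", "k"] -> "waɪk"
            entries.add((label, " ".join(phones)))
            labels.append(label)
        texts.append(" ".join(labels))
    return sorted(entries), texts


def write_dictionary(entries, path):
    """Write one "word phone phone ..." line per entry."""
    lines = [f"{word} {pron}" for word, pron in entries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical input: "the white dog" with a deliberate mispronunciation.
utterances = [[["ð", "ə"], ["w", "aɪ", "k"], ["d", "ɒ", "ɡ"]]]
entries, texts = build_entries(utterances)
```

Each string in `texts` would then be used as the transcript for the matching audio file, and the file written by `write_dictionary` would be passed to MFA as the dictionary. Keeping entries as (word, pronunciation) pairs means two different phone segmentations that happen to spell the same pseudo-word both survive as separate pronunciations.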
Thank you, that clarifies things a lot. I'll try the escape hatches you suggested and reopen if there are any further questions.
Is your feature request related to a problem? Please describe.
This feature may already exist, but I'm looking for a way to use a model I've pretrained to force-align against a specific string of phonemes, instead of looking up the phones in the dictionary using the English text. So if the sentence is "The white dog", instead of MFA looking up each word in the dictionary and then force-aligning against "ðə waɪt dɒɡ", I'd rather be able to align against any phone string, including some that aren't actual words. Examples: ðə waɪk dɒɡ, ðə waɪt doʊɡ, etc.
Describe the solution you'd like
A way to do something like
mfa align AUDIO_FILE PHONE_LABEL ACOUSTIC_MODEL_PATH
It also doesn't need to be exposed in the API at this top level; if I can go into the code and do this manually somehow, that would suffice.
Describe alternatives you've considered
I could make a very simple dictionary with a 1-to-1 mapping of the words to the phones I want, but this would be very tedious and not scalable in any way.