NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
854 stars, 187 forks

How to fix problematic words with MusicXML parser? #28

Open AndroYD84 opened 4 years ago

AndroYD84 commented 4 years ago

I think the MusicXML parser is the most interesting feature of this repo. However, I would never have imagined that a very common word such as "Hello" or "Hel-lo" could throw an error, especially when "hi", "baby", "ba-by", "shark" and "do" work just fine. I assumed it was a "must work" word that I could safely use for testing, but it turned out to be the culprit behind a long series of problems with some MusicXML files. I couldn't figure out what was wrong with them, since all the words existed in the ARPAbet dictionary. To narrow the problem down, I kept simplifying the lyrics until they contained only the word "hello", and since it still wasn't working, I assumed there had to be some hidden problem: past experience had taught me that a single orphaned note, a typo or an invisible character is enough to throw that same error message. Could you please explain in more detail what steps are required to fix such bugs (e.g. the word "Hello"), so other users can follow your example, narrow down the problematic words and fix them as they come up? I'd really appreciate it, thanks!

rafaelvalle commented 4 years ago

Thanks for reporting the issue. Please pull from master and try again. Issues are probably related to the PHONEME2GRAPHEME map and the events2eventsarpabet method.

rafaelvalle commented 4 years ago

@AndroYD84 has this issue been resolved?

AndroYD84 commented 4 years ago

Yes, the word "Hello" now works fine after the fix. However, there are still plenty of common words that trigger this error. Perhaps a solution would be a script that automatically tries every word from the CMU dictionary, one at a time, on a dummy, minimal MusicXML file, so that whenever the error is triggered it logs exactly which word caused it. In the end there would be a complete list of problematic words.
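A minimal sketch of such a sweep, assuming you wrap your own MusicXML parsing/synthesis call in a function that takes a MusicXML string and raises on failure. `single_note_musicxml`, `sweep_words` and the `run_pipeline` callback are hypothetical names, not part of mellotron. If your parser expects a file path, write the string to a temporary file inside the callback. The generated score may also need tweaks (tempo, key) to match what your parser expects.

```python
import logging
from typing import Callable, Iterable, List
from xml.sax.saxutils import escape

log = logging.getLogger("word_sweep")


def single_note_musicxml(lyric: str) -> str:
    """Return a minimal MusicXML score: one 4/4 measure with a single C4 whole note carrying `lyric`."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Voice</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
        <lyric><syllabic>single</syllabic><text>{escape(lyric)}</text></lyric>
      </note>
    </measure>
  </part>
</score-partwise>
"""


def sweep_words(words: Iterable[str], run_pipeline: Callable[[str], object]) -> List[str]:
    """Run `run_pipeline` on a one-note score per word; return the words whose run raised."""
    failures: List[str] = []
    for word in words:
        try:
            run_pipeline(single_note_musicxml(word))
        except Exception as exc:  # log every failure instead of stopping at the first one
            log.warning("%r failed with %s: %s", word, type(exc).__name__, exc)
            failures.append(word)
    return failures


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def stub_pipeline(musicxml: str) -> None:
        # Stand-in for the real parser: pretends only the lyric "Bad" is rejected.
        if "<text>Bad</text>" in musicxml:
            raise ValueError("could not align lyric")

    print(sweep_words(["Good", "Bad", "Fine"], stub_pipeline))  # ['Bad']
```

To sweep the whole dictionary, pass its words (capitalized, like the lyrics in the example MusicXML) as `words` and replace the stub with your real call; the returned list is the "buggy words" report.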

AndroYD84 commented 4 years ago

I found a bunch more words that will always output an error if used with the MusicXML parser:

Words that output an error:
Action, Again, All, Burn, Burning, By, Calling, Devil, Edge, Feeding, Feeling, Fight, Fighting, Fury, Girl, Gonna, Heart, Here, New, One, Our, Passion, Phoenix, Running, Said, Street, Surrender, Survivor, Ticking, Tonight, Wanna, We, When

Words that output an error only in plural but not in singular (e.g. "Enemy" works, "Enemies" doesn't):
Enemies, Flames

Words that work both in plural and singular:
Stands, Stand

There are problems with 's and 're, such as:
There's, We're

By the way, I'm not completely sure about the logic you used to modify mellotron_utils.py to fix these problems; the word "street" in particular is giving me a hard time.

rafaelvalle commented 4 years ago

Thank you for sharing this list of words. We will soon add a mechanism to warn users about such issues.

karkirowle commented 4 years ago

I am not entirely sure either what the events2eventsarpabet method does in mellotron_utils.py.

A couple of failure modes I have found so far:

  1. From the example MusicXML, it is clear that all words have to start with an uppercase letter.
  2. Out-of-dictionary words. This is completely logical: if no pronunciation is given for a word, it just does not work, and the user needs to add it to the CMU dictionary. The get_arpabet method would be a perfectly fine place to throw a warning, and I think this is quite straightforward.
  3. The phoneme2grapheme map. I'm most puzzled by this one. So far, I could mostly get away with just appending and stepping forward if the first if-condition fails, but this is probably not a good solution. Could you elaborate on what exactly the role of this method is?
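Failure modes 1 and 2 can be caught before synthesis. A minimal sketch, assuming a CMU-style dictionary file (one `WORD  PHONEMES` entry per line, alternate pronunciations marked `WORD(1)`, comments starting with `;;;`): it capitalizes the lyric's first letter and warns on out-of-dictionary words. `load_cmudict`, `lookup_arpabet` and `capitalize_first` are hypothetical helpers for illustration, not mellotron's get_arpabet.

```python
import re
import warnings
from typing import Dict, Iterable, List, Optional

# Alternate pronunciations are listed as WORD(1), WORD(2), ...
_ALT_SUFFIX = re.compile(r"\(\d+\)$")


def load_cmudict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse CMU-style lines into {WORD: [pronunciation, ...]}."""
    entries: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # malformed line: a word with no phonemes
        word = _ALT_SUFFIX.sub("", parts[0]).upper()
        # Normalize internal whitespace between phonemes.
        entries.setdefault(word, []).append(" ".join(parts[1].split()))
    return entries


def capitalize_first(lyric: str) -> str:
    """Upper-case only the first letter, matching the capitalized lyrics in the example MusicXML."""
    return lyric[:1].upper() + lyric[1:]


def lookup_arpabet(word: str, cmudict: Dict[str, List[str]]) -> Optional[str]:
    """Return the first pronunciation, or warn and return None for out-of-dictionary words."""
    key = word.strip().upper()
    prons = cmudict.get(key)
    if not prons:
        warnings.warn(
            f"{word!r} is not in the CMU dictionary; add a line such as "
            f"'{key}  <PHONEMES>' before using it as a lyric",
            stacklevel=2,
        )
        return None
    return prons[0]


if __name__ == "__main__":
    d = load_cmudict([";;; demo", "HELLO  HH AH0 L OW1", "HELLO(1)  HH EH0 L OW1"])
    print(capitalize_first("hello"), lookup_arpabet("Hello", d))
    print(lookup_arpabet("Mellotronic", d))  # warns, prints None
```

Running these checks on every lyric before parsing separates plain dictionary gaps (mode 2) from the harder alignment failures of mode 3.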

Ctibor67 commented 2 years ago

Any new ideas about the problematic words? Or at least a demonstration of why these problems arise? In cmu_dictionary, the word BE (B IY1) throws an error, but the word BEA, with the same phonemes (B IY1), is OK...
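To illustrate one way a phoneme-to-grapheme map can make a dictionary word fail, here is a sketch of a greedy aligner that walks a lyric's letters left to right and, for each ARPAbet phoneme, consumes the first matching spelling from a map. This is an assumption about the general technique, not mellotron's events2eventsarpabet: `TOY_PHONEME2GRAPHEME`, `align` and `AlignmentError` are invented, and the toy map deliberately omits a plain "e" spelling for IY. Whether mellotron's PHONEME2GRAPHEME has the same kind of gap has to be checked in mellotron_utils.py. The error message names the phoneme and letter position that failed, which tells you which map entry to extend.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy map (phoneme -> candidate spellings). It is NOT mellotron's
# PHONEME2GRAPHEME; IY deliberately has no plain "e" spelling.
TOY_PHONEME2GRAPHEME: Dict[str, List[str]] = {
    "B": ["b"],
    "HH": ["h"],
    "AH": ["e", "a", "u"],
    "L": ["ll", "l"],
    "OW": ["ow", "o"],
    "IY": ["ea", "ee"],
}


class AlignmentError(ValueError):
    """Raised when the phonemes cannot be matched to the word's letters."""


def align(word: str, arpabet: str,
          p2g: Dict[str, List[str]] = TOY_PHONEME2GRAPHEME) -> List[Tuple[str, str]]:
    """Greedily pair each phoneme with a spelling; no backtracking, no silent letters."""
    letters = word.lower()
    pos = 0
    pairs: List[Tuple[str, str]] = []
    for phone in arpabet.split():
        base = phone.rstrip("012")  # drop the stress digit: IY1 -> IY
        candidates = p2g.get(base, [])
        # Try longer spellings first so "ll" is preferred over "l".
        for spelling in sorted(candidates, key=len, reverse=True):
            if letters.startswith(spelling, pos):
                pairs.append((base, spelling))
                pos += len(spelling)
                break
        else:  # no candidate spelling matched at this position
            raise AlignmentError(
                f"{word!r}: no spelling of {base} {candidates} matches at "
                f"letter {pos} (remaining {letters[pos:]!r})"
            )
    if pos != len(letters):
        raise AlignmentError(f"{word!r}: letters {letters[pos:]!r} left over")
    return pairs


if __name__ == "__main__":
    print(align("hello", "HH AH0 L OW1"))
    print(align("bea", "B IY1"))
    try:
        align("be", "B IY1")
    except AlignmentError as err:
        print("error:", err)
```

With this toy map, `align("bea", "B IY1")` returns `[("B", "b"), ("IY", "ea")]`, while `align("be", "B IY1")` raises an AlignmentError pointing at IY: identical phonemes, different spellings, one success and one failure. Adding `"e"` to the IY entry makes BE align. Because the sketch is greedy, with no backtracking or silent-letter handling, a real aligner may need both.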