NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

[QUESTION] About lyrics hyphenation #58

Closed: loretoparisi closed this issue 4 years ago

loretoparisi commented 4 years ago

My question is about how the words in the lyrics are "hyphenated" (probably not the correct term) when they are aligned to the pitches/notes in the XML file.

For example, in the following excerpt the word Hallelujah is split as

Hal ° le ° lu ° ° jah (where there is an additional note between lu and jah)

      <note default-x="376">
        <pitch>
          <step>D</step>
          <octave>5</octave>
        </pitch>
        <duration>2</duration>
        <voice>1</voice>
        <type>eighth</type>
        <stem default-y="-45">down</stem>
        <lyric default-y="-81" number="1" relative-x="7">
          <syllabic>begin</syllabic>
          <text>Hal</text>
        </lyric>
      </note>
    </measure>
    <!--=======================================================-->
    <measure number="4" width="419">
      <print new-system="yes">
        <system-layout>
          <system-margins>
            <left-margin>2</left-margin>
            <right-margin>0</right-margin>
          </system-margins>
          <system-distance>113</system-distance>
        </system-layout>
      </print>
      <note default-x="90">
        <pitch>
          <step>C</step>
          <alter>1</alter>
          <octave>5</octave>
        </pitch>
        <duration>2</duration>
        <voice>1</voice>
        <type>eighth</type>
        <stem default-y="-50">down</stem>
        <notations>
          <slur number="1" placement="above" type="start"/>
        </notations>
        <lyric default-y="-81" number="1">
          <syllabic>middle</syllabic>
          <text>le</text>
        </lyric>
      </note>
      <note default-x="136">
        <pitch>
          <step>D</step>
          <octave>5</octave>
        </pitch>
        <duration>4</duration>
        <voice>1</voice>
        <type>quarter</type>
        <stem default-y="-45">down</stem>
        <notations>
          <slur number="1" type="stop"/>
        </notations>
      </note>
      <note default-x="226">
        <pitch>
          <step>C</step>
          <alter>1</alter>
          <octave>5</octave>
        </pitch>
        <duration>2</duration>
        <voice>1</voice>
        <type>eighth</type>
        <stem default-y="-50">down</stem>
        <lyric default-y="-81" number="1" relative-x="7">
          <syllabic>middle</syllabic>
          <text>lu</text>
        </lyric>
      </note>
      <note default-x="272">
        <pitch>
          <step>D</step>
          <octave>5</octave>
        </pitch>
        <duration>4</duration>
        <voice>1</voice>
        <type>quarter</type>
        <stem default-y="-45">down</stem>
        <lyric default-y="-81" number="1" relative-x="9">
          <syllabic>end</syllabic>
          <text>jah</text>
        </lyric>
      </note>
rafaelvalle commented 4 years ago

Hyphens are used to mark that the syllables belong to the same word. This is necessary to compute pitch and phoneme durations from a MusicXML score.
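
To illustrate the point, here is a minimal sketch (not Mellotron's own parser) that reads the notes from the excerpt above with Python's standard `xml.etree.ElementTree`, groups them by syllable, and uses the `<syllabic>` markers to rebuild the word. The `SCORE` string is a trimmed copy of the pasted notes with the layout attributes dropped, and the helper names `pitch_name`, `syllables_with_notes`, and `words` are hypothetical. It assumes that a note without a `<lyric>` element extends the previous syllable. Durations are left in MusicXML divisions, since the excerpt does not show the `<divisions>` value or a tempo.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the notes from the excerpt above (layout attributes dropped).
SCORE = """
<part>
  <note><pitch><step>D</step><octave>5</octave></pitch><duration>2</duration>
    <lyric number="1"><syllabic>begin</syllabic><text>Hal</text></lyric></note>
  <note><pitch><step>C</step><alter>1</alter><octave>5</octave></pitch><duration>2</duration>
    <lyric number="1"><syllabic>middle</syllabic><text>le</text></lyric></note>
  <note><pitch><step>D</step><octave>5</octave></pitch><duration>4</duration></note>
  <note><pitch><step>C</step><alter>1</alter><octave>5</octave></pitch><duration>2</duration>
    <lyric number="1"><syllabic>middle</syllabic><text>lu</text></lyric></note>
  <note><pitch><step>D</step><octave>5</octave></pitch><duration>4</duration>
    <lyric number="1"><syllabic>end</syllabic><text>jah</text></lyric></note>
</part>
"""

def pitch_name(note):
    """Return e.g. 'C#5' from a <note> element, or 'rest' if it has no <pitch>."""
    pitch = note.find("pitch")
    if pitch is None:
        return "rest"
    alter = int(pitch.findtext("alter", default="0"))
    accidental = {-1: "b", 0: "", 1: "#"}[alter]
    return pitch.findtext("step") + accidental + pitch.findtext("octave")

def syllables_with_notes(root):
    """Group notes by syllable.

    Assumption: a note without <lyric> belongs to the previous syllable.
    """
    syllables = []
    for note in root.iter("note"):
        event = (pitch_name(note), int(note.findtext("duration")))
        lyric = note.find("lyric")
        if lyric is not None:
            syllables.append({
                "text": lyric.findtext("text"),
                "syllabic": lyric.findtext("syllabic"),
                "notes": [event],
            })
        elif syllables:
            syllables[-1]["notes"].append(event)
    return syllables

def words(syllables):
    """Join syllables into hyphenated words using the <syllabic> markers."""
    out, current = [], []
    for syl in syllables:
        current.append(syl["text"])
        if syl["syllabic"] in ("single", "end"):
            out.append("-".join(current))
            current = []
    if current:  # word cut off at the end of the excerpt
        out.append("-".join(current))
    return out

syls = syllables_with_notes(ET.fromstring(SCORE))
for syl in syls:
    total = sum(duration for _, duration in syl["notes"])
    print(syl["text"], syl["notes"], "total divisions:", total)
print(words(syls))
```

On the excerpt as pasted, the slurred note without a lyric follows "le", so under the assumption above "le" spans two notes (C#5 then D5) while the other syllables span one note each. Without the begin/middle/end markers there would be no way to tell from the notes alone that the four syllables form a single word.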

loretoparisi commented 4 years ago

@rafaelvalle okay, that makes sense. Which kind of hyphenation are you using? If I use PyHyphen, for example, I get ['Hal', 'lelu', 'jah'], as in this snippet:

from hyphen import Hyphenator  # PyHyphen

# Dictionary-based hyphenation for US English
h_en = Hyphenator('en_US')
word = 'Hallelujah'
print(h_en.syllables(word))
# ['Hal', 'lelu', 'jah']

Thanks.

rafaelvalle commented 4 years ago

Hyphenation is done according to the music score.

loretoparisi commented 4 years ago

@rafaelvalle thank you, Rafael. So basically the hyphenation comes from the MusicXML/MIDI file prepared by a musician or composer who edited it in Finale or similar. So, supposing I have timestamp-based lyrics, I would also need the polyphonic music for those lyrics (with the words already hyphenated according to the music score and sheet music) in order to create the MusicXML file used as input to Mellotron. I'm aware of recent CNN-based networks that extract the melody directly from vocals (and hence MIDI), as opposed to older methods such as the Salamon MTG melody plugin for Sonic Visualiser or other HMM-based methods.
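
As a rough sketch of the last step in that pipeline, the snippet below builds MusicXML `<lyric>` elements for one word once its syllable split is already known. It is not part of Mellotron. The function names `syllabic_tags` and `lyric_elements` are hypothetical, and the split itself is taken as given from the score or from a human editor. The `single`/`begin`/`middle`/`end` values are the ones MusicXML defines for `<syllabic>`. Attaching each `<lyric>` to the right `<note>` (pitch and duration) is left out, because that depends on the melody extraction step.

```python
import xml.etree.ElementTree as ET

def syllabic_tags(syllables):
    """Return the MusicXML <syllabic> value for each syllable of one word."""
    n = len(syllables)
    if n == 0:
        return []
    if n == 1:
        return ["single"]
    return ["begin"] + ["middle"] * (n - 2) + ["end"]

def lyric_elements(syllables, number="1"):
    """Build one <lyric> element per syllable (hypothetical helper)."""
    elements = []
    for text, tag in zip(syllables, syllabic_tags(syllables)):
        lyric = ET.Element("lyric", number=number)
        ET.SubElement(lyric, "syllabic").text = tag
        ET.SubElement(lyric, "text").text = text
        elements.append(lyric)
    return elements

# Syllable split taken from the score excerpt, not from a hyphenation dictionary.
for element in lyric_elements(["Hal", "le", "lu", "jah"]):
    print(ET.tostring(element, encoding="unicode"))
```

Each printed element would then be placed inside the `<note>` that starts the corresponding syllable, as in the excerpt at the top of this thread.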

rafaelvalle commented 4 years ago

Closing due to inactivity.