NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
853 stars, 187 forks

How to generate .musicxml files like the examples in `/data`? #119

Status: Open (mepc36 opened this issue 1 year ago)

mepc36 commented 1 year ago

Like other users here, I've been unable to generate my own .musicxml files that mellotron will successfully ingest. When I try, I frequently get "list index out of range" errors from functions in mellotron_utils.py such as add_events (I'll post the specific errors when I next get a chance). I need this to work because I want to use MusicXML as a standard way to control mellotron's output for music compositions created at scale.
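The traceback later in this thread ends at `new_events[-1][0][0]` in `add_space_between_events` with "list index out of range", which points to an empty list at that spot (most likely `new_events` itself). One plausible cause, not confirmed here, is a file whose notes carry little or no `<lyric>` text, leaving the parser with almost no events to work with. Below is a minimal pre-flight check using only Python's standard library; `lyric_note_counts` is a hypothetical helper, it assumes an uncompressed `.musicxml` file (not a zipped `.mxl`), and it does not reproduce mellotron's own parsing.

```python
import xml.etree.ElementTree as ET

def lyric_note_counts(data: bytes) -> dict:
    """Per part id: (number of pitched notes, number of those carrying lyric text).

    Hypothetical helper; assumes an uncompressed, un-namespaced MusicXML document.
    """
    root = ET.fromstring(data)
    counts = {}
    for part in root.iterfind("part"):
        # Rests have <rest/> instead of <pitch>, so they are skipped here.
        pitched = [n for n in part.iter("note") if n.find("pitch") is not None]
        # findtext returns "" when <lyric><text> is missing or empty.
        with_lyric = [n for n in pitched if n.findtext("lyric/text", default="").strip()]
        counts[part.get("id")] = (len(pitched), len(with_lyric))
    return counts
```

Reading the file with `open(path, "rb").read()` and passing the bytes in, a part that reports zero or one lyric-bearing notes would be consistent with an empty event list, though other causes are possible.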

@rafaelvalle, what package/tool/software did you use to create the .musicxml files in the `data/` folder?

Thank you!
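On the generation question itself, one way to produce files at scale without a notation editor is to write MusicXML directly. The sketch below uses only the standard library to emit a single-part, single-voice score in which pitched notes can carry a `<lyric>` syllable. The helper `make_score` and its arguments are hypothetical, and whether mellotron's parser accepts this exact subset of MusicXML is an assumption that has not been tested here.

```python
import xml.etree.ElementTree as ET

def make_score(notes, divisions=1, beats=4, beat_type=4):
    """Build a one-part, one-voice MusicXML string (hypothetical helper).

    notes: list of (step, octave, duration, syllable). duration is in
    divisions per quarter note; step=None makes a rest; syllable=None leaves
    the note without a lyric. Notes are not split at barlines, so keep each
    measure's durations summing to the measure length.
    """
    root = ET.Element("score-partwise", version="3.1")
    score_part = ET.SubElement(ET.SubElement(root, "part-list"), "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Voice"
    part = ET.SubElement(root, "part", id="P1")

    measure_len = divisions * beats * 4 // beat_type  # divisions per measure
    measure, number, used = None, 0, measure_len
    for step, octave, duration, syllable in notes:
        if used >= measure_len:  # current measure is full: open the next one
            number += 1
            measure = ET.SubElement(part, "measure", number=str(number))
            used = 0
            if number == 1:  # divisions and time signature go in the first measure
                attrs = ET.SubElement(measure, "attributes")
                ET.SubElement(attrs, "divisions").text = str(divisions)
                time = ET.SubElement(attrs, "time")
                ET.SubElement(time, "beats").text = str(beats)
                ET.SubElement(time, "beat-type").text = str(beat_type)
        note = ET.SubElement(measure, "note")
        if step is None:
            ET.SubElement(note, "rest")
        else:
            pitch = ET.SubElement(note, "pitch")
            ET.SubElement(pitch, "step").text = step
            ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
        ET.SubElement(note, "voice").text = "1"
        if syllable:
            lyric = ET.SubElement(note, "lyric", number="1")
            # Simplification: every syllable is marked as a whole word.
            ET.SubElement(lyric, "syllabic").text = "single"
            ET.SubElement(lyric, "text").text = syllable
        used += duration
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
```

For example, `make_score([("C", 4, 1, "hel"), ("D", 4, 1, "lo"), (None, None, 2, None), ("E", 4, 4, "world")])` yields two 4/4 measures; the returned string can be written to a `.musicxml` file. Real songs would also need ties, proper `syllabic` values, and possibly tempo marks, which this sketch leaves out.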

mepc36 commented 1 year ago

After reading the .musicxml files' tags more closely, I noticed this processing instruction:

```
...
<?SmartMusic maxVoice=1?>
...
```

I think the files were created with this site:

https://compose.smartmusic.com/
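SmartMusic's processing instruction is one clue; MusicXML files also usually name the exporting program in an `<identification>/<encoding>/<software>` element. The standard-library sketch below collects both. The helper name `exporter_hints` is hypothetical, the processing-instruction scan is a regex heuristic rather than a full XML parse, and compressed `.mxl` files would need unzipping first.

```python
import re
import xml.etree.ElementTree as ET

def exporter_hints(data: bytes) -> dict:
    """Collect clues about which program wrote a MusicXML file (hypothetical helper)."""
    text = data.decode("utf-8", errors="replace")
    # Heuristic: grab every <?...?> processing instruction, e.g. <?SmartMusic maxVoice=1?>,
    # skipping the <?xml ...?> declaration itself.
    pis = [m.group(1).strip() for m in re.finditer(r"<\?(.*?)\?>", text, re.S)
           if not m.group(1).startswith("xml")]
    root = ET.fromstring(data)
    # MusicXML's <identification>/<encoding>/<software> usually names the exporter.
    software = [el.text.strip() for el in root.iterfind("identification/encoding/software")
                if el.text]
    return {"processing_instructions": pis, "software": software}
```

Running it on one of the bundled files and on a self-exported file would show whether both really came from the same exporter.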

However, this doesn't help much, because the same out-of-range index errors still occur even when I export the .musicxml files from that exact site:

```
ubuntu@ip-172-31-93-8:~/mellotron$ python inference.py --input-file ./data/input.musicxml
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

/home/ubuntu/mellotron/stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
/home/ubuntu/mellotron/layers.py:64: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
/home/ubuntu/.local/lib/python3.7/site-packages/torch/serialization.py:868: SourceChangeWarning: source code of class 'torch.nn.modules.conv.ConvTranspose1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/ubuntu/.local/lib/python3.7/site-packages/torch/serialization.py:868: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/ubuntu/.local/lib/python3.7/site-packages/torch/serialization.py:868: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
data/example1.wav exploring the expanses of space to keep our planet safe
/home/ubuntu/mellotron/stft.py:67: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
/home/ubuntu/mellotron/layers.py:64: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
/home/ubuntu/mellotron/audio_processing.py:50: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  win_sq = librosa_util.pad_center(win_sq, n_fft)
Traceback (most recent call last):
  File "inference.py", line 148, in <module>
    data = get_data_from_musicxml('data/{}'.format(sys.argv[1]), 132, convert_stress=True)
  File "/home/ubuntu/mellotron/mellotron_utils.py", line 464, in get_data_from_musicxml
    events_arpabet = add_space_between_events(events_arpabet)
  File "/home/ubuntu/mellotron/mellotron_utils.py", line 120, in add_space_between_events
    if new_events[-1][0][0] != ' ':
IndexError: list index out of range
```
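The traceback also shows how this copy of inference.py locates its input: `get_data_from_musicxml('data/{}'.format(sys.argv[1]), 132, convert_stress=True)`. Taken at face value, the script uses its first positional argument as a filename inside `data/` and does not parse an `--input-file` flag, so the flag text itself becomes the filename. The sketch below reproduces only that one expression (the function name `musicxml_path` is hypothetical) to show which path the command above resolves to; whether this is what produced this particular IndexError is not something the log alone establishes.

```python
def musicxml_path(argv):
    """Hypothetical helper reproducing the path expression on inference.py line 148."""
    # argv[0] is the script name; argv[1] is the first command-line argument.
    return 'data/{}'.format(argv[1])
```

With `['inference.py', '--input-file', './data/input.musicxml']` this gives `'data/--input-file'`, whereas an invocation such as `python inference.py input.musicxml`, with the file placed in `data/`, would resolve to `'data/input.musicxml'`.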