Closed pneumoman closed 4 years ago
Hi, I encountered the same issue. When I synthesize from the Hallelujah MusicXML with the provided checkpoint, the synthesized samples show little difference when the mel input is changed. Here are three examples, all synthesized with sid=40, pan=-42, where the mel inputs are zeros, mels extracted from example1.wav, and mels extracted from example2.wav, respectively. https://drive.google.com/file/d/10fQdc25FJMHb7Twpwhie35hf2C8I69Dv/view?usp=sharing I am curious what the effect of the global style token is here. Actually, when synthesizing from MusicXML, we do not have a mel to use as input, so it seems strange to require one. Please advise. Thanks.
We recently fixed a bug in our code that scaled the audio inputs to the wrong range and made the mels ineffective. Please pull from master, use a mel sample with characteristics very different from your training set (e.g. screaming or whispering), try again, and let us know.
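For anyone hitting the same symptom: a minimal sketch of the kind of range bug described above, assuming the mel extractor expects float audio in [-1, 1]. The function name and the int16 assumption are illustrative, not the repo's actual code.

```python
import numpy as np

def normalize_audio(samples: np.ndarray) -> np.ndarray:
    """Scale 16-bit PCM samples to floats in [-1.0, 1.0].

    If raw int16 audio (-32768..32767) is fed to a mel extractor
    that expects [-1, 1], the values are thousands of times too
    large and the reference mel carries almost no usable style
    information -- which would make changing the mel input appear
    to have no effect on the synthesized output.
    """
    return samples.astype(np.float32) / 32768.0

# Illustrative check: a full-scale int16 sine wave ends up within [-1, 1].
t = np.linspace(0, 1, 16000, endpoint=False)
pcm = (np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
audio = normalize_audio(pcm)
assert np.abs(audio).max() <= 1.0
```

If the mels you extract still look saturated or near-constant after pulling from master, checking the audio range right before mel extraction is a quick sanity test.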
seems better
Hi,
In studying how this model works using the pre-trained model, I found that the mel input seemed to have little effect. In fact, in my very small blind poll (my co-worker and our 'sound guy'), the effect seemed detrimental. Curious about your thoughts.