As I understand it, Mellotron is built on Tacotron2 and conditions the synthesis on a reference audio file or MusicXML to perform style transfer. If there is no reference file, is it possible to get a plain TTS synthesis result instead? I looked through model.py but did not find anything that seemed relevant, which is why I am asking.