You should prepare phoneme | pitch_midi | pitch_dur | is_slur, then write the data to 'test' using IndexedDatasetBuilder, just like process_data() in base_binarizer.py.
You also need to fix some data loading problems (__getitem__ and collater in fs2_utils.py): just set the missing training-only fields to None, since they are not necessary at the synthesis stage.
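For anyone landing here later, a minimal sketch of that step, assuming the IndexedDatasetBuilder API (add_item / finalize) from utils/indexed_datasets.py; the item keys and output path below are illustrative, so check process_data() in base_binarizer.py for the exact ones:

```python
# Sketch only: write a hand-made 'test' split the way process_data()
# in data_gen/tts/base_binarizer.py does. Field names and the output
# path are assumptions based on this thread, not the repo's exact keys.
import numpy as np
from utils.indexed_datasets import IndexedDatasetBuilder

item = {
    'item_name': 'my_song_0001',                          # hypothetical id
    'ph': 'm ian d ui',                                   # phoneme string
    'pitch_midi': np.array([61, 61, 63, 63]),             # MIDI note per phoneme
    'midi_dur': np.array([0.197, 0.197, 0.102, 0.102]),   # note durations (s)
    'is_slur': np.array([0, 0, 0, 0]),
    # training-only fields (mel, f0, ...) can be None at synthesis time,
    # once __getitem__/collater in fs2_utils.py are patched to allow it
    'mel': None,
    'f0': None,
}

builder = IndexedDatasetBuilder('data/binary/opencpop-midi/test')  # hypothetical path
builder.add_item(item)
builder.finalize()
```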
This is my first exposure to singing synthesis, so I have some questions about the terminology. Do pitch_midi and pitch_dur mean note and note duration? Should I derive is_slur from the staff notation? And I don't know how to set pitch_dur for an unseen song. Should I use Logic Pro to label it, or can I get it from a model or something similar?
Wait a minute, I'll find you a picture.
We use the data marked by the yellow box: phoneme | pitch_midi | pitch_dur.
pitch_dur = 60 * NoteBeats / bpm
bpm: beats per minute, i.e. the tempo.
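For example, at 120 bpm a quarter note (1 beat) lasts 60 * 1 / 120 = 0.5 s. As a throwaway helper (the names are mine, not from the repo):

```python
def note_beats_to_seconds(note_beats: float, bpm: float) -> float:
    """pitch_dur in seconds = 60 * NoteBeats / bpm."""
    return 60.0 * note_beats / bpm

# a quarter note (1 beat) at 120 bpm lasts half a second
assert note_beats_to_seconds(1, 120) == 0.5
# a half note (2 beats) at 90 bpm lasts 60 * 2 / 90 ≈ 1.33 s
print(note_beats_to_seconds(2, 90))
```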
Thank you very much, I know how to do this now. But I have another question: there is silence in music, so it won't work if I simply turn the text into pinyin, right? Should I do singing-lyrics alignment?
2001000005|面对浩瀚的星海我们微小得像尘埃|m ian d ui h ao h an an d e x ing h ai ai ai AP w o m en w ei x iao d e x iang ch en ai ai ai SP|C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 E4 D#4/Eb4 D#4/Eb4 E4 E4 G#4/Ab4 G#4/Ab4 A4 G#4/Ab4 rest C#4/Db4 C#4/Db4 C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 E4 E4 E4 E4 G#4/Ab4 A4 G#4/Ab4 rest|0.196990 0.196990 0.102120 0.102120 0.304680 0.304680 0.096780 0.096780 0.100220 0.150010 0.150010 0.361460 0.361460 0.221070 0.221070 0.183240 0.478670 0.384620 0.106510 0.106510 0.143020 0.143020 0.169480 0.169480 0.224180 0.224180 0.089360 0.089360 0.414460 0.414460 0.378050 0.378050 0.162790 0.207380 0.317260 0.297040|0.02765 0.16934 0.01874 0.08338 0.0821 0.22258 0.0693 0.02748 0.10022 0.07137 0.07864 0.12471 0.23675 0.12356 0.09751 0.18324 0.47867 0.38462 0.0405 0.06601 0.08303 0.05999 0.04687 0.12261 0.09778 0.1264 0.02321 0.06615 0.11958 0.29488 0.06723 0.31082 0.16279 0.20738 0.31726 0.29704|0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
You should learn from transcriptions.txt.
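For reference, a hedged sketch of turning one such line into the fields above. The 7-field layout (name|text|phoneme|note|note_dur|phn_dur|is_slur) is inferred from the example; mapping 'rest' to MIDI 0 and resolving enharmonic spellings like C#4/Db4 by taking the first name are my assumptions:

```python
import librosa

def parse_transcription_line(line: str) -> dict:
    """Split one transcriptions.txt record into binarizer-ready fields."""
    name, text, ph, notes, note_durs, ph_durs, slurs = line.strip().split('|')
    pitch_midi = [
        0 if n == 'rest' else int(librosa.note_to_midi(n.split('/')[0]))
        for n in notes.split()
    ]
    return {
        'item_name': name,
        'txt': text,
        'ph': ph,
        'pitch_midi': pitch_midi,                           # e.g. C#4 -> 61
        'midi_dur': [float(d) for d in note_durs.split()],  # note durations (s)
        'ph_dur': [float(d) for d in ph_durs.split()],      # phoneme durations (s)
        'is_slur': [int(s) for s in slurs.split()],
    }
```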
OK. Thank you so much. I'll try.
@leon2milan did you succeed? Can you share some example code?
I want to test the Opencpop pretrained model on an unseen song, but I don't know how to generate the wav file. There is a test_step in FastSpeech2Task, but it seems to be for the TTS task. So do I need to override test_step in DiffSingerMIDITask? Or is there another way to solve this, without packing the data into a dataloader: just load the model and run inference?