MoonInTheRiver / DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
MIT License

The calculation of spec_min and spec_max #23

Closed: Charlottecuc closed this issue 2 years ago

Charlottecuc commented 2 years ago

Hi. I noticed that for each dataset you computed spec_min and spec_max and put them in the config files (e.g. https://github.com/MoonInTheRiver/DiffSinger/blob/c2fb5b32502e1e7e4b2a077bd9d83bb1c39e2b4e/usr/configs/popcs_ds_beta6.yaml). How did you calculate these values? I'd like to compute them in a way that is consistent with yours.

Thank you very much.

MoonInTheRiver commented 2 years ago

Add this code:

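def build_tts_model(self):
    # get min and max
    # Running per-bin bounds over all training mel-spectrograms (80 mel bins).
    # The +/-100 starting values are sentinels: every real log-mel value is
    # assumed to lie inside [-100, 100].
    import torch
    from tqdm import tqdm
    v_min = torch.ones([80]) * 100
    v_max = torch.ones([80]) * -100
    for i, ds in enumerate(tqdm(self.dataset_cls('train'))):
        # Flatten this sample's mel to [frames, 80], reduce over frames,
        # then fold the result into the running per-bin bounds.
        v_max = torch.max(torch.max(ds['mel'].reshape(-1, 80), 0)[0], v_max)
        v_min = torch.min(torch.min(ds['mel'].reshape(-1, 80), 0)[0], v_min)
        # if i % 100 == 0:
        #     print(i, v_min, v_max)
    # The two printed 80-dim vectors are the spec_min / spec_max for the config.
    print('final', v_min, v_max)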

...
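The loop above keeps a running per-bin minimum and maximum over every training mel-spectrogram. For a different dataset, the same statistic can also be computed outside the training framework. The sketch below is a minimal, hypothetical helper (`compute_spec_min_max` is not part of DiffSinger), assuming each utterance's mel is already available as a time-major `[frames, n_mels]` array produced by the same preprocessing used for training. It uses NumPy instead of torch only to stay dependency-light, starts from +/-inf instead of +/-100 so no assumption about the value range is needed, and returns plain lists that can be pasted into a YAML config.

```python
from typing import Iterable, List, Tuple

import numpy as np


def compute_spec_min_max(
    mels: Iterable[np.ndarray], n_mels: int = 80
) -> Tuple[List[float], List[float]]:
    """Per-mel-bin min and max over all frames of all utterances.

    Hypothetical helper: each item in `mels` is assumed to be a
    [frames, n_mels] array, mirroring ds['mel'].reshape(-1, 80) above.
    """
    v_min = np.full(n_mels, np.inf)
    v_max = np.full(n_mels, -np.inf)
    seen_frames = 0
    for mel in mels:
        frames = np.asarray(mel, dtype=np.float64).reshape(-1, n_mels)
        if frames.shape[0] == 0:
            continue  # min/max over an empty axis would raise
        # Reduce over the time axis, then fold into the running bounds.
        v_min = np.minimum(v_min, frames.min(axis=0))
        v_max = np.maximum(v_max, frames.max(axis=0))
        seen_frames += frames.shape[0]
    if seen_frames == 0:
        raise ValueError("no mel frames were provided")
    return v_min.tolist(), v_max.tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for real mel-spectrograms of different lengths.
    fake_mels = [rng.normal(-5.0, 2.0, size=(t, 80)) for t in (120, 300, 75)]
    spec_min, spec_max = compute_spec_min_max(fake_mels)
    print("spec_min:", [round(v, 4) for v in spec_min])
    print("spec_max:", [round(v, 4) for v in spec_max])
```

If your mels are stored frequency-major (`[n_mels, frames]`), transpose them first: the reshape only regroups values and would silently mix bins otherwise.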

MoonInTheRiver commented 2 years ago

Uncomment this part of the code: https://github.com/MoonInTheRiver/DiffSinger/blob/82f1a1bf169a880db0b33c5fde117554aaddc05d/usr/diffsinger_task.py#L41

ChuhanWang10 commented 1 year ago

Hi, thanks for the answer above. However, I am still confused. Could you briefly explain what spec_min and spec_max mean, and how I can calculate them for a different dataset? If I uncomment the code above, should I just delete spec_min and spec_max from the config files?
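On the meaning: the two values are per-mel-bin lower and upper bounds of the training features. A typical reason to store such bounds is to min-max scale each bin into [-1, 1] before it enters a diffusion model and to map it back afterwards. The sketch below assumes exactly that linear mapping; the names `norm_spec` / `denorm_spec` are illustrative here, so check the repository's diffusion code for the formula it actually applies. It also shows why the bounds should be computed on your own dataset: frames outside [spec_min, spec_max] land outside [-1, 1].

```python
import numpy as np


def norm_spec(mel: np.ndarray, spec_min: np.ndarray, spec_max: np.ndarray) -> np.ndarray:
    """Map each bin linearly from [spec_min, spec_max] to [-1, 1] (assumed scheme).

    mel is [frames, n_mels]; spec_min/spec_max are [n_mels] and broadcast over frames.
    """
    return (mel - spec_min) / (spec_max - spec_min) * 2.0 - 1.0


def denorm_spec(x: np.ndarray, spec_min: np.ndarray, spec_max: np.ndarray) -> np.ndarray:
    """Inverse of norm_spec: map [-1, 1] back to the original mel range."""
    return (x + 1.0) / 2.0 * (spec_max - spec_min) + spec_min


if __name__ == "__main__":
    spec_min = np.array([-6.0, -4.0])
    spec_max = np.array([2.0, 0.0])
    mel = np.array([[-6.0, 0.0], [2.0, -2.0], [-2.0, -4.0]])  # [frames, n_mels]
    x = norm_spec(mel, spec_min, spec_max)
    print(x)  # values: [[-1, 1], [1, 0], [0, -1]]
    # Round trip recovers the original features.
    assert np.allclose(denorm_spec(x, spec_min, spec_max), mel)
```

Note that a bin that is constant over the whole dataset gives `spec_max - spec_min == 0`, so a real implementation would need a small epsilon or similar guard there.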