Rongjiehuang / FastDiff

PyTorch Implementation of FastDiff (IJCAI'22)

Question about noise scheduling process. #8

Open LEECHOONGHO opened 2 years ago

LEECHOONGHO commented 2 years ago

Hello, I'm trying to implement the noise scheduling process by referring to BDDM's implementation in BDDM/sampler.py.

I have some questions about the noise scheduling process for FastDiff-TTS.

  1. In the FastDiff paper, alphaN and betaN are set as hyperparameters, e.g. $\hat{\alpha}_t = 0.54$, $\hat{\beta}_t = 0.70$. Can I use these hyperparameters for my own FastDiff-TTS module, or for a different number of reverse steps (e.g. 6, 8, 10...)? How are they calculated?

  2. For BDDM, searching for alphaN and betaN requires a greedy search with search_bin=9, followed by 10 further search steps that add noise to the params, e.g. _alpha_param = alpha_param * (0.95 + np.random.rand() * 0.1). Does FastDiff require a similar process?

  3. For BDDM, STOI and PESQ are estimated on the generated audio to find the best noise schedule. How should we select the best parameters based on the two indicators, STOI and PESQ?

  4. Are STOI and PESQ also needed in the parameter search for FastDiff?

  5. In BDDM, num_reverse_steps = math.floor(T / tau). But in FastDiff, T=1000, tau=200 and num_reverse_steps=4. Do I need to calculate num_reverse_steps as math.floor(T/tau) - 1?

Thank you.
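
For reference, the perturbation expression quoted in question 2 can be sketched in isolation. This is only an illustration of what that one line does (it scales a parameter by a random factor drawn uniformly from [0.95, 1.05)); the function name `perturb`, the seed, and the sample count are assumptions made here and are not taken from BDDM or FastDiff.

```python
import numpy as np


def perturb(param: float, rng: np.random.Generator) -> float:
    """Scale `param` by a factor drawn uniformly from [0.95, 1.05).

    Mirrors the quoted expression
    alpha_param * (0.95 + np.random.rand() * 0.1),
    but uses an explicit Generator so the draw is reproducible.
    """
    return param * (0.95 + rng.random() * 0.1)


# Hypothetical usage: jitter the alphaN value mentioned in question 1.
rng = np.random.default_rng(0)
alpha_param = 0.54
candidates = [perturb(alpha_param, rng) for _ in range(10)]

# Every candidate stays within +/-5% of the starting value.
lower, upper = alpha_param * 0.95, alpha_param * 1.05
print(f"range: [{lower:.4f}, {upper:.4f}]")
print([round(c, 4) for c in candidates])
```

In a search loop each candidate would then be scored on generated audio, which is where STOI and PESQ come in; that scoring step is not shown here.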

Rongjiehuang commented 2 years ago

Hi,

  1. It's OK to use a different number of reverse steps; just set the maximum number of sampling steps in scheduling ("N") in BDDM.
  2. The noise predictor of FastDiff shares a similar mechanism with BDDM's, so the calculation of STOI and PESQ is required.
  3. Thanks, this $\tau$ is a typo, and the algorithm is still math.floor(T/tau). You could try it yourself: the higher $\tau$ is, the shorter the predicted inference schedule tends to be.
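
The step-count formula discussed above is easy to check numerically. The snippet below is a minimal sketch that only evaluates math.floor(T / tau) for a few values of tau; the helper name `num_reverse_steps` and the list of tau values are chosen here for illustration and are not taken from the FastDiff or BDDM code.

```python
import math


def num_reverse_steps(T: int, tau: int) -> int:
    """Upper bound on the number of reverse steps, as floor(T / tau)."""
    return math.floor(T / tau)


T = 1000
# Illustrative tau values only; the repository may use others.
for tau in (100, 200, 250, 500):
    print(f"T={T}, tau={tau} -> num_reverse_steps={num_reverse_steps(T, tau)}")
```

With T=1000, tau=200 yields 5 rather than 4, which is the mismatch raised in question 5, and a larger tau gives a smaller step count, in line with the remark that a higher $\tau$ tends to shorten the schedule.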