X-LANCE / UniCATS-CTX-txt2vec

[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
https://cpdu.github.io/unicats

Experimenting with number of diffusion steps #4

Closed by danablend 8 months ago

danablend commented 8 months ago

Hello, have you experimented with the number of diffusion steps?

I notice it is currently set to 100. If it could be set to a lower number, such as 20, the model could be much faster at both training and inference. Has this been tested yet? Thanks!

cantabile-kwok commented 8 months ago

Well, there are two sides to this question: the inference steps and the training steps.

  1. The inference steps: we experimented with different numbers of sampling steps. Empirically, using fewer than 5 steps causes a substantial degradation of the generated samples, but around 20 steps is still OK, with only a minor decrease in quality. To achieve the best quality, we set it to 100 in the code. Feel free to decrease it for faster inference.
  2. The training steps: in the diffusion context, the number of training steps determines how the diffusion process is split into small discrete steps. Note that this does not significantly affect training time, because we don't iterate through these steps during training. Instead, we sample a step index from the total number of steps, say 59, and the model is trained to learn the change in the data over that small period of evolution. We sample this index for every batch, so that the model eventually learns the whole diffusion process. Usually 100 is a good choice for training: if you decrease that number, the diffusion process is split more coarsely, and the change in the data within each sub-period becomes bigger and harder for the model to learn.
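
To make the difference between the two points concrete, here is a minimal sketch in plain Python. It is not the code from this repository: `CountingModel`, `train_one_batch`, `sample`, and the other names are hypothetical stand-ins, the "model" only counts how often it is called, and the uniform split in `per_step_fraction` is a simplifying assumption. It only illustrates that training draws one step index per batch, while inference walks through every sampling step.

```python
import random


class CountingModel:
    """Hypothetical stand-in for the denoising network; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return x


def train_one_batch(model, batch, total_steps, rng):
    # Training: draw a single step index for this batch (e.g. 59)
    # instead of iterating over all steps.
    t = rng.randrange(total_steps)
    model(batch, t)
    return t


def sample(model, x, sampling_steps):
    # Inference: the model is called once per sampling step.
    for t in reversed(range(sampling_steps)):
        x = model(x, t)
    return x


def training_calls(total_steps, num_batches=10, seed=0):
    # Number of model calls needed to train on `num_batches` batches.
    rng = random.Random(seed)
    model = CountingModel()
    for _ in range(num_batches):
        train_one_batch(model, batch=None, total_steps=total_steps, rng=rng)
    return model.calls


def inference_calls(sampling_steps):
    # Number of model calls needed to generate one sample.
    model = CountingModel()
    sample(model, x=None, sampling_steps=sampling_steps)
    return model.calls


def per_step_fraction(total_steps):
    # Assumption: a uniform split, so each step covers 1/total_steps
    # of the whole diffusion process.
    return 1.0 / total_steps


if __name__ == "__main__":
    for steps in (100, 20):
        print(steps, training_calls(steps), inference_calls(steps),
              per_step_fraction(steps))
```

Under these assumptions, `training_calls(100)` and `training_calls(20)` are both 10 (one call per batch), whereas `inference_calls` drops from 100 to 20. The price of fewer training steps shows up in `per_step_fraction`: each step has to cover a larger share of the process.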

danablend commented 8 months ago

Aha, that makes a lot of sense! Thank you for your reply, this was very helpful. 👍