archinetai / audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.
MIT License

Add support to clip predicted samples to the desired range. #55

Open Kinyugo opened 1 year ago

Kinyugo commented 1 year ago

In diffusion models it is common to clip samples to a desired range such as [-1, 1]. I believe previous versions of this package supported this, but the current implementation does not.

I think it would be useful to support clipping samples to a desired range.

VSampler

  def forward(..., clip_denoised: bool = False, dynamic_threshold: float = 0.0) -> Tensor:
    ...
    x_pred = alphas[i] * x_noisy - betas[i] * v_pred
    # Add clipping support here (assign the result, since clipping is not in-place)
    if clip_denoised:
      x_pred = clip(x_pred, dynamic_threshold=dynamic_threshold)
    ...
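For reference, the `clip` helper used above could be sketched as follows. This is a hypothetical implementation, not the package's actual code: it does a plain clamp to [-1, 1] by default, and otherwise applies per-sample dynamic thresholding in the style of Imagen (clamping to a quantile of the absolute values and rescaling).

```python
import torch
from torch import Tensor


def clip(x: Tensor, dynamic_threshold: float = 0.0) -> Tensor:
    """Clip to [-1, 1], or dynamically threshold per sample (sketch)."""
    if dynamic_threshold == 0.0:
        return x.clamp(-1.0, 1.0)
    # Quantile of absolute values over all non-batch dimensions
    flat = x.abs().flatten(start_dim=1)
    scale = torch.quantile(flat, dynamic_threshold, dim=1)
    scale = scale.clamp(min=1.0)  # never shrink below the [-1, 1] range
    scale = scale.view(x.shape[0], *([1] * (x.ndim - 1)))
    # Clamp to [-scale, scale], then rescale back into [-1, 1]
    return x.clamp(-scale, scale) / scale
```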

I am happy to open a PR if this is acceptable.


flavioschneider commented 1 year ago

Hey Kinyugo! Looks good to me. The only thing is that dynamic thresholding is usually applied inside the sampling loop, not only at the end, so a simple x_pred.clamp(-1, 1) is probably enough. I didn't port dynamic thresholding to v-diffusion since I'm not sure it would play well inside the sampling loop: we're not only predicting the ground truth, as with normal or k-diffusion.
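To illustrate the point about clamping inside the loop: a minimal v-objective DDIM-style sampler with per-step clamping might look like the sketch below. The `sample` function, its `model(x, i)` signature, and the `alphas`/`betas` schedules are hypothetical, for illustration only, not this repository's actual sampler.

```python
import torch
from torch import Tensor


@torch.no_grad()
def sample(model, x_noisy: Tensor, alphas: Tensor, betas: Tensor,
           clip_denoised: bool = True) -> Tensor:
    """Hypothetical v-objective sampling loop with per-step clamping."""
    num_steps = len(alphas)
    for i in range(num_steps):
        v_pred = model(x_noisy, i)
        # Recover the denoised estimate from the v-prediction
        x_pred = alphas[i] * x_noisy - betas[i] * v_pred
        if clip_denoised:
            # Clamp at every step, not just at the end
            x_pred = x_pred.clamp(-1.0, 1.0)
        noise_pred = betas[i] * x_noisy + alphas[i] * v_pred
        if i < num_steps - 1:
            # Re-noise the clamped estimate for the next step
            x_noisy = alphas[i + 1] * x_pred + betas[i + 1] * noise_pred
    return x_pred
```

Since each subsequent step is computed from the clamped `x_pred`, out-of-range errors cannot accumulate across the loop, which is the benefit of clamping per step rather than once at the end.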

Kinyugo commented 1 year ago

Hello Flavio. It makes sense not to have dynamic thresholding. Have you experimented with the effect of clipping on final sample quality?