The self-reproduce audio result is bad, too smoothing

acids-ircam / RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

The self-reproduce audio result is bad, too smoothing #251

Closed by didadida-r 9 months ago

didadida-r commented 1 year ago

Hi, I used 60 hours of audio effect data to train the RAVE model, and I have tried the default v2, v3, and discrete configurations. But I find that the VAE output is too smooth: when the model reconstructs its own input audio, the result is poor. Is there any parameter I should tune?

the discrete model training loss

the v2 model training loss

Here is the original audio, the discrete VAE output audio (using ONNX export), and the v2 VAE output audio (using ONNX export).
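To put a number on "too smooth", one option is to compare the original and reconstructed waveforms in the spectral domain. The snippet below is only a rough sketch using NumPy, not anything from the RAVE codebase: the function names (`stft_mag`, `high_band_ratio`, `log_spectral_distance`), the 4 kHz cutoff, and the synthetic test signal are all assumptions made for illustration. A moving average stands in for an over-smoothed reconstruction.

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=256):
    """Magnitude spectrogram from a Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def high_band_ratio(x, sr, cutoff_hz=4000.0, n_fft=1024, hop=256):
    """Fraction of spectral energy at or above cutoff_hz (hypothetical smoothness indicator)."""
    mag = stft_mag(x, n_fft, hop)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    energy = (mag ** 2).sum(axis=0)  # total energy per frequency bin
    return float(energy[freqs >= cutoff_hz].sum() / (energy.sum() + 1e-12))

def log_spectral_distance(ref, est, n_fft=1024, hop=256, eps=1e-7):
    """RMS difference between log-magnitude spectrograms of two signals."""
    n = min(len(ref), len(est))  # trim to a common length
    a = np.log(stft_mag(ref[:n], n_fft, hop) + eps)
    b = np.log(stft_mag(est[:n], n_fft, hop) + eps)
    return float(np.sqrt(((a - b) ** 2).mean()))

# Synthetic stand-ins (assumption): a two-tone "original" and a
# moving-average copy that mimics an over-smoothed reconstruction.
sr = 16000
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
smoothed = np.convolve(original, np.ones(8) / 8, mode="same")

ratio_original = high_band_ratio(original, sr)
ratio_smoothed = high_band_ratio(smoothed, sr)
distance = log_spectral_distance(original, smoothed)
```

With real data, `original` and `smoothed` would be replaced by the input audio and the model output loaded at the same sample rate. A much lower high-band ratio in the output than in the input would suggest the model is dropping high-frequency detail.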

robclouth commented 1 year ago

Do you find that the discrete VAE is more accurate? Also, how did you get the ONNX export working? My models just produce thin, crackly noise, while the libtorch ones work fine.