acids-ircam / RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
License: Other
1.3k stars, 176 forks

Self-reconstructed audio is poor and too smooth #251

Closed: didadida-r closed this issue 9 months ago

didadida-r commented 1 year ago

Hi, I used 60 hours of audio-effect data to train the RAVE model, and I have tried the default v2, v3, and discrete configurations. However, I find that the VAE output is too smooth: the VAE's reconstruction of its own input is bad. Are there any parameters I should tune?

Training loss of the discrete model: [image]

Training loss of the v2 model: [image]

Here is the original audio: [image]
The discrete VAE output audio, using the ONNX export: [image]
The v2 VAE output audio, using the ONNX export: [image]
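
To put a number on "too smooth" instead of judging spectrograms by eye, one option is to compare the original and the reconstruction at several STFT resolutions, loosely in the spirit of the multiscale spectral distance RAVE is trained with. This is a hypothetical NumPy diagnostic, not part of RAVE: `stft_mag`, `multiscale_spectral_distance`, `high_band_energy_ratio`, the STFT sizes, and the 4 kHz cutoff are all assumptions chosen for illustration.

```python
import numpy as np


def stft_mag(x: np.ndarray, n_fft: int) -> np.ndarray:
    """Magnitude STFT (frames x bins) with a Hann window and 75% overlap."""
    hop = n_fft // 4
    window = np.hanning(n_fft)
    if len(x) < n_fft:  # zero-pad short inputs so there is at least one frame
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multiscale_spectral_distance(ref, est, scales=(2048, 1024, 512, 256, 128), eps=1e-7):
    """Mean over STFT sizes of a relative L1 magnitude distance plus an L1 log-magnitude distance."""
    n = min(len(ref), len(est))  # compare only the overlapping samples
    ref, est = np.asarray(ref[:n], float), np.asarray(est[:n], float)
    total = 0.0
    for n_fft in scales:
        r, e = stft_mag(ref, n_fft), stft_mag(est, n_fft)
        lin = np.mean(np.abs(r - e)) / (np.mean(r) + eps)
        log = np.mean(np.abs(np.log(r + eps) - np.log(e + eps)))
        total += lin + log
    return float(total / len(scales))


def high_band_energy_ratio(ref, est, sr, cutoff_hz=4000.0, n_fft=2048):
    """Energy above cutoff_hz in est divided by that in ref; well below 1 hints at lost detail."""
    n = min(len(ref), len(est))
    band = np.fft.rfftfreq(n_fft, d=1.0 / sr) >= cutoff_hz
    e_ref = np.sum(stft_mag(np.asarray(ref[:n], float), n_fft)[:, band] ** 2)
    e_est = np.sum(stft_mag(np.asarray(est[:n], float), n_fft)[:, band] ** 2)
    return float(e_est / (e_ref + 1e-12))


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    x = rng.standard_normal(sr)  # 1 s of white noise as a stand-in "original"
    y = np.convolve(x, np.ones(16) / 16, mode="same")  # crude low-pass as a stand-in "smoothed" output
    print("distance(x, x):", multiscale_spectral_distance(x, x))
    print("distance(x, y):", multiscale_spectral_distance(x, y))
    print("high-band ratio:", high_band_energy_ratio(x, y, sr))
```

With `x` as the original clip and `y` as the ONNX reconstruction (1-D float arrays at sample rate `sr`), a `high_band_energy_ratio(x, y, sr)` well below 1 would be consistent with the over-smoothing described above, and tracking `multiscale_spectral_distance` across checkpoints gives a rough way to compare the v2, v3, and discrete runs on the same material.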

robclouth commented 1 year ago

Do you find that the discrete VAE is more accurate? Also, how did you get the ONNX export working? My models just produce thin, crackly noise, while the libtorch versions work fine.
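
When one export sounds fine and the other produces noise, a first check is whether the two backends even agree numerically on the same input. This is a hypothetical helper, assuming you have already fed one identical input buffer to the libtorch (TorchScript) model, e.g. loaded with `torch.jit.load`, and to the ONNX export, e.g. via an `onnxruntime.InferenceSession`, and collected both outputs as NumPy arrays. `compare_backend_outputs` and its `atol` threshold are illustrative assumptions, not part of RAVE.

```python
import numpy as np


def compare_backend_outputs(ref, other, atol=1e-4):
    """Summarize how closely two backends' outputs for the same input agree."""
    ref = np.asarray(ref, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    if ref.shape != other.shape:
        # A shape mismatch (e.g. swapped channel/time axes) is itself a likely bug.
        return {"same_shape": False, "match": False}
    diff = ref - other
    max_abs = float(np.max(np.abs(diff))) if diff.size else 0.0
    # Signal-to-error ratio in dB, treating `ref` (e.g. the libtorch output) as the signal.
    snr_db = float(10 * np.log10((np.sum(ref ** 2) + 1e-20) / (np.sum(diff ** 2) + 1e-20)))
    return {"same_shape": True, "max_abs_diff": max_abs, "snr_db": snr_db, "match": max_abs <= atol}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    libtorch_out = rng.standard_normal((1, 1, 2048)).astype(np.float32)  # placeholder for the TorchScript output
    onnx_out = libtorch_out + 1e-6  # placeholder for a well-behaved ONNX output
    print(compare_backend_outputs(libtorch_out, onnx_out))
```

A large mismatch on identical input would point at the export path rather than at training, which separates the crackly-noise problem from the smoothing question in the original post.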