acids-ircam / RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

audio distance and PQMF in 2.3.1 #302

Open victor-shepardson opened 6 months ago

victor-shepardson commented 6 months ago

fullband spectral distance is now computed between the original waveform and the model reconstruction: https://github.com/acids-ircam/RAVE/blob/8b250310fecfe61a6d9d53e8e5551851f4638d35/rave/model.py#L291 https://github.com/acids-ircam/RAVE/blob/8b250310fecfe61a6d9d53e8e5551851f4638d35/rave/model.py#L340

compared to older versions, where it was computed between the model reconstruction and the PQMF reconstruction of the data: https://github.com/acids-ircam/RAVE/blob/766192601be9ee8087069f146eb81c21f68ce7c9/rave/model.py#L239-L249

iiuc this should mainly affect causal models, but it seems like a problem in that case, since a causal model would not be able to compensate for the delay induced by the PQMF, unless I misunderstand?

do you know if the change was deliberate or why it was made?
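
To make the alignment question above concrete, here is a toy sketch in plain Python. It is not RAVE code. It assumes the PQMF analysis/synthesis round trip can be modeled as a pure delay of `D` samples, and it uses a sample-wise L1 distance instead of RAVE's spectral distance; `delay`, `l1_distance`, and `D` are hypothetical names chosen for illustration. A magnitude-spectrogram distance would generally react less sharply to a small shift, so this only shows which pair of signals is time-aligned, not how large the effect is in practice.

```python
def delay(signal, d):
    """Delay a signal by d samples, zero-padding the start and keeping the length."""
    if d == 0:
        return list(signal)
    return [0.0] * d + list(signal[:-d])


def l1_distance(a, b):
    """Mean absolute sample-wise difference (a stand-in, not RAVE's spectral distance)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy signal: a single click in an otherwise silent buffer.
x = [0.0] * 16
x[4] = 1.0

D = 3  # hypothetical PQMF round-trip delay, in samples (assumption)

pqmf_roundtrip = delay(x, D)  # stands in for the PQMF reconstruction of the data
reconstruction = delay(x, D)  # a causal model that reproduces the input, D samples late

# Target = original waveform: the delayed output is penalized.
dist_vs_original = l1_distance(x, reconstruction)

# Target = PQMF reconstruction of the data: both signals share the same delay.
dist_vs_roundtrip = l1_distance(pqmf_roundtrip, reconstruction)

print(dist_vs_original, dist_vs_roundtrip)
```

In this toy setting the distance against the PQMF round trip is zero, while the distance against the original waveform is not, because a causal model has no way to shift its output earlier in time.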

domkirke commented 5 months ago

Hello Victor,

I am finally working through your issues. I did change the code to make it more general, as the older versions were a little tweaked, but the inner behavior did not change: a multiband loss for each PQMF band, and a fullband loss for the global signal. Regarding causality: causality only changes the internal padding of the convolutional layers, such that a given latent position does not influence the future of the sequence. Furthermore, the reconstruction is done on the slightly delayed PQMF bands, and the model is then trained to compensate for that delay with the fullband distance. Some improvements could be made here, though nothing comes to mind right now.
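
As a rough illustration of the padding point above, the following plain-Python sketch compares causal (left-only) padding with centered padding for a single 1D convolution. It is not taken from the RAVE code base; `conv1d`, the kernel values, and the test signal are hypothetical. With causal padding, each output sample depends only on the current and past input samples, so changing a later input sample leaves earlier outputs untouched.

```python
def conv1d(signal, kernel, left_pad, right_pad):
    """Plain 1D cross-correlation with explicit zero padding on each side."""
    padded = [0.0] * left_pad + list(signal) + [0.0] * right_pad
    k = len(kernel)
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


def first_difference(a, b):
    """Index of the first position where two equal-length sequences differ."""
    return next(i for i, (u, v) in enumerate(zip(a, b)) if u != v)


kernel = [0.25, 0.5, 0.25]  # arbitrary illustrative kernel
k = len(kernel)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_changed = list(x)
x_changed[4] = 100.0  # modify the input sample at index 4 only

# Causal: all k - 1 padding samples go on the left.
causal_a = conv1d(x, kernel, k - 1, 0)
causal_b = conv1d(x_changed, kernel, k - 1, 0)

# Centered: padding is split between both sides.
half = (k - 1) // 2
centered_a = conv1d(x, kernel, half, half)
centered_b = conv1d(x_changed, kernel, half, half)

first_changed_causal = first_difference(causal_a, causal_b)
first_changed_centered = first_difference(centered_a, centered_b)

print(first_changed_causal, first_changed_centered)
```

With causal padding the first affected output is at index 4, the same index as the modified input. With centered padding the output at index 3 already changes, meaning it depends on a future input sample.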