facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
MIT License

SoundStream improved reimplementation #3

Open kwinz opened 1 year ago

kwinz commented 1 year ago

Thanks for publishing this! In the EnCodec paper you write:

For fair evaluation, we also compare EnCodec to our reimplementation of SoundStream (Zeghidour et al., 2021). [...] Finally, we compare EnCodec to the SoundStream model from the official implementation available in Lyra 2 at 3.2 kbps and 6 kbps on audio upsampled to 32 kHz. We also reproduced a version of SoundStream (Zeghidour et al., 2021) with minor improvements. Namely, we use the relative feature loss introduced in Section 3.4, and layer normalization (applied separately for each time step) in the discriminators, except for the first and last layer, which improved the audio quality during our preliminary studies.

And on https://ai.honu.io/papers/encodec/samples.html you show samples of this reimplementation. Could you share the source code of your SoundStream reimplementation so this work can be reproduced?

JadeCopet commented 1 year ago

We do not plan to release the SoundStream reimplementation for now. To compare against SoundStream, we recommend using the model from the official implementation available in Lyra 2.
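
For readers trying to approximate the reimplementation described in the quoted passage, the "relative feature loss" can be sketched roughly as below. This is not the authors' code. It assumes the loss is the mean L1 distance between discriminator feature maps of the real and generated audio, divided by the mean absolute value of the real features, then averaged over layers. The function name, the flattened-list input format, and the `eps` term are hypothetical choices made for illustration.

```python
from typing import Sequence


def relative_feature_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> float:
    """Sketch of a relative feature-matching loss.

    Each element of real_feats / fake_feats is one discriminator layer's
    feature map, flattened to a sequence of floats (an assumption made to
    keep this example dependency-free).
    """
    if len(real_feats) != len(fake_feats) or not real_feats:
        raise ValueError("need the same non-zero number of layers for both inputs")

    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if len(real) != len(fake) or not real:
            raise ValueError("feature maps of a layer must have matching, non-zero size")
        # Mean absolute difference between real and generated features.
        l1 = sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
        # Mean magnitude of the real features; dividing by it makes the
        # loss "relative". The exact normalisation is an assumption.
        scale = sum(abs(r) for r in real) / len(real)
        total += l1 / (scale + eps)

    # Average over all layers.
    return total / len(real_feats)
```

Because each layer's distance is divided by the magnitude of its own real features, layers with large activations do not dominate the sum, which is the usual motivation for a relative rather than absolute feature loss.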