facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
MIT License

Number of codebooks and calculation of bitrates #39

Status: open. rkstgr opened this issue 1 year ago.

rkstgr commented 1 year ago

❓ Questions

I don't understand how you arrive at the smallest bitrate of 1.5 kbps for the 24 kHz model:

If I understand correctly, the number of codebooks is a multiple of 4 (4, 8, 12, ..., so 4 would be the minimum), each codebook has 1024 entries and therefore costs 10 bits per code (2^10 = 1024), and the 24 kHz model produces 75 latent codes per second. That gives a smallest possible bitrate of 4 × 10 bits × 75 /s = 3000 bit/s = 3 kbps.
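The arithmetic above can be checked in a few lines of Python. This is a minimal sketch assuming, as in the question, 1024 entries per codebook (log2(1024) = 10 bits) and 75 latent frames per second for the 24 kHz model; `bitrate_bps` is a hypothetical helper, not part of the encodec API.

```python
import math

# Assumptions taken from the question: 1024 entries per codebook and
# 75 latent frames per second for the 24 kHz model.
CODEBOOK_SIZE = 1024
FRAME_RATE_HZ = 75


def bitrate_bps(num_codebooks: int,
                codebook_size: int = CODEBOOK_SIZE,
                frame_rate_hz: int = FRAME_RATE_HZ) -> float:
    """Hypothetical helper: bits per second spent on a given number of codebooks."""
    bits_per_code = math.log2(codebook_size)  # 10 bits for 1024 entries
    return num_codebooks * bits_per_code * frame_rate_hz


print(bitrate_bps(4) / 1000)  # 3.0 (kbps)
print(bitrate_bps(2) / 1000)  # 1.5 (kbps)
```

Under these assumptions four codebooks give 3 kbps, so reaching 1.5 kbps requires only two.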

However, both the paper and the README state that the lowest bitrate is 1.5 kbps. The bitrate progression (1.5, 3, 6, 12, 24 kbps) doubles at each step, so wouldn't it rather correspond to 2, 4, 8, 16 and 32 codebooks? Maybe I am misinterpreting or missing something; could you please clarify this point?
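Inverting the same formula shows that this bandwidth ladder maps to 2, 4, 8, 16 and 32 codebooks. Again a sketch under the question's assumptions (10 bits per code, 75 frames/s, so 750 bit/s per codebook); `codebooks_for_bandwidth` is an illustrative name, not an encodec function.

```python
# Assumed: each codebook costs 10 bits per frame at 75 frames/s, i.e. 750 bit/s.
BITS_PER_CODE = 10
FRAME_RATE_HZ = 75
BPS_PER_CODEBOOK = BITS_PER_CODE * FRAME_RATE_HZ  # 750


def codebooks_for_bandwidth(kbps: float) -> int:
    """Hypothetical helper: number of codebooks needed for a target bandwidth."""
    n, remainder = divmod(round(kbps * 1000), BPS_PER_CODEBOOK)
    if remainder:
        raise ValueError(f"{kbps} kbps is not a whole number of codebooks")
    return n


for kbps in (1.5, 3, 6, 12, 24):
    print(kbps, "kbps ->", codebooks_for_bandwidth(kbps), "codebooks")
```

Because the bandwidth doubles at each step, so does the codebook count, and the smallest setting (2 codebooks) is not a multiple of 4.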

adefossez commented 1 year ago

1.5 kbps is indeed two codebooks. Sorry if a typo somewhere made you think it was 4; can you point me to what made you think we used multiples of 4 codebooks?

rkstgr commented 1 year ago

From the current version of the paper on arXiv (https://arxiv.org/abs/2210.13438), Section 3.2, Residual Vector Quantization:

When doing variable bandwidth training, we select randomly a number of codebooks as a multiple of 4, i.e. corresponding to a bandwidth 1.5, 3, 6, 12 or 24 kbps at 24 kHz.
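With the bandwidths listed in this quote (which map to 2, 4, 8, 16 and 32 codebooks at 750 bit/s per codebook, not to multiples of 4), variable-bandwidth training can be pictured as sampling one bandwidth and keeping only the first n codebooks of the residual quantizer. The sketch below illustrates that idea only and is not encodec's training code: the uniform sampling, the per-frame layout (a list of 32 codebook indices, one per residual stage) and the function names are all assumptions.

```python
import random

BANDWIDTHS_KBPS = (1.5, 3, 6, 12, 24)
BPS_PER_CODEBOOK = 10 * 75  # assumed: 10 bits per code, 75 frames/s (24 kHz model)
MAX_CODEBOOKS = 32          # enough for 24 kbps


def sample_num_codebooks(rng: random.Random) -> int:
    """Hypothetical: pick a bandwidth (uniformly, an assumption) and convert it to a codebook count."""
    kbps = rng.choice(BANDWIDTHS_KBPS)
    return round(kbps * 1000) // BPS_PER_CODEBOOK


def truncate_codes(frame_codes: list[int], num_codebooks: int) -> list[int]:
    """Residual VQ stages go from coarse to fine, so a lower bitrate keeps a prefix of the codes."""
    return frame_codes[:num_codebooks]


rng = random.Random(0)
# One frame's codes from all 32 residual stages (random indices, for illustration only).
frame = [rng.randrange(1024) for _ in range(MAX_CODEBOOKS)]
n_q = sample_num_codebooks(rng)
kept = truncate_codes(frame, n_q)
print(n_q, "codebooks ->", len(kept) * BPS_PER_CODEBOOK / 1000, "kbps")
```

Keeping a prefix works because each residual stage only refines what the earlier stages left over, so the first n codes are already a complete, coarser encoding.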

adefossez commented 1 year ago

Okay, that's a mistake! Thanks for pointing this out; we will fix it in the next revision of the paper.

listener17 commented 1 year ago

Good catch @rkstgr

kevinkazzo commented 1 year ago

I was confused for a second, thanks.