chenjiasheng opened 1 year ago
@chenjiasheng nice catch! So EnCodec was trained only with EMA updates and without codebook expiration.

I also have something to add about the `cluster_size` buffers. If we look at any codebook's `cluster_size` for the trained model, almost all codes have values below `threshold_ema_dead_code = 2` and would have been replaced at every training step if codebook expiration worked correctly.
```python
from encodec import EncodecModel

# Load the pretrained 24 kHz model and inspect the first quantizer's codebook.
model = EncodecModel.encodec_model_24khz()
codebook = model.quantizer.vq.layers[0]._codebook

print(codebook.embed.shape)
print(codebook.cluster_size.sum())
print((codebook.cluster_size < codebook.threshold_ema_dead_code).sum())
```

```
torch.Size([1024, 128])
tensor(600.0010)
tensor(1003)
```
The sum of each codebook's `cluster_size` is 600, which is the number of frames in a batch on a single GPU (75 fps × 1 s × 8 samples) and is less than the codebook size of 1024, implying constant codebook underutilization. EnCodec was trained on 8 GPUs with 4800 frames per batch, but the `cluster_size` buffers are not aggregated across processes as they should be.
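A minimal sketch of the kind of cross-process aggregation that appears to be missing (assuming a standard `torch.distributed` setup; the buffer name `cluster_size` matches the EnCodec code, but the helper itself is hypothetical):

```python
import torch
import torch.distributed as dist

def sync_cluster_size(cluster_size: torch.Tensor) -> torch.Tensor:
    """Sum the per-process EMA cluster-size buffers across all workers.

    Without this all-reduce, each process only ever sees its local 600
    frames per step, so the summed counts stay far below the codebook size.
    """
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(cluster_size, op=dist.ReduceOp.SUM)
    return cluster_size

# Single-process demo: with no process group the buffer passes through unchanged.
local = torch.full((1024,), 600.0 / 1024)
synced = sync_cluster_size(local.clone())
print(synced.sum())  # 600.0 on one process; would be 4800.0 summed over 8 GPUs
```

The same all-reduce would need to be applied to `embed_avg` so the EMA codebook update itself, not just the dead-code test, sees the full batch.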
@ilya16 Agreed. The first batch (75 fps × 1 s × 8 samples = 600 frames) does not have enough frames to initialize the codebook (of size 1024). Also, running expiration once per batch is too frequent: not enough frames are seen within a single batch, so most of the codes starve to death.
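To illustrate the starvation argument, here is a toy simulation (hypothetical names, not the EnCodec API) that accumulates code usage over several batches before checking the dead-code threshold, instead of checking after every batch:

```python
import torch

# Illustrative numbers from the thread: 600 frames per GPU per batch,
# a 1024-entry codebook, and threshold_ema_dead_code = 2.
CODEBOOK_SIZE = 1024
FRAMES_PER_BATCH = 600
THRESHOLD = 2.0
EXPIRE_EVERY = 16  # hypothetical: check expiration every N batches, not every batch

usage = torch.zeros(CODEBOOK_SIZE)
for step in range(1, EXPIRE_EVERY + 1):
    # Pretend each frame picks a code uniformly at random.
    codes = torch.randint(0, CODEBOOK_SIZE, (FRAMES_PER_BATCH,))
    usage += torch.bincount(codes, minlength=CODEBOOK_SIZE).float()
    if step % EXPIRE_EVERY == 0:
        dead = usage < THRESHOLD
        print(f"after {step} batches: {int(dead.sum())} codes below threshold")
        usage.zero_()
```

With a single 600-frame batch, more than half of the 1024 codes cannot even be hit twice, so most would be flagged dead every step; over 16 batches (9600 frames) almost none are.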
https://github.com/facebookresearch/encodec/blob/0e2d0aed29362c8e8f52494baf3e6f99056b214f/encodec/quantization/core_vq.py#LL220C18-L220C18
I have found that expiration handling for the codebook did not work at all. Even if we deliberately clear the contents of `self.embed` after `self.expire_codes_()`, the final `self.embed` we get is unaffected. This demonstrates that the modification of `self.embed` inside `self.expire_codes_()` is irrelevant.
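One plausible mechanism for this, consistent with how EMA vector quantization works: `embed` is recomputed from `embed_avg / cluster_size` at the next EMA update, so a write that touches only `embed` is discarded unless `embed_avg` (and `cluster_size`) are reset too. A toy illustration with simplified, hypothetical names, not the EnCodec code itself:

```python
import torch

# Toy EMA-VQ state: embed is always derived from embed_avg / cluster_size.
K, D = 4, 2
cluster_size = torch.ones(K)
embed_avg = torch.randn(K, D)
embed = embed_avg / cluster_size.unsqueeze(1)

# "Expire" a code by overwriting embed only:
embed[0] = torch.zeros(D)

# The next EMA step renormalizes embed from embed_avg, wiping that write:
embed = embed_avg / cluster_size.unsqueeze(1)
print(torch.allclose(embed[0], embed_avg[0]))  # True: the expired code is back

# A replacement that also resets embed_avg (and cluster_size) would stick:
embed_avg[0] = torch.zeros(D)
cluster_size[0] = 1.0
embed = embed_avg / cluster_size.unsqueeze(1)
print(torch.equal(embed[0], torch.zeros(D)))  # True: the write survives
```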
Also, I would like to ask whether codebook expiration is really that important. It seems that you obtained good results even with the broken code.