facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
MIT License

Codebook expiration does not take effect at all #61

Open · chenjiasheng opened this issue 1 year ago

chenjiasheng commented 1 year ago

https://github.com/facebookresearch/encodec/blob/0e2d0aed29362c8e8f52494baf3e6f99056b214f/encodec/quantization/core_vq.py#LL220C18-L220C18

I have found that the expiration handling for the codebook does not take effect at all. Even if we deliberately clear the data of self.embed right after self.expire_codes_(), the final self.embed we get is unaffected. This demonstrates that the modification of self.embed inside self.expire_codes_() is irrelevant.

        if self.training:
            # We do the expiry of code at that point as buffers are in sync
            # and all the workers will take the same decision.
            self.expire_codes_(x)      # NOTICE: the modification of self.embed inside expire_codes_ is irrelevant; it is overwritten by the copy_ below
            self.embed.data.zero_()    # NOTICE: deliberately clear self.embed
            ema_inplace(self.cluster_size, embed_onehot.sum(0), self.decay)
            embed_sum = x.t() @ embed_onehot
            ema_inplace(self.embed_avg, embed_sum.t(), self.decay)
            cluster_size = (
                laplace_smoothing(self.cluster_size, self.codebook_size, self.epsilon)
                * self.cluster_size.sum()
            )
            embed_normalized = self.embed_avg / cluster_size.unsqueeze(1)
            print(embed_normalized)    # NOTICE: embed_normalized has the same value whether or not self.embed was zeroed above
            self.embed.data.copy_(embed_normalized)
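
For reference, here is a minimal sketch of one possible fix (an illustration, not the authors' patch): do the EMA update and the write-back to self.embed first, and only then expire dead codes, so the replacement is not immediately overwritten. A complete fix would likely also have expire_codes_ reset embed_avg and cluster_size for the replaced codes, so the next EMA update does not pull them straight back below the threshold.

if self.training:
    ema_inplace(self.cluster_size, embed_onehot.sum(0), self.decay)
    embed_sum = x.t() @ embed_onehot
    ema_inplace(self.embed_avg, embed_sum.t(), self.decay)
    cluster_size = (
        laplace_smoothing(self.cluster_size, self.codebook_size, self.epsilon)
        * self.cluster_size.sum()
    )
    embed_normalized = self.embed_avg / cluster_size.unsqueeze(1)
    self.embed.data.copy_(embed_normalized)
    # Expire dead codes only after the write-back, so the replacement
    # embeddings survive this training step.
    self.expire_codes_(x)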

Also, I would like to ask: is codebook expiration really important? It seems you obtained good results even with the buggy code.

ilya16 commented 1 year ago

@chenjiasheng nice catch! So EnCodec was trained only on EMA updates without codebook expiration.

I also have something to add about the cluster_size buffers. If we look at any codebook's cluster_size in the trained model, almost all codes have values below threshold_ema_dead_code = 2 and would have been replaced at every training step if codebook expiration had worked correctly.

from encodec import EncodecModel

# Load the pretrained 24 kHz model.
model = EncodecModel.encodec_model_24khz()

# Inspect the EMA buffers of the first quantizer layer's codebook.
codebook = model.quantizer.vq.layers[0]._codebook
print(codebook.embed.shape)
print(codebook.cluster_size.sum())
# Number of codes below the expiration threshold (threshold_ema_dead_code = 2).
print((codebook.cluster_size < codebook.threshold_ema_dead_code).sum())

> torch.Size([1024, 128])
> tensor(600.0010)
> tensor(1003)

The sum of each codebook's cluster_size is 600, which is the number of frames in a per-GPU batch (75 fps × 1 s × batch size 8) and is smaller than the codebook size of 1024, implying constant codebook underutilization. EnCodec was trained on 8 GPUs, i.e. 4800 frames per global batch, but the cluster_size buffers are not aggregated across processes as they should be.
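
A minimal sketch of what such aggregation could look like (an assumption, not the released training code) is to all-reduce the per-step counts across workers before the EMA update:

import torch.distributed as dist

counts = embed_onehot.sum(0)
if dist.is_available() and dist.is_initialized():
    # Sum the per-GPU code-usage counts so that cluster_size tracks the
    # global batch (8 GPUs x 600 frames = 4800 frames), not a single worker.
    dist.all_reduce(counts)
ema_inplace(self.cluster_size, counts, self.decay)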

chenjiasheng commented 1 year ago

@ilya16 Agreed. The first batch (75 fps × 1 s × batch size 8 = 600 frames) does not have enough frames to initialize the codebook (of size 1024). Also, running expiration once per batch is too frequent: too few frames are seen within a single batch, so most of the codes would starve to death. A sketch of a less aggressive schedule follows below.
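
A hypothetical variant (the counter self._steps and the interval expire_every are made-up names) would let usage statistics accumulate over several batches before expiring:

# Hypothetical: run expiration every `expire_every` steps instead of every
# batch, so each decision is based on more than 600 frames.
self._steps += 1
if self.training and self._steps % self.expire_every == 0:
    self.expire_codes_(x)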