TencentARC / Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Apache License 2.0

Inquiry Regarding Codebook Selection Method in Open-MAGVIT2 #4

Closed · zhaohm14 closed this 2 weeks ago

zhaohm14 commented 2 weeks ago

Hello,

While exploring the Open-MAGVIT2 repository, I noticed an interesting approach to codebook selection implemented in the following code snippet: https://github.com/TencentARC/Open-MAGVIT2/blob/main/taming/modules/vqvae/quantize.py#L53

        ## could possibly replace this block:
        # find the closest codebook entry for each latent
        min_encoding_indices = torch.argmin(d, dim=1).unsqueeze(1)

        # build one-hot encodings of shape (num_latents, codebook_size)
        min_encodings = torch.zeros(
            min_encoding_indices.shape[0], self.n_e).to(z)
        min_encodings.scatter_(1, min_encoding_indices, 1)

        # min_encodings dtype: torch.float32
        # min_encodings shape: torch.Size([2048, 512])
        # min_encoding_indices shape: torch.Size([2048, 1])

        # get quantized latent vectors via a one-hot matmul with the codebook
        z_q = torch.matmul(min_encodings, self.embedding.weight).view(z.shape)

        ## with the direct lookup:
        # min_encoding_indices = torch.argmin(d, dim=1)
        # z_q = self.embedding(min_encoding_indices).view(z.shape)
        # (TODO)

I am curious why the more verbose method is used, i.e. building min_encodings and obtaining the quantized latent vectors (z_q) via matrix multiplication, instead of directly calling self.embedding(min_encoding_indices). Are there performance or implementation reasons that favor this approach?
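For reference, here is a minimal, self-contained check suggesting the two paths produce identical results. The shapes and names (n_z, n_e, e_dim) are illustrative assumptions of mine, chosen to mirror the 2048 x 512 shapes noted in the comments above:

    import torch

    # Illustrative stand-ins: 2048 flattened latents, a codebook of
    # 512 entries of dimension 256.
    n_z, n_e, e_dim = 2048, 512, 256
    embedding = torch.nn.Embedding(n_e, e_dim)
    d = torch.randn(n_z, n_e)  # stand-in for the latent-to-codebook distances

    # Verbose path: scatter into one-hot encodings, then matmul with the codebook.
    min_encoding_indices = torch.argmin(d, dim=1).unsqueeze(1)
    min_encodings = torch.zeros(n_z, n_e)
    min_encodings.scatter_(1, min_encoding_indices, 1)
    z_q_matmul = torch.matmul(min_encodings, embedding.weight)

    # Direct path: index the embedding table with the argmin indices.
    z_q_lookup = embedding(min_encoding_indices.squeeze(1))

    assert torch.allclose(z_q_matmul, z_q_lookup)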

Thank you for your insights!

RobertLuo1 commented 2 weeks ago

Thanks for your interest in our work! This code comes from https://github.com/CompVis/taming-transformers/blob/3ba01b241669f5ade541ce990f7650a3b8f65318/taming/modules/vqvae/quantize.py#L70, the initial version of VQ, and we have not run any experiments comparing the two. Generally, self.embedding(min_encoding_indices) is the more prevalent way to fetch codebook entries; see, for example, https://github.com/CompVis/taming-transformers/blob/3ba01b241669f5ade541ce990f7650a3b8f65318/taming/modules/vqvae/quantize.py#L213.
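We have not benchmarked this ourselves, but if you want to measure the performance side, a rough timing sketch along these lines (the shapes and iteration count are arbitrary assumptions) should show the direct lookup is also cheaper, since it avoids materializing the one-hot matrix and the full matmul:

    import time
    import torch

    # Arbitrary illustrative shapes; not taken from the repository.
    n_z, n_e, e_dim = 2048, 512, 256
    embedding = torch.nn.Embedding(n_e, e_dim)
    indices = torch.randint(0, n_e, (n_z,))

    def one_hot_matmul():
        # Verbose path: build one-hot encodings, then matmul with the codebook.
        min_encodings = torch.zeros(n_z, n_e)
        min_encodings.scatter_(1, indices.unsqueeze(1), 1)
        return torch.matmul(min_encodings, embedding.weight)

    def direct_lookup():
        # Direct path: a single embedding-table lookup.
        return embedding(indices)

    for fn in (one_hot_matmul, direct_lookup):
        start = time.perf_counter()
        for _ in range(100):
            fn()
        print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")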

zhaohm14 commented 2 weeks ago

Understood. Thanks a lot!