bytedance / 1d-tokenizer

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Why permute NLD to LND shape? #7

Closed tau-yihouxiang closed 3 months ago

tau-yihouxiang commented 3 months ago

Why is x's shape permuted from NLD to LND? This differs from the usual batch-first convention; doesn't it mean attention would be computed across the batch dimension for each token?

x = x.permute(1, 0, 2)  # NLD -> LND
for i in range(self.num_layers):
    x = self.transformer[i](x)
x = x.permute(1, 0, 2)  # LND -> NLD
MaxxP0 commented 3 months ago

Because they don't use MultiheadAttention with batch_first=True. PyTorch's nn.MultiheadAttention defaults to batch_first=False, which expects inputs of shape (L, N, D), so the permutes just convert to and from that layout. Attention is still computed over the sequence dimension, not the batch.
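
For context, here is a minimal sketch (not from this repo; shapes and num_heads are arbitrary) showing why the permutes are needed when batch_first is left at its default of False, and how batch_first=True would remove them:

import torch
import torch.nn as nn

N, L, D = 2, 32, 512  # batch, sequence length, embed dim
x = torch.randn(N, L, D)  # NLD: batch-first input

# Default batch_first=False: the module expects (L, N, D) inputs.
attn = nn.MultiheadAttention(embed_dim=D, num_heads=8)
x_lnd = x.permute(1, 0, 2)           # NLD -> LND
out, _ = attn(x_lnd, x_lnd, x_lnd)   # self-attention over the L dimension
out = out.permute(1, 0, 2)           # LND -> NLD

# With batch_first=True, no permutes are needed.
attn_bf = nn.MultiheadAttention(embed_dim=D, num_heads=8, batch_first=True)
out_bf, _ = attn_bf(x, x, x)
assert out_bf.shape == (N, L, D)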

tau-yihouxiang commented 3 months ago

Thanks~