LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
MIT License

Increasing token size #71

Open jinnan-chen opened 3 weeks ago

jinnan-chen commented 3 weeks ago

Hi Tianhong,

I have trained MAR on 1D unordered latents. It works fine for 256 tokens with 64 channels, with the loss converging at 0.35. However, when training on 1k or 2k tokens with 64 channels, the loss converges at 0.45 and the results look bad, even though the VAE reconstruction quality is higher than with 256 tokens. Any suggestions? Thanks!

LTH14 commented 3 weeks ago

You might need to check parameters such as vae_embed_dim, vae_stride, etc.

jinnan-chen commented 3 weeks ago

Hi, my tokens are not from 2D images, so I don't have vae_stride, and my token_embed_dim=vae_embed_dim=64. When I use token_embed_dim=64 and seq_len=buffer_size=256, it converges fast and generates good results. So, when I increase self.seq_len, should I also increase buffer_size during training and increase num_iter in sample_tokens accordingly?

LTH14 commented 3 weeks ago

buffer_size does not need to be increased. num_iter should be increased (e.g., 128 for seq_len=1024), so that each sampling step still predicts a small fraction of the sequence.
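To see why num_iter should scale with seq_len, here is a rough sketch (not the repo's exact code) of the cosine masking schedule that MAR-style samplers use to decide how many tokens to unmask per step. With a fixed num_iter, doubling seq_len doubles the number of tokens each step must predict in parallel, which hurts quality; raising num_iter keeps the per-step count small.

```python
import numpy as np

def tokens_per_step(seq_len, num_iter):
    """Sketch of a cosine unmasking schedule: at step i, the fraction of
    tokens still masked is cos(pi/2 * (i+1)/num_iter)."""
    mask_counts = [seq_len] + [
        int(np.floor(seq_len * np.cos(np.pi / 2 * (i + 1) / num_iter)))
        for i in range(num_iter)
    ]
    # Number of tokens newly predicted at each of the num_iter steps.
    return [mask_counts[i] - mask_counts[i + 1] for i in range(num_iter)]

# seq_len=256 with the default num_iter=64 vs. seq_len=1024 with num_iter=128:
steps_256 = tokens_per_step(256, 64)
steps_1024 = tokens_per_step(1024, 128)
print(max(steps_256), max(steps_1024))
```

The peak per-step token count for seq_len=1024 with num_iter=128 stays close to the 256-token case, whereas keeping num_iter=64 would double it.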