jinnan-chen opened this issue 3 weeks ago
You might need to check parameters such as vae_embed_dim, vae_stride, etc.
Hi, my tokens are not from 2D images, so I don't have a vae_stride, and my token_embed_dim = vae_embed_dim = 64. With token_embed_dim=64 and seq_len=buffer_size=256, training converges fast and generates good results. So when I increase self.seq_len, should I also increase buffer_size during training and increase num_iter in sample_tokens accordingly?
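For reference, a minimal sketch of the configuration being described (the dataclass is hypothetical; the field names mirror MAR's hyperparameters and the values are the ones quoted in this thread):

```python
from dataclasses import dataclass

@dataclass
class TokenConfig:
    # Hypothetical container; names mirror MAR's hyperparameters,
    # values are the 1D-latent run described above.
    token_embed_dim: int = 64  # == vae_embed_dim: channels per 1D latent token
    seq_len: int = 256         # number of (unordered) latent tokens
    buffer_size: int = 256     # set equal to seq_len in this particular run

cfg = TokenConfig()
assert cfg.token_embed_dim == 64 and cfg.seq_len == cfg.buffer_size == 256
```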
buffer_size does not need to be increased; num_iter should be increased (e.g., 128 for seq_len=1024).
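For concreteness, a minimal sketch of that scaling. The two anchor points come from this thread (num_iter=64 is the repo default at seq_len=256; 128 is suggested for seq_len=1024); the sqrt interpolation and the pick_num_iter helper are hypothetical illustrations, not part of the MAR codebase:

```python
import math

def pick_num_iter(seq_len: int, base_seq_len: int = 256, base_iter: int = 64) -> int:
    """Hypothetical helper: grow num_iter sub-linearly with seq_len.
    Matches the two points above: 64 iterations at 256 tokens,
    128 at 1024 tokens (sqrt scaling between them)."""
    return round(base_iter * math.sqrt(seq_len / base_seq_len))

print(pick_num_iter(256))   # 64
print(pick_num_iter(1024))  # 128
print(pick_num_iter(2048))  # 181

# At sampling time this would feed MAR's sample_tokens, e.g. (assuming
# a trained `model` from the public MAR repo):
#   tokens = model.sample_tokens(bsz=16, num_iter=pick_num_iter(1024),
#                                cfg=1.0, labels=None)
```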
Hi Tianhong,
I have trained MAR on 1D unordered latents. It works fine for 256 tokens with 64 channels; the loss converges to 0.35. However, when training on 1k or 2k tokens with 64 channels, the loss converges at 0.45 and the results look bad, even though the VAE reconstruction quality is higher than with 256 tokens. Are there any suggestions? Thanks!