feizc / Dimba

Transformer-Mamba Diffusion Models

Question about the model design. #8

Closed Yangr116 closed 1 month ago

Yangr116 commented 2 months ago

Hi, this is great work! I would like to know why you use bidirectional Mamba. Did unidirectional Mamba cause any problems in your experiments?

feizc commented 1 month ago

Hi, it is generally believed that unidirectional Mamba does not perform as well as bidirectional Mamba. In addition, different scan strategies can further improve generation performance; see the discussion in the ZigMa [1] and DiM [2] papers. For simplicity, we used bidirectional Mamba here.

However, it is worth noting that there has been increasing interest in autoregressive approaches, such as LlamaGen [3] and Kaiming He's recent work [4].

[1] ZigMa: A DiT-style Zigzag Mamba Diffusion Model
[2] DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
[3] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
[4] Autoregressive Image Generation without Vector Quantization
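
For readers following the thread, the difference between a unidirectional and a bidirectional scan can be illustrated with a toy example. This is only a sketch under stated assumptions, not the Dimba implementation: `causal_scan` is a hypothetical stand-in for a causal state-space scan (a simple decaying running sum), and combining the two directions by addition is an assumption made for illustration. The point it shows is that a forward-only scan at position 0 never sees later tokens, while the bidirectional version does.

```python
def causal_scan(x, decay=0.5):
    """Toy stand-in for a causal scan: h_t = decay * h_{t-1} + x_t.

    Hypothetical helper for illustration only; a real Mamba block uses
    input-dependent (selective) state-space parameters.
    """
    out, h = [], 0.0
    for v in x:
        h = decay * h + v
        out.append(h)
    return out


def bidirectional_scan(x, decay=0.5):
    """Run the causal scan left-to-right and right-to-left, then add.

    Summing the two directions is an assumption for this sketch; the
    thread does not state how the directions are combined in Dimba.
    """
    fwd = causal_scan(x, decay)
    bwd = causal_scan(x[::-1], decay)[::-1]  # scan reversed input, restore order
    return [f + b for f, b in zip(fwd, bwd)]


# Two sequences that differ only in the last token.
seq_a = [1.0, 2.0, 3.0, 4.0]
seq_b = [1.0, 2.0, 3.0, 9.0]

# Forward-only: the first position cannot see the change at the end.
uni_same = causal_scan(seq_a)[0] == causal_scan(seq_b)[0]

# Bidirectional: the first position is affected by the last token.
bi_same = bidirectional_scan(seq_a)[0] == bidirectional_scan(seq_b)[0]

print(uni_same, bi_same)
```

For image patches, which have no natural left-to-right order, this kind of full-sequence context is one plausible reason a bidirectional scan is often preferred over a single direction.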

Yangr116 commented 1 month ago

Thanks! I have noticed these papers. 😊
