FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License

Mismatched model weights document #22

Closed Artanic30 closed 5 months ago

Artanic30 commented 5 months ago

Hi, I'm currently testing the official checkpoints. I found the following model names and configs in autoregressive/models/gpt.py:

### text-conditional
def GPT_7B(**kwargs):
    return Transformer(ModelArgs(n_layer=32, n_head=32, dim=4096, **kwargs)) # 6.6B

def GPT_3B(**kwargs):
    return Transformer(ModelArgs(n_layer=24, n_head=32, dim=3200, **kwargs)) # 3.1B

def GPT_1B(**kwargs):
    return Transformer(ModelArgs(n_layer=22, n_head=32, dim=2048, **kwargs)) # 1.2B

### class-conditional
def GPT_XXXL(**kwargs):
    return Transformer(ModelArgs(n_layer=48, n_head=40, dim=2560, **kwargs)) # 3.9B

def GPT_XXL(**kwargs):
    return Transformer(ModelArgs(n_layer=48, n_head=24, dim=1536, **kwargs)) # 1.4B

def GPT_XL(**kwargs):
    return Transformer(ModelArgs(n_layer=36, n_head=20, dim=1280, **kwargs)) # 775M

def GPT_L(**kwargs):
    return Transformer(ModelArgs(n_layer=24, n_head=16, dim=1024, **kwargs)) # 343M

def GPT_B(**kwargs):
    return Transformer(ModelArgs(n_layer=12, n_head=12, dim=768, **kwargs)) # 111M

Following README.md, I can successfully load the class-conditional checkpoint c2i_3B_384.pt as GPT_3B (the row LlamaGen-3B | 3.1B | FSDP | 24x24 | 2.18 | c2i_3B_384.pt). However, GPT_3B is listed under text-conditional in the code above.
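
For context, the loading step in question looks roughly like the sketch below. This is a minimal sketch, not the repo's exact script: it assumes the checkpoint stores its weights under a "model" key (it may also be a bare state dict), that any non-default ModelArgs can be passed to GPT_3B through **kwargs, and the checkpoint path is illustrative.

import torch
from autoregressive.models.gpt import GPT_3B  # config shown above

# Build the 3.1B transformer; ModelArgs overrides (vocab_size, block_size, ...)
# would go through **kwargs if the defaults do not match the checkpoint (assumption).
model = GPT_3B()

# c2i_3B_384.pt is the class-conditional 3B checkpoint from the README table.
ckpt = torch.load("c2i_3B_384.pt", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # weights may be nested under a "model" key (assumption)
model.load_state_dict(state_dict)
model.eval()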

PeizeSun commented 5 months ago

Hi~ This is because we originally planned to train three text-conditional image generation models, from 1B to 7B, but due to limited resources we didn't complete them.

Thanks for pointing this out. We will fix it soon.

Artanic30 commented 5 months ago

Thanks for the response.