FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License

Abnormal sample results with `demo_sample.ipynb` #41

Closed: karrykkk closed this issue 2 months ago

karrykkk commented 2 months ago

Thanks for sharing this nice work!

I tried sampling with demo_sample.ipynb and var_d16.pth without any changes, but got these abnormal results 😢: [output image]

The following is my environment:

python                    3.10.14
torch                     2.0.1                    
torchvision               0.15.2                              
transformers              4.40.1               
triton                    2.0.0     
pillow                    10.3.0 
pytz                      2024.1 
typed-argument-parser     1.10.0             
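A minimal sketch for collecting the same version list, plus a check for the detail that turns out to matter in this thread: whether the installed torch accepts the `scale` keyword of `torch.nn.functional.scaled_dot_product_attention`, which PyTorch added in 2.1. The helper names `report_environment` and `torch_supports_sdpa_scale` are hypothetical, not part of the VAR codebase.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def torch_supports_sdpa_scale(torch_version: str) -> bool:
    """True if this torch version's scaled_dot_product_attention accepts
    the `scale` keyword (introduced in PyTorch 2.1)."""
    core = torch_version.split("+")[0]            # drop suffixes like "+cu118"
    m = re.match(r"(\d+)\.(\d+)", core)           # keep only major.minor
    if m is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    return (int(m.group(1)), int(m.group(2))) >= (2, 1)


def report_environment(packages=("torch", "torchvision", "transformers", "triton",
                                 "pillow", "pytz", "typed-argument-parser")):
    """Map each package name to its installed version (or 'not installed')."""
    info = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    env = report_environment()
    for name, ver in env.items():
        print(f"{name:<25} {ver}")
    if env["torch"] != "not installed":
        print("SDPA `scale` kwarg supported:", torch_supports_sdpa_scale(env["torch"]))
```

With the versions listed above, `torch_supports_sdpa_scale("2.0.1")` returns `False`, while `"2.1.2"` returns `True`.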

Is this caused by a precision problem, or by something else? I'd appreciate any help with this. 🥺

keyu-tian commented 2 months ago

@karrykkk can you delete and re-download the model's checkpoint file, make sure all code and demo_sample.ipynb are at the latest git commit, and run it again?

karrykkk commented 2 months ago

Thanks for your reply!

I found the cause. The `scale` keyword argument is not supported in torch==2.0.1, so I had deleted that argument at https://github.com/FoundationVision/VAR/blob/a5cf0a16624d27b4c687b08f2a0ab006438452cc/models/basic_var.py#L117. After updating torch to 2.1.2 and restoring the original code, the samples look normal.
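Why deleting the argument breaks sampling, as a minimal NumPy sketch. It assumes (not confirmed here) that the linked line passes a non-default `scale` to `scaled_dot_product_attention`. Without `scale`, the function falls back to `1/sqrt(head_dim)`, so every attention layer silently computes different weights than the ones the checkpoint was trained with. If upgrading torch is not an option, one workaround is to pre-multiply the query by `scale * sqrt(head_dim)` so the default factor cancels out. The `attention`/`attention_compat` helpers and the `0.25 / sqrt(16)` value are hypothetical stand-ins, not VAR's code.

```python
import numpy as np


def attention(q, k, v, scale=None):
    """Reference softmax attention using SDPA's convention:
    scale=None means the default 1/sqrt(head_dim)."""
    d = q.shape[-1]
    s = 1.0 / np.sqrt(d) if scale is None else scale
    logits = (q @ np.swapaxes(k, -1, -2)) * s
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v


def attention_compat(q, k, v, scale):
    """Emulate a custom `scale` with an API that only applies the default
    1/sqrt(d): pre-scaling q makes (q') k^T / sqrt(d) == q k^T * scale."""
    d = q.shape[-1]
    return attention(q * (scale * np.sqrt(d)), k, v)


rng = np.random.default_rng(0)
# (batch, heads, seq_len, head_dim)
q, k, v = (rng.standard_normal((2, 4, 8, 16)) for _ in range(3))
custom = 0.25 / np.sqrt(16)                 # hypothetical non-default scale

ref = attention(q, k, v, scale=custom)      # what the original code computes
dropped = attention(q, k, v)                # effect of deleting `scale=`
shim = attention_compat(q, k, v, custom)    # torch-2.0-compatible workaround

print("max |ref - dropped|:", np.abs(ref - dropped).max())
print("shim matches ref:", np.allclose(ref, shim))
```

In torch terms, the equivalent shim would be passing `q * (scale * math.sqrt(q.size(-1)))` as the query and omitting `scale=`. Upgrading to torch>=2.1, as done here, is the simpler fix.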