FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License

Cannot Reproduce LlamaGen-B or L numbers using provided models #48

Closed: vkramanuj closed this issue 4 months ago

vkramanuj commented 4 months ago

Hi, thanks for the great repo. I'm trying to reproduce some of your paper's numbers using the provided models. After using val.sh to produce the ImageNet val npz file, my reconstruction FID for VQ-16 matches yours exactly at 2.19. However, after sampling with the stage-2 models, my numbers disagree with yours by wide margins, with the exception of the Inception Score, which matches for GPT-B.

Here are the numbers I get:

GPT-B (16x16):

Inception Score: 193.6122283935547
FID: 6.133130640203092
sFID: 10.453294743799916
Precision: 0.75066
Recall: 0.44764

GPT-L (16x16):

Inception Score: 288.1690368652344
FID: 6.226299024953619
sFID: 12.101763646433483
Precision: 0.7687
Recall: 0.47984
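
One thing I double-checked on my side: FID here is the standard Frechet distance between Gaussians fit to Inception features of the reference and sample batches, so it is very sensitive to the exact sample set, sample count, and CFG setting. For reference, here is the distance itself as I understand it (my own minimal sketch, not this repo's evaluator code; mu/sigma are assumed to be the feature mean and covariance of each batch):

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # Standard FID: ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2)),
    # where (mu, sigma) are the mean and covariance of Inception features per batch.
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))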

Commands (GPT-B):

torchrun --nproc_per_node=8 autoregressive/sample/sample_c2i_ddp.py \
    --vq-ckpt /data/home/vkramanuj/assets/vq_ds16_c2i.pt \
    --gpt-ckpt /data/home/vkramanuj/assets/c2i_B_256.pt \
    --gpt-model GPT-B --image-size 256 \
    --sample-dir /tmp/samples_v7_B --cfg-scale=2.0

python3 evaluations/c2i/evaluator.py \
    /tmp/val_flatdataset.npz /tmp/samples_v7_B/GPT-B-c2i_B_256-size-256-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-2.0-seed-0.npz

Commands (GPT-L):

torchrun --nproc_per_node=8 autoregressive/sample/sample_c2i_ddp.py \
    --vq-ckpt /data/home/vkramanuj/assets/vq_ds16_c2i.pt \
    --gpt-ckpt /data/home/vkramanuj/assets/c2i_L_256.pt \
    --gpt-model GPT-L --image-size 256 \
    --sample-dir /tmp/samples_v7_L --cfg-scale=2.0

python3 evaluations/c2i/evaluator.py \
    /tmp/val_flatdataset.npz /tmp/samples_v7_L/GPT-L-c2i_L_256-size-256-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-2.0-seed-0.npz
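
For what it's worth, here is the sanity check I run on the two .npz files before evaluation, to rule out a sample-count or dtype mismatch (a minimal sketch; the key layout and the 50,000-sample count are my assumptions based on the usual ADM-style evaluation setup, not something I verified against this repo's scripts):

import numpy as np

# Paths taken from the GPT-B commands above; adjust for your own run.
ref_path = "/tmp/val_flatdataset.npz"
sample_path = "/tmp/samples_v7_B/GPT-B-c2i_B_256-size-256-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-2.0-seed-0.npz"

for name, path in [("reference", ref_path), ("samples", sample_path)]:
    with np.load(path) as data:
        key = data.files[0]  # assumption: images live under the first key (typically "arr_0")
        arr = data[key]
        # I expect uint8 images of shape [N, 256, 256, 3]; FID moves noticeably
        # if N differs from the usual 50,000-sample protocol.
        print(f"{name}: key={key}, shape={arr.shape}, dtype={arr.dtype}")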

Any assistance would be greatly appreciated!

Ghy0501 commented 3 months ago

Hi, I ran into the same problem when reproducing the results. Did you manage to solve it? My results:

GPT-B (16x16):

Inception Score: 191.79376220703125
FID: 6.113283307597328
sFID: 10.382730755632338
Precision: 0.74984
Recall: 0.45462