Question about cannot reproduce FID results

Ghy0501 commented 3 months ago

Hi, thanks for the great repo. I tried to reproduce the results in the paper with the model weights you provided, but my results are much worse than those reported. The commands I ran and the results I got are shown below:

val:

val.sh:

#!/bin/bash
set -x
export NCCL_P2P_LEVEL=NVL

torchrun \
--nnodes=1 --nproc_per_node=2 --node_rank=0 \
--master_port=12343 \
tokenizer/validation/val_ddp.py \
--data-path /mnt/ShareDB_1TB/datasets/imagenet-1k/val \
"$@"

command:

sh scripts/tokenizer/val.sh

GPT-XL:

command:

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/c2i_XL_384.pt --gpt-model GPT-XL --image-size 384 --image-size-eval 256 --cfg-scale 1.75

python3 evaluations/c2i/evaluator.py mywork/LlamaGen/reconstructions/val_imagenet.npz mywork/LlamaGen_ours/samples/GPT-XL-c2i_XL_384-size-384-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-1.75-seed-0.npz

reproduced results:

Inception Score: 245.02481079101562
FID: 3.6842287284328563
sFID: 8.495801038066816
Precision: 0.70964
Recall: 0.57578

I also reproduced the results for GPT-B and GPT-L; they are similar to those reported in https://github.com/FoundationVision/LlamaGen/issues/48. I followed the commands you provided as closely as possible, except for the number of GPUs, so I'm curious whether the difference in results is due to the number of GPUs. Any assistance would be greatly appreciated!

Ghy0501 commented 3 months ago

Using VIRTUAL_imagenet256_labeled.npz instead of val_imagenet.npz solves this issue

JosephPai commented 1 month ago

Hi @Ghy0501, could you kindly explain the difference between the provided VIRTUAL_imagenet256_labeled.npz and the self-computed val_imagenet.npz? I would have expected them to be the same. Thanks a lot!
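
One way to start narrowing this down is to inspect both archives directly. The sketch below is a minimal, hypothetical helper: `summarize_npz` and `compare_npz` are illustrative names, not part of LlamaGen or its evaluator. It only uses `numpy.load` to list the arrays stored in each `.npz` file and report where the names, shapes, or dtypes differ. It makes no assumption about what either file actually contains.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file.

    Note: each array is fully loaded to read its shape, so this can use
    a lot of memory on large image archives.
    """
    summary = {}
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


def compare_npz(path_a, path_b):
    """Report structural differences between two .npz archives."""
    a = summarize_npz(path_a)
    b = summarize_npz(path_b)
    shared = set(a) & set(b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "shape_or_dtype_differs": sorted(k for k in shared if a[k] != b[k]),
    }
```

For example, `compare_npz("VIRTUAL_imagenet256_labeled.npz", "val_imagenet.npz")` (adjust the paths to your setup) would show whether the two reference batches differ in the number of images, image resolution, or stored arrays. It does not compare pixel values or the statistics derived from them.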
