FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525

VQ-VAE ckpt optimizer states? #15

Open · julian-q opened this issue 3 weeks ago

julian-q commented 3 weeks ago

Hello! Thank you for the clean and user-friendly codebase!

I'm trying to finetune the VQ-VAE tokenizer and noticed that some keys seem to be missing from the pretrained checkpoint listed on Hugging Face: "optimizer", "discriminator", and "optimizer_disc". See here:

command:

torchrun --nnodes=1 --nproc_per_node=1 -m tokenizer.tokenizer_image.vq_train --finetune --disc-start 0 --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --dataset imagenet --data-path /home/julian/images --cloud-save-path ./training-save-dir --global-batch-size 8

output:

| distributed init (rank 0): env://
[2024-06-13 09:12:10] Experiment directory created at results_tokenizer_image/000-VQ-16
[2024-06-13 09:12:10] Experiment directory created in cloud at ./training-save-dir/2024-06-13-09-12-10/000-VQ-16/checkpoints
[2024-06-13 09:12:10] Namespace(data_path='/home/julian/images', data_face_path=None, cloud_save_path='./training-save-dir', no_local_save=False, vq_model='VQ-16', vq_ckpt='./pretrained_models/vq_ds16_c2i.pt', finetune=True, ema=False, codebook_size=16384, codebook_embed_dim=8, codebook_l2_norm=True, codebook_weight=1.0, entropy_loss_ratio=0.0, commit_loss_beta=0.25, reconstruction_weight=1.0, reconstruction_loss='l2', perceptual_weight=1.0, disc_weight=0.5, disc_start=0, disc_type='patchgan', disc_loss='hinge', gen_loss='hinge', compile=False, dropout_p=0.0, results_dir='results_tokenizer_image', dataset='imagenet', image_size=256, epochs=40, lr=0.0001, weight_decay=0.05, beta1=0.9, beta2=0.95, max_grad_norm=1.0, global_batch_size=8, global_seed=0, num_workers=16, log_every=100, ckpt_every=5000, gradient_accumulation_steps=1, mixed_precision='bf16', rank=0, world_size=1, gpu=0, dist_url='env://', distributed=True, dist_backend='nccl')
[2024-06-13 09:12:10] Starting rank=0, seed=0, world_size=1.
[2024-06-13 09:12:12] VQ Model Parameters: 71,883,403
loaded pretrained LPIPS loss from /home/julian/LlamaGen/tokenizer/tokenizer_image/cache/vgg.pth
[2024-06-13 09:12:22] Discriminator Parameters: 2,765,633
[2024-06-13 09:12:32] Dataset contains 691,040 images (/home/julian/images)
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]:     return _run_code(code, main_globals, None,
[rank0]:   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "/home/julian/LlamaGen/tokenizer/tokenizer_image/vq_train.py", line 316, in <module>
[rank0]:     main(args)
[rank0]:   File "/home/julian/LlamaGen/tokenizer/tokenizer_image/vq_train.py", line 146, in main
[rank0]:     optimizer.load_state_dict(checkpoint["optimizer"])
[rank0]: KeyError: 'optimizer'
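
For reference, a quick way to confirm which top-level entries the downloaded checkpoint actually contains (the path comes from the command above and the key names from the loading code in the traceback; this is just a diagnostic sketch, not part of the repo):

```python
import torch

# Load the released tokenizer checkpoint on CPU and list its top-level entries.
ckpt = torch.load("./pretrained_models/vq_ds16_c2i.pt", map_location="cpu")
print(sorted(ckpt.keys()))

# Keys that vq_train.py expects when resuming/finetuning, per the traceback above.
for key in ("optimizer", "discriminator", "optimizer_disc"):
    print(f"{key}: {'present' if key in ckpt else 'missing'}")
```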

Should the Hugging Face ckpts be updated to include these?
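
In the meantime, a possible local workaround (a sketch only, not the repo's own loader; the argument names below are placeholders for whatever objects vq_train.py actually builds) is to restore only the states the checkpoint provides and let the optimizer/discriminator start from fresh initialization:

```python
import torch

def load_vq_checkpoint(ckpt_path, vq_model, optimizer=None,
                       discriminator=None, optimizer_disc=None):
    """Restore whatever states the checkpoint carries; leave the rest freshly initialized.

    Workaround sketch only: the object names are placeholders, and losing the original
    optimizer/discriminator state is acceptable when finetuning from inference-only weights.
    """
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # Released ckpts may store weights under a "model" key or as a bare state_dict.
    vq_model.load_state_dict(checkpoint.get("model", checkpoint))
    if optimizer is not None and "optimizer" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer"])
    if discriminator is not None and "discriminator" in checkpoint:
        discriminator.load_state_dict(checkpoint["discriminator"])
    if optimizer_disc is not None and "optimizer_disc" in checkpoint:
        optimizer_disc.load_state_dict(checkpoint["optimizer_disc"])
    return checkpoint
```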

Thanks again

PeizeSun commented 3 weeks ago

Hi~ We will update the VQ-VAE model weights within the next 24 hours.

Sorry for not considering the finetuning use case.

PeizeSun commented 3 weeks ago

@julian-q The VQ-VAE ckpt with training states has been updated: vq_ds16_c2i_training.pt
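
For anyone else picking this up, a quick sanity check that the updated checkpoint now carries the training states (file name from the comment above; the download location is assumed to be the same pretrained_models directory used earlier):

```python
import torch

# Key names taken from the original KeyError / loading code in vq_train.py.
ckpt = torch.load("./pretrained_models/vq_ds16_c2i_training.pt", map_location="cpu")
for key in ("optimizer", "discriminator", "optimizer_disc"):
    assert key in ckpt, f"{key} is still missing from the checkpoint"
print("top-level keys:", sorted(ckpt.keys()))
```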

julian-q commented 3 weeks ago

thank you so much! 🙌🙌