NVlabs / LSGM

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)

Apex installation is successful, but training LSGM (best FID) failed #15

Open fikry102 opened 8 months ago

fikry102 commented 8 months ago

After installing NVIDIA Apex, training LSGM (best FID) fails with:

from apex.optimizers import FusedAdam
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

After running "pip uninstall apex", everything is okay.
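The error above suggests that the installed Apex build expects a `torch.distributed._all_gather_base` function that the installed PyTorch version does not provide. As a minimal sketch of a guard, assuming the goal is simply to fall back to the stock Adam optimizer when Apex cannot be imported cleanly, something like the following could be used. The helper names `module_has_attr` and `pick_adam_class` are hypothetical and are not part of the LSGM code base.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its parents) is not installed.
        return False
    return hasattr(module, attr)


def pick_adam_class():
    """Return (optimizer_class_or_None, description).

    Hypothetical helper: prefers Apex's FusedAdam only when the
    attribute named in the traceback is present in torch.distributed,
    and otherwise falls back to torch.optim.Adam.
    """
    if module_has_attr("torch.distributed", "_all_gather_base"):
        try:
            from apex.optimizers import FusedAdam
            return FusedAdam, "apex FusedAdam"
        except (ImportError, AttributeError):
            # Apex is missing or incompatible with this PyTorch build.
            pass
    try:
        from torch.optim import Adam
        return Adam, "torch.optim.Adam"
    except ImportError:
        return None, "no optimizer available"


if __name__ == "__main__":
    _, description = pick_adam_class()
    print("Selected optimizer:", description)
```

Catching `AttributeError` in addition to `ImportError` matters here, because the failure reported above is raised while importing `apex.optimizers` rather than as a plain missing-package error.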

I use the following command to train LSGM (best FID) (not 2 nodes):

python train_vada.py --fid_dir fid_stats_dir --data data/cifar10 --root checkpoints \
    --save cifar10_vae2/lsgm2 --vae_checkpoint cifar10_vae2/vae2/checkpoint.pt \
    --train_vae --custom_conv_dae --apply_sqrt2_res --fir --cont_kl_anneal \
    --dae_arch ncsnpp --embedding_scale 1000 --dataset cifar10 \
    --learning_rate_dae 1e-4 --learning_rate_min_dae 1e-4 --epochs 1875 \
    --dropout 0.2 --batch_size 16 --num_channels_dae 512 --num_scales_dae 3 \
    --num_cell_per_scale_dae 8 --sde_type vpsde --beta_start 0.1 --beta_end 20.0 \
    --sigma2_0 0.0 --weight_decay_norm_dae 1e-2 --weight_decay_norm_vae 1e-2 \
    --time_eps 0.01 --train_ode_eps 1e-6 --eval_ode_eps 1e-6 \
    --train_ode_solver_tol 1e-5 --eval_ode_solver_tol 1e-5 \
    --iw_sample_p drop_all_iw --iw_sample_q reweight_p_samples \
    --num_process_per_node 8 --use_se