Stuck at epoch 0: I tried one or two GPUs on a single machine, and training gets stuck

ali-vilab / AnyDoor

Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"

https://ali-vilab.github.io/AnyDoor-Page/

MIT License


Issue #68

Closed: kongzijian closed this issue 8 months ago

kongzijian commented 8 months ago

My training setup is:

    from torch.utils.data import DataLoader
    import pytorch_lightning as pl
    from pytorch_lightning.callbacks.progress import TQDMProgressBar

    # dataset1, batch_size, n_gpus and accumulate_grad_batches are defined elsewhere in my script
    dataloader = DataLoader(dataset1, num_workers=0, batch_size=batch_size, shuffle=True)
    trainer = pl.Trainer(
        gpus=n_gpus,
        strategy="ddp",
        precision=16,
        accelerator="gpu",
        callbacks=[TQDMProgressBar(refresh_rate=1)],
        accumulate_grad_batches=accumulate_grad_batches,
        log_every_n_steps=20,
    )
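One way to narrow down where training stalls is to load the samples one by one in the main process, with no Lightning or DDP involved. This is a minimal sketch assuming a map-style dataset such as `dataset1` above. The `scan_dataset` helper and its 5-second slow-sample threshold are hypothetical and not part of AnyDoor or PyTorch Lightning.

```python
import time


def scan_dataset(dataset, limit=None, slow_seconds=5.0):
    """Hypothetical helper: load samples sequentially and collect failures.

    Returns a list of (index, exception) pairs. Samples that take longer
    than `slow_seconds` to load are reported with a print.
    """
    failures = []
    n = len(dataset) if limit is None else min(limit, len(dataset))
    for idx in range(n):
        start = time.perf_counter()
        try:
            dataset[idx]  # same call the DataLoader makes for a map-style dataset
        except Exception as err:
            failures.append((idx, err))
        elapsed = time.perf_counter() - start
        if elapsed > slow_seconds:
            print(f"sample {idx} took {elapsed:.1f}s")
    return failures
```

For example, `failures = scan_dataset(dataset1, limit=100)`. If the scan itself hangs or reports many failures, the problem is in data loading rather than in the multi-GPU setup. If `__getitem__` catches its own exceptions internally, failing samples will not appear in `failures`; they may instead show up as slow or never-returning samples.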

XavierCHEN34 commented 8 months ago

There may be a bug in your dataloader. Our dataset base.py wraps sample loading in a "try except" block that skips samples which fail to load, so loading errors are not reported. You could remove the try except to debug.
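The point of the reply is that an exception raised while loading a sample is caught and the sample is skipped, so the error never reaches the training loop. Below is a minimal sketch of that pattern, not the actual AnyDoor base.py code. The class name `SkippingDataset`, the `load_sample` callable, the retry-with-a-random-index behavior, and the `debug` and `max_retries` parameters are all hypothetical. If a dataset like this retried without any limit and every sample failed (for example, because of a wrong data path), `__getitem__` would never return, which would look like training stuck at epoch 0.

```python
import random


class SkippingDataset:
    """Hypothetical sketch of a dataset that skips samples which fail to load.

    `load_sample` stands in for the real per-sample loading code (reading
    images, masks, etc.); it is an assumption for illustration only.
    """

    def __init__(self, items, load_sample, debug=False, max_retries=10):
        self.items = items
        self.load_sample = load_sample
        self.debug = debug              # True: re-raise so the real error is visible
        self.max_retries = max_retries  # cap so a fully broken dataset cannot loop forever

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        for _ in range(self.max_retries):
            try:
                return self.load_sample(self.items[idx])
            except Exception as err:
                if self.debug:
                    raise  # surface the original traceback instead of skipping
                print(f"skipping sample {idx}: {err!r}")
                idx = random.randrange(len(self.items))  # fall back to another sample
        raise RuntimeError(f"{self.max_retries} consecutive samples failed to load")
```

Setting `debug=True` has the same effect as removing the try/except: the first loading error propagates with its full traceback. The class only defines `__len__` and `__getitem__`, so it can be passed to `torch.utils.data.DataLoader` as a map-style dataset.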