danier97 / LDMVFI

[AAAI'2024] "LDMVFI: Video Frame Interpolation with Latent Diffusion Models", Duolikun Danier, Fan Zhang, David Bull
MIT License

Training error when using multiple GPUs #18

Open bo-wang-up opened 3 months ago

bo-wang-up commented 3 months ago

Hi, I ran into an issue where autoencoder training does not work on multiple GPUs. Training always stalls with a 'Missing logger folder' message, as shown in the attached screenshot.

[image: screenshot of the error]

danier97 commented 3 months ago

Hi, sorry for the delayed response. Could you please specify:

  1. Your PyTorch, CUDA, and pytorch-lightning versions
  2. The exact command you ran
  3. The full output log

Thanks.
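
For anyone gathering the version details requested above, here is a minimal sketch that reports them in one go. It is not part of the LDMVFI repository. The helper name `collect_versions` is hypothetical, and the sketch assumes the packages were installed under the PyPI names `torch` and `pytorch-lightning`.

```python
from importlib import metadata


def collect_versions(packages=("torch", "pytorch-lightning")):
    """Hypothetical helper: map each package name to its installed version."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"

    # The CUDA version PyTorch was built against is exposed as torch.version.cuda
    # (it is None for CPU-only builds, which str() turns into "None").
    try:
        import torch
        report["cuda (torch build)"] = str(torch.version.cuda)
    except Exception:
        report["cuda (torch build)"] = "unknown (torch not importable)"
    return report


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Pasting the printed lines into the issue, together with the exact command and the full log, covers the first item of the list.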

bo-wang-up commented 3 months ago

Thanks for your reply. I have solved the problem by changing the distributed backend from 'nccl' to 'gloo'.
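
The comment does not say how the backend was switched, so the following is only one possible way, sketched under assumptions. It assumes a PyTorch Lightning 1.x release that reads the `PL_TORCH_DISTRIBUTED_BACKEND` environment variable when it initialises the process group; later releases configure the backend through the strategy object instead. The helper name `force_distributed_backend` is hypothetical.

```python
import os

# Assumption: the installed PyTorch Lightning 1.x release honours this variable.
BACKEND_ENV_VAR = "PL_TORCH_DISTRIBUTED_BACKEND"

# Backend names accepted by torch.distributed.
ALLOWED_BACKENDS = {"nccl", "gloo", "mpi"}


def force_distributed_backend(backend: str = "gloo") -> str:
    """Hypothetical helper: request a torch.distributed backend via the environment.

    Must be called before the Trainer is created, because the variable is
    read when the distributed process group is initialised.
    """
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    os.environ[BACKEND_ENV_VAR] = backend
    return os.environ[BACKEND_ENV_VAR]


if __name__ == "__main__":
    chosen = force_distributed_backend("gloo")
    print(f"{BACKEND_ENV_VAR}={chosen}")
```

Exporting the same variable in the shell before running the training command should have the same effect under that assumption. Note that 'gloo' is generally slower than 'nccl' for GPU-to-GPU communication, so this is a workaround and does not address the underlying NCCL problem.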