Mid-Push / Decent

Unpaired Image Translation, NeurIPS 2022

Multi-GPU Training Issue #1

Open ShenZheng2000 opened 1 year ago

ShenZheng2000 commented 1 year ago

Hello, authors! Thanks for your excellent work.

I have trouble with multi-GPU training. My command line looks like this:

python train.py --dataroot $dataset_path --name $model_name --gpu 0,1,2,3 --batch_size 1

And the error is below:

Traceback (most recent call last):
  File "/home/shen/Rain/Methods/Decent/train.py", line 49, in <module>
    model.data_dependent_initialize(data)
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 99, in data_dependent_initialize
    self.compute_F_loss().backward()                   # calculate gradients for F
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 189, in compute_F_loss
    assert len(log_prob_a) == self.opt.batch_size * self.opt.num_patches
AssertionError

I print the values below for debugging.

print(f"{len(log_prob_a)} != {self.opt.batch_size} * {self.opt.num_patches}")

which gives me

0 != 1 * 256

Since len(log_prob_a) is 0, log_prob_a comes back as an empty list under multi-GPU training.
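An empty per-patch list like this is consistent with how DataParallel-style training scatters a batch along its first dimension: with a global batch size smaller than the number of GPUs, some replicas receive an empty slice. The sketch below is illustrative only (plain Python, not the repo's actual code); `scatter` is a hypothetical stand-in for the framework's batch splitting.

```python
def scatter(batch, num_devices):
    """Split a batch along the first dimension, roughly as
    DataParallel-style training does: ceil(n / num_devices) items
    per device, so trailing devices may receive an empty slice."""
    n = len(batch)
    per_device = -(-n // num_devices)  # ceiling division
    return [batch[i * per_device:(i + 1) * per_device]
            for i in range(num_devices)]

# A global batch of 1 image scattered over 4 GPUs: only the first
# replica gets data, so a per-replica list of patch log-probabilities
# built from the other slices has length 0.
chunks = scatter(["img0"], 4)
print([len(c) for c in chunks])  # [1, 0, 0, 0]
```

This also shows why the assertion `len(log_prob_a) == opt.batch_size * opt.num_patches` can fail inside a replica: `opt.batch_size` is the global batch size, while each replica only sees its own slice.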

Did you encounter this issue when training your models? How can I solve it?

Mid-Push commented 1 year ago

Hi,

Hope it is not too late. I have adapted the code, so it now supports multi-GPU training. You can run the method with

python train.py --dataroot $dataset_path --name $model_name --gpu 0,1,2,3 --batch_size 4

You can also try the --var_all flag: with it, the loss is computed across all images in the batch; by default, the loss is computed within a single image.
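To make the within-image vs. across-images distinction concrete, here is a minimal sketch using a variance-style statistic over patch scores (the flag name suggests a variance term, but the repo's actual loss may differ; the `patch_scores` values are made up for illustration):

```python
import statistics

# Hypothetical patch scores: 2 images, 3 patches each.
patch_scores = [[0.2, 0.4, 0.6],   # image 0
                [1.0, 1.2, 1.4]]   # image 1

# Default (within one image): one statistic per image,
# computed over that image's own patches only.
per_image = [statistics.pvariance(s) for s in patch_scores]

# --var_all style (across images): one statistic over
# all patches from all images in the batch.
flat = [v for scores in patch_scores for v in scores]
across = statistics.pvariance(flat)

print(per_image)
print(across)
```

With a batch size of 1 the two choices coincide, which is why the flag only becomes interesting once multi-GPU training allows larger batches.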

Best