I am training on a dataset of 512 x 512 images, using 4 GPUs. Training stalls during the loss calculation, and I get the user warning: "Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector."
Is there a way to split the loss computation across multiple GPUs with DataParallelWithCallback? I noticed that the loss is only being calculated on GPU 0.
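For context, here is a minimal sketch of the pattern I'm considering: wrapping the criterion into the model's `forward` so each replica computes its own per-GPU loss, which would explain the warning (each replica returns a scalar, and `gather` unsqueezes them into a vector). Names like `FullModel`, `model`, `criterion`, `inputs`, and `targets` are my placeholders, not actual API:

```python
import torch
import torch.nn as nn

# Hypothetical wrapper: compute the loss inside forward() so each
# replica returns its own per-GPU loss instead of full-size outputs.
class FullModel(nn.Module):
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Each replica returns a 0-dim tensor; gather() unsqueezes
        # them and stacks a vector of length n_gpus on device 0.
        return self.criterion(outputs, targets)

# model / criterion / inputs / targets stand in for my real setup
full_model = DataParallelWithCallback(FullModel(model, criterion),
                                      device_ids=[0, 1, 2, 3])
losses = full_model(inputs, targets)  # shape (4,): one loss per GPU
loss = losses.mean()                  # reduce before calling backward()
loss.backward()
```

Is this the right approach, or is there a built-in way to keep the loss computation distributed?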