I am training on a dataset of 512 x 512 images, using 4 GPUs. Training stalls during the loss calculation, and I get the user warning: "Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector."
Is there a way to split the loss computation across multiple GPUs with DataParallelWithCallback? I noticed that the loss is only being calculated on GPU 0.
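For context, here is a minimal sketch of the pattern I'm considering: wrapping the criterion into the model's `forward` so each replica computes its own per-GPU loss, which would explain the warning (each replica returns a scalar, and `gather` unsqueezes them into a vector). Names like `FullModel`, `model`, `criterion`, `inputs`, and `targets` are my placeholders, not actual API:

```python
import torch
import torch.nn as nn

# Hypothetical wrapper: compute the loss inside forward() so each
# replica returns its own per-GPU loss instead of full-size outputs.
class FullModel(nn.Module):
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Each replica returns a 0-dim tensor; gather() unsqueezes
        # them and stacks a vector of length n_gpus on device 0.
        return self.criterion(outputs, targets)

# model / criterion / inputs / targets stand in for my real setup
full_model = DataParallelWithCallback(FullModel(model, criterion),
                                      device_ids=[0, 1, 2, 3])
losses = full_model(inputs, targets)  # shape (4,): one loss per GPU
loss = losses.mean()                  # reduce before calling backward()
loss.backward()
```

Is this the right approach, or is there a built-in way to keep the loss computation distributed?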