airctic / icevision

An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come
https://airctic.github.io/icevision/
Apache License 2.0

Multi GPU training #581

Closed · dnth closed this 2 years ago

dnth commented 3 years ago

Is there a way to train EfficientDet models on a multi-GPU setup?

lgvaz commented 3 years ago

Since we support pytorch-lightning you can use that; just pass the argument gpus=<number> to pl.Trainer.

I've never tried it myself, so some errors might pop up. Would you like to try it and report back what you find?
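
Something like this, roughly (sketch only; light_model, train_dl and valid_dl are the objects built in the quickstart notebook):

import pytorch_lightning as pl

# sketch: light_model, train_dl and valid_dl come from the quickstart setup
trainer = pl.Trainer(max_epochs=50, gpus=2)  # or gpus=-1 to use every visible GPU
trainer.fit(light_model, train_dl, valid_dl)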

dnth commented 3 years ago

So far I have been trying to get multi-GPU training working with the fastai learner. Here is my code:

from icevision.all import *
from fastai.vision.all import *   # assumed imports; provides ranger
from fastai.distributed import *  # provides Learner.to_parallel()

model = efficientdet.model('tf_efficientdet_lite0', num_classes=len(class_map), img_size=size)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = efficientdet.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics, opt_func=ranger)
learn.to_parallel()  # fastai's DataParallel wrapper

This error message pops up when training starts:

raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([4, 128, 128, 27])) must be the same as input size (torch.Size([2, 128, 128, 27]))

I have 2 GPUs. My batch size is set to 4.

My guess is that the error occurs because the batch is divided equally between the GPUs, so each GPU gets 2 images, which doesn't tally with the original batch size of 4.

dnth commented 3 years ago

Following your suggestion above on using pl.Trainer, I tried replacing the line in quickstart.ipynb with the following:

trainer = pl.Trainer(max_epochs=50, gpus=-1, distributed_backend="dp")

A similar error pops up.

GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
INFO:lightning:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name  | Type          | Params
----------------------------------------
0 | model | DetBenchTrain | 3 M   
INFO:lightning:
  | Name  | Type          | Params
----------------------------------------
0 | model | DetBenchTrain | 3 M   
HBox(children=(HTML(value='Validation sanity check'), FloatProgress(value=1.0, bar_style='info', layout=Layout…
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-f103b8e56be0> in <module>
      1 # trainer = pl.Trainer(max_epochs=50, gpus=1)
      2 trainer = pl.Trainer(max_epochs=50, gpus=-1, distributed_backend="dp")
----> 3 trainer.fit(light_model, train_dl, valid_dl)

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    442         self.call_hook('on_fit_start')
    443 
--> 444         results = self.accelerator_backend.train()
    445         self.accelerator_backend.teardown()
    446 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_accelerator.py in train(self)
    104 
    105         # train or test
--> 106         results = self.train_or_test()
    107 
    108         return results

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
     72             results = self.trainer.run_test()
     73         else:
---> 74             results = self.trainer.train()
     75         return results
     76 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
    464 
    465     def train(self):
--> 466         self.run_sanity_check(self.get_model())
    467 
    468         self.checkpoint_connector.has_trained = False

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
    656 
    657             # run eval step
--> 658             _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
    659 
    660             # allow no returns from eval

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
    576 
    577                 # lightning module methods
--> 578                 output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
    579                 output = self.evaluation_loop.evaluation_step_end(output)
    580 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, test_mode, batch, batch_idx, dataloader_idx)
    169             output = self.trainer.accelerator_backend.test_step(args)
    170         else:
--> 171             output = self.trainer.accelerator_backend.validation_step(args)
    172 
    173         # track batch size for weighted average

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_accelerator.py in validation_step(self, args)
    122 
    123     def validation_step(self, args):
--> 124         output = self.training_step(args)
    125         return output
    126 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_accelerator.py in training_step(self, args)
    118                 output = self.trainer.model(*args)
    119         else:
--> 120             output = self.trainer.model(*args)
    121         return output
    122 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py in forward(self, *inputs, **kwargs)
     85 
     86         replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
---> 87         outputs = self.parallel_apply(replicas, inputs, kwargs)
     88 
     89         if isinstance(outputs[0], Result):

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
    149 
    150     def parallel_apply(self, replicas, inputs, kwargs):
--> 151         return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
    152 
    153 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py in parallel_apply(modules, inputs, kwargs_tup, devices)
    308         output = results[i]
    309         if isinstance(output, Exception):
--> 310             raise output
    311         outputs.append(output)
    312     return outputs

~/anaconda3/envs/aceic/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py in _worker(i, module, input, kwargs, device)
    267                     fx_called = 'test_step'
    268                 else:
--> 269                     output = module.validation_step(*input, **kwargs)
    270                     fx_called = 'validation_step'
    271 

~/anaconda3/envs/aceic/lib/python3.8/site-packages/icevision/models/efficientdet/lightning/model_adapter.py in validation_step(self, batch, batch_idx)
     42 
     43         with torch.no_grad():
---> 44             raw_preds = self(xb, yb)
     45             preds = efficientdet.convert_raw_predictions(raw_preds["detections"], 0)
     46             loss = efficientdet.loss_fn(raw_preds, yb)

~/anaconda3/envs/aceic/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/anaconda3/envs/aceic/lib/python3.8/site-packages/icevision/models/efficientdet/lightning/model_adapter.py in forward(self, *args, **kwargs)
     27 
     28     def forward(self, *args, **kwargs):
---> 29         return self.model(*args, **kwargs)
     30 
     31     def training_step(self, batch, batch_idx):

~/anaconda3/envs/aceic/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/anaconda3/envs/aceic/lib/python3.8/site-packages/effdet/bench.py in forward(self, x, target)
    119                 target['bbox'], target['cls'])
    120 
--> 121         loss, class_loss, box_loss = self.loss_fn(class_out, box_out, cls_targets, box_targets, num_positives)
    122         output = {'loss': loss, 'class_loss': class_loss, 'box_loss': box_loss}
    123         if not self.training:

~/anaconda3/envs/aceic/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/anaconda3/envs/aceic/lib/python3.8/site-packages/effdet/loss.py in forward(self, cls_outputs, box_outputs, cls_targets, box_targets, num_positives)
    254             l_fn = loss_jit
    255 
--> 256         return l_fn(
    257             cls_outputs, box_outputs, cls_targets, box_targets, num_positives,
    258             num_classes=self.num_classes, alpha=self.alpha, gamma=self.gamma, delta=self.delta,

~/anaconda3/envs/aceic/lib/python3.8/site-packages/effdet/loss.py in loss_fn(cls_outputs, box_outputs, cls_targets, box_targets, num_positives, num_classes, alpha, gamma, delta, box_loss_weight, label_smoothing, new_focal)
    201                 alpha=alpha, gamma=gamma, normalizer=num_positives_sum, label_smoothing=label_smoothing)
    202         else:
--> 203             cls_loss = focal_loss_legacy(
    204                 cls_outputs_at_level, cls_targets_at_level_oh,
    205                 alpha=alpha, gamma=gamma, normalizer=num_positives_sum)

~/anaconda3/envs/aceic/lib/python3.8/site-packages/effdet/loss.py in focal_loss_legacy(logits, targets, alpha, gamma, normalizer)
     39     """
     40     positive_label_mask = targets == 1.0
---> 41     cross_entropy = F.binary_cross_entropy_with_logits(logits, targets.to(logits.dtype), reduction='none')
     42     neg_logits = -1.0 * logits
     43     modulator = torch.exp(gamma * targets * neg_logits - gamma * torch.log1p(torch.exp(neg_logits)))

~/anaconda3/envs/aceic/lib/python3.8/site-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   2578 
   2579     if not (target.size() == input.size()):
-> 2580         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   2581 
   2582     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([16, 48, 48, 45])) must be the same as input size (torch.Size([8, 48, 48, 45]))
potipot commented 3 years ago

Hi, I experimented a bit with multi-GPU training using both fastai and pytorch lightning. Concerning fastai, it is advisable to switch to the newer dependency fastai==2.2.2, where the scripting module is incorporated into the main library. I was able to configure my script and launch it with

python -m fastai.launch --gpus 0,1 my_script.py

but then the following problem occurs:

File "/home/toucan/anaconda3/envs/icevision/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 73, in default
_collate                                                                                                                          
    return {key: default_collate([d[key] for d in batch]) for key in elem}                                                        
  File "/home/toucan/anaconda3/envs/icevision/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 73, in <dictco
mp>                                                                                                                               
    return {key: default_collate([d[key] for d in batch]) for key in elem}                                                        
  File "/home/toucan/anaconda3/envs/icevision/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 83, in default
_collate                                                                                                                          
    return [default_collate(samples) for samples in transposed]                                                                   
  File "/home/toucan/anaconda3/envs/icevision/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 83, in <listco
mp>                                                                                                                               
    return [default_collate(samples) for samples in transposed]                                                                   
  File "/home/toucan/anaconda3/envs/icevision/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 85, in default
_collate                                                                                                                      
    raise TypeError(default_collate_err_msg_format.format(elem_type))                                                             
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'icevision.core.bbox.$
Box'>

I managed to fix the error with the following patch:

@patch
def create_batch(self:DataLoader, b):
    return efficientdet.dataloaders.build_train_batch(b)
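
For reference, a rough sketch of how the launched script could fit together (the learner construction and names such as my_script.py are assumptions based on the earlier snippets; distrib_ctx is fastai's distributed-training context manager):

# my_script.py -- rough sketch, assuming fastai>=2.2.2 and the same setup as the snippets above
from fastai.vision.all import *    # provides DataLoader, fastcore's @patch and ranger
from fastai.distributed import *   # provides learn.distrib_ctx()
from icevision.all import *

# ... build train_dl / valid_dl and apply the create_batch patch shown above ...

model = efficientdet.model('tf_efficientdet_lite0', num_classes=len(class_map), img_size=size)
learn = efficientdet.fastai.learner(dls=[train_dl, valid_dl], model=model,
                                    metrics=[COCOMetric(metric_type=COCOMetricType.bbox)])

with learn.distrib_ctx():          # wraps the learner in DDP for the processes spawned by fastai.launch
    learn.fit_one_cycle(10, 1e-4)

# launch with: python -m fastai.launch --gpus 0,1 my_script.py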

When running on pytorch lightning I tried the DDP accelerator, trainer = pl.Trainer(accelerator='ddp', *args), but then I encountered a memory leak on the validation step, described here: https://github.com/PyTorchLightning/pytorch-lightning/issues/2352, and I wasn't able to resolve it either.

potipot commented 3 years ago

I can confirm that multi-GPU training works with pytorch lightning using the DDP accelerator:

trainer = pl.Trainer(max_epochs=10, gpus=[0,1], accelerator='ddp')

The memory leak was coming from COCOMetric accumulation. Will update with fastai distributed training.
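
For anyone finding this later, here is the working setup as a rough script sketch (DDP has to run as a plain Python script rather than from a notebook; the model adapter subclass and the dataloader construction are assumed to follow the quickstart):

# train_ddp.py -- rough sketch, assuming a quickstart-style setup
import torch
import pytorch_lightning as pl
from icevision.all import *

class LightModel(efficientdet.lightning.ModelAdapter):
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)

# ... build train_dl, valid_dl and `model` exactly as in the quickstart ...

light_model = LightModel(model, metrics=[COCOMetric(metric_type=COCOMetricType.bbox)])
trainer = pl.Trainer(max_epochs=10, gpus=[0, 1], accelerator='ddp')
trainer.fit(light_model, train_dl, valid_dl)

# run with: python train_ddp.py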

lgvaz commented 3 years ago

The memory leak was coming from COCOMetric accumulation.

This is such excellent news :heart:

potipot commented 3 years ago

@lgvaz could you assign this one to me? I should remember to make some tutorials on how to do it :D

lgvaz commented 3 years ago

Here ya go @potipot !!! Thanks for the initiative!

FraPochetti commented 2 years ago

Resolved.

deepwilson commented 2 years ago

Is it possible to use a multi-GPU setup with fastai in my Jupyter notebook itself?

potipot commented 2 years ago

It is possible but I wouldn't recommend it. AFAIR the only supported multi-GPU paradigm that works in a Jupyter notebook is DataParallel (DP in PyTorch Lightning). This is usually suboptimal and inferior to other paradigms such as DDP or DDP2, which, however, require launching from scripts. From https://pytorch.org/tutorials/intermediate/ddp_tutorial.html: [image from the tutorial]
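
If you still want to try it from a notebook, the DP route looks roughly like this (sketch only; light_model, train_dl and valid_dl as in the quickstart). Keep in mind that DP splits each batch across the GPUs, which is exactly what triggered the target/input size mismatch with EfficientDet earlier in this thread:

import pytorch_lightning as pl

# DataParallel from a notebook -- sketch only; single process, batch split across the GPUs
trainer = pl.Trainer(max_epochs=10, gpus=2, accelerator='dp')
trainer.fit(light_model, train_dl, valid_dl)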