We are training a DETR model with Hugging Face transformers, and it works well on any machine with a CUDA GPU. On a Mac, training only works with the "cpu" accelerator. With the "mps" accelerator it fails with the following error (full traceback below):
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([], device='mps:0', size=(0, 4))
Using Lightning v2.0.9 on a MacBook Pro M2 Max (64 GB).
```
Traceback (most recent call last):
  File "/Users/user/dev/project/company/server/ai/od_detr/training/train.py", line 235, in <module>
    trainer.fit(model)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in _run_stage
    self._run_sanity_check()
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1050, in _run_sanity_check
    val_loop.run()
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 376, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 294, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 393, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/user/dev/project/company/server/ai/od_detr/training/train.py", line 131, in validation_step
    loss, loss_dict = self.common_step(batch, batch_idx)
  File "/Users/user/dev/project/company/server/ai/od_detr/training/train.py", line 113, in common_step
    outputs = self.model(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 1625, in forward
    loss_dict = criterion(outputs_loss, labels)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2238, in forward
    losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes))
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2208, in get_loss
    return loss_map[loss](outputs, targets, indices, num_boxes)
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2149, in loss_boxes
    generalized_box_iou(center_to_corners_format(source_boxes), center_to_corners_format(target_boxes))
  File "/Users/user/dev/miniconda3/envs/pytorch2/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2410, in generalized_box_iou
    raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([], device='mps:0', size=(0, 4))
```
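The error is raised for an empty `(0, 4)` tensor, which cannot contain a malformed box, so it may help to test the validity check in isolation on each device. The snippet below is a minimal diagnostic sketch, not part of our training script. `corner_check` is a hypothetical helper, and its body is an assumption about what `generalized_box_iou` tests (x1 >= x0 and y1 >= y0), inferred from the error message. If it prints `False` for `mps` but `True` for `cpu`, the difference would point at how an empty-tensor reduction behaves on MPS rather than at Lightning itself.

```python
import torch


def corner_check(boxes: torch.Tensor) -> bool:
    """Hypothetical stand-in for the check behind the ValueError.

    Assumption: boxes are valid when x1 >= x0 and y1 >= y0 for every row.
    For an empty tensor this should be vacuously True.
    """
    return bool((boxes[:, 2:] >= boxes[:, :2]).all())


# Always test on CPU; add MPS only when the backend is available.
devices = ["cpu"]
if torch.backends.mps.is_available():
    devices.append("mps")

results = {}
for name in devices:
    # Same shape as the tensor reported in the error: zero boxes, four coordinates.
    empty_boxes = torch.zeros((0, 4), device=name)
    results[name] = corner_check(empty_boxes)
    print(f"{name}: check passes on empty boxes = {results[name]}")
```

Running this in the same conda environment as the failing job keeps the PyTorch version identical to the one in the traceback.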
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
What version are you seeing the problem on?
v2.0
How to reproduce the bug
Error messages and logs
More info
No response
cc @justusschock