Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0

Training with Sliding Window Inferer fails if sw_device is different from device and AMP is on #7035

Closed matt3o closed 1 year ago

matt3o commented 1 year ago

Describe the bug The crash only occurs during training, and only with AMP enabled; I got it to run perfectly fine with AMP disabled. I am not sure, however, whether this flag is intended for training at all. From my quick tests, the sw_device setting did not appear to offer much advantage in GPU memory usage during training compared to a normal full-GPU run. For validation, however, it roughly halved memory usage (7 GB vs 3.5 GB), so it definitely makes a difference.

I don't have the time right now, but I think this should be a reproducible bug.

Configuration of the Sliding Window Inferer:

    sw_params = {
        "roi_size": sw_roi_size,
        "mode": "gaussian",
        "cache_roi_weight_map": cache_roi_weight_map,
        "overlap": sw_overlap,
    }

    if sw_cpu_output:
        logger.warning("Enabling Sliding Window output on the CPU")
        sw_params.update({
            "sw_device": device,
            "device": "cpu",
        })
    train_inferer = SlidingWindowInferer(
        sw_batch_size=train_batch_size,
        **sw_params,
    )
[2023-09-22 19:24:32.944][ERROR] exception_raised - Exception: 
Traceback (most recent call last):
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/ignite/engine/engine.py", line 959, in _internal_run_as_gen
    epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/ignite/engine/engine.py", line 1087, in _run_once_on_dataset_as_gen
    self._handle_exception(e)
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/ignite/engine/engine.py", line 636, in _handle_exception
    self._fire_event(Events.EXCEPTION_RAISED, e)
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/ignite/engine/engine.py", line 425, in _fire_event
    func(*first, *(event_args + others), **kwargs)
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/monai/handlers/stats_handler.py", line 203, in exception_raised
    raise e
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/ignite/engine/engine.py", line 1068, in _run_once_on_dataset_as_gen
    self.state.output = self._process_function(self, self.state.batch)
  File "/home/matteo/code/sliding-window-based-interactive-segmentation-of-volumetric-medical-images/src/sw_interactive_segmentation/utils/helper.py", line 282, in timeit_wrapper
    result = func(*args, **kwargs)
  File "/home/matteo/code/sliding-window-based-interactive-segmentation-of-volumetric-medical-images/src/sw_interactive_segmentation/interaction.py", line 234, in __call__
    return engine._iteration(engine, batchdata)  # train network with the final iteration cycle
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/monai/engines/trainer.py", line 229, in _iteration
    engine.scaler.scale(engine.state.output[Keys.LOSS]).backward()
  File "/home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 164, in scale
    assert outputs.is_cuda or outputs.device.type == 'xla'
AssertionError
================================
Printing MONAI config...
================================
MONAI version: 1.2.0
Numpy version: 1.23.5
Pytorch version: 1.13.0+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: c33f1ba588ee00229a309000e888f9817b4f1934
MONAI __file__: /home/matteo/anaconda3/envs/monai/lib/python3.9/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.11
ITK version: 5.3.0
Nibabel version: 5.0.1
scikit-image version: 0.20.0
Pillow version: 9.5.0
Tensorboard version: 2.12.1
gdown version: 4.7.1
TorchVision version: 0.14.0+cu117
tqdm version: 4.64.1
lmdb version: 1.4.0
psutil version: 5.9.4
pandas version: 1.5.3
einops version: 0.6.0
transformers version: 4.21.3
mlflow version: 2.2.2
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 22.04.2 LTS
Platform: Linux-6.2.0-33-generic-x86_64-with-glibc2.35
Processor: x86_64
Machine: x86_64
Python version: 3.9.16
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 12
Num logical CPUs: 24
Num usable CPUs: 24
CPU usage (%): [5.1, 4.5, 4.5, 3.5, 3.5, 4.5, 5.6, 7.5, 4.0, 3.6, 8.0, 7.5, 4.0, 4.5, 4.5, 4.5, 4.5, 6.5, 5.0, 5.0, 4.0, 5.6, 6.0, 99.5]
CPU freq. (MHz): 3841
Load avg. in last 1, 5, 15 mins (%): [0.5, 2.1, 2.9]
Disk usage (%): 98.2
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.2
Available memory (GB): 23.1
Used memory (GB): 7.6

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.7
cuDNN enabled: True
cuDNN version: 8500
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA GeForce RTX 3090 Ti
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 84
GPU 0 Total memory (GB): 22.2
GPU 0 CUDA capability (maj.min): 8.6
KumoLiu commented 1 year ago

Hi @matt3o, from the error message it seems your loss tensor is on the CPU, but AMP is used to automatically choose the precision for GPU operations to improve performance while maintaining accuracy, so its GradScaler expects the tensor to be on the GPU. See https://pytorch.org/docs/stable/notes/amp_examples.html. Hope it helps, thanks!
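To make the failure mode concrete: the assertion in the traceback comes from `torch.cuda.amp.GradScaler.scale`, which requires the tensor being scaled to live on a CUDA (or XLA) device. A minimal sketch of that device check in plain Python (`can_scale` is a hypothetical helper mirroring the `assert outputs.is_cuda or outputs.device.type == 'xla'` line shown above, not a PyTorch API):

```python
def can_scale(device_type: str) -> bool:
    """Mirror the device check behind GradScaler.scale's AssertionError."""
    return device_type in ("cuda", "xla")

# With SlidingWindowInferer(device="cpu"), the stitched output -- and hence
# the loss computed from it -- is a CPU tensor, so the check fails:
assert can_scale("cuda")       # loss on the GPU: scaling works
assert not can_scale("cpu")    # loss on the CPU: AssertionError in scale()
```

One possible workaround (an assumption, not an official fix) is to move the loss back to the GPU before scaling, e.g. `scaler.scale(loss.to(sw_device)).backward()`, or simply to keep `device` equal to `sw_device` during training and reserve the CPU-output mode for validation.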