Closed matt3o closed 1 year ago
Hi @matt3o, judging from the error message, your data seems to be on the CPU, whereas AMP automatically chooses the precision for GPU operations to improve performance while maintaining accuracy. https://pytorch.org/docs/stable/notes/amp_examples.html Hope it helps, thanks!
Describe the bug
The crash only occurs during training, and only when AMP is on. I got it to run perfectly fine with AMP disabled. I am, however, not sure whether this flag is intended for training at all. In my quick tests, the sw_device setting did not appear to offer much advantage in GPU memory usage compared to a normal full-GPU run. For validation, memory usage dropped by about half (7 GB vs 3.5 GB), so it definitely makes a difference there.
I don't have the time right now, but I think this should be a reproducible bug.
Configuration of the Sliding Window Inferer: