Open · awaelchli opened 3 months ago
Thanks @awaelchli. The analysis I have done on the code seems to agree that we can remove this function (see https://github.com/Lightning-AI/pytorch-lightning/issues/19955#issuecomment-2232309700). The remaining point of note, I think, is `Strategy.teardown()`. The idea there is to transfer the optimizer back from the GPU to the CPU. Maybe we don't really care about this, since when the fit is complete the optimizer will either be checkpointed or discarded. The question is whether users really depend on this final transfer behavior, but I would guess not. This factors into whether we feel we need an `optimizer_to_cpu` for teardown, as mentioned above.

Also tagging @janeyx99 for input.
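For concreteness, a minimal sketch of what such an `optimizer_to_cpu` helper could look like; the name and placement are hypothetical, since no such helper exists in Lightning yet:

```python
import torch
from torch.optim import Optimizer


def optimizer_to_cpu(optimizer: Optimizer) -> None:
    # Hypothetical teardown helper: move all tensor state held by the
    # optimizer back to the CPU in place, e.g. once `Trainer.fit()` ends,
    # so that any later checkpointing happens from host memory.
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor):
                state[key] = value.cpu()
```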
Outline & Motivation
The trainer uses a function `optimizer_to_device` here: https://github.com/Lightning-AI/pytorch-lightning/blob/631911c00413ad028e2887d83eb264cb4822097e/src/lightning/pytorch/strategies/strategy.py#L160-L161

In #19955 an issue was raised that the function moved the "step" parameter in the optimizer state to the CUDA device, causing device-to-host syncs during `optimizer.step()` because the "step" tensor was expected to remain on CPU. #20019 fixed this with special treatment of that key. However, good arguments were made in #19955 that `optimizer_to_device` shouldn't even be necessary in the first place (https://github.com/Lightning-AI/pytorch-lightning/issues/19955#issuecomment-2197353178).
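To illustrate the failure mode, here is a rough sketch of the kind of special-casing #20019 introduced (an illustrative paraphrase, not the actual patch): state tensors move to the target device, while the "step" counter stays on CPU so `optimizer.step()` does not incur a device-to-host sync.

```python
import torch
from torch.optim import Optimizer


def _move_state_to_device(optimizer: Optimizer, device: torch.device) -> None:
    # Illustrative sketch (hypothetical name): transfer optimizer state
    # tensors to `device`, but skip the "step" key. torch.optim keeps
    # "step" as a CPU scalar tensor, and moving it to CUDA forces a
    # GPU-to-CPU sync on every `optimizer.step()` call.
    for state in optimizer.state.values():
        for key, value in state.items():
            if key == "step":
                continue  # leave the step counter on CPU
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)
```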
Pitch

Remove `optimizer_to_device` and show that it is redundant by running the tests. We will still need an `optimizer_to_cpu` for teardown.

Additional context
No response
cc @justusschock @awaelchli @borda