Open norabelrose opened 1 year ago
Describe the bug
Checkpointing crashes when `--zero` is set: `RuntimeError: Tensors must be CUDA and dense` is thrown inside the method `consolidate_state_dict()`.
Expected behavior
Checkpointing shouldn't crash.