AlignmentResearch / tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer
https://tuned-lens.readthedocs.io/en/latest/
MIT License
432 stars, 47 forks

Checkpointing crashes with ZeRO optimizer #96

Open norabelrose opened 1 year ago

norabelrose commented 1 year ago

Describe the bug: Checkpointing crashes when --zero is set. The error RuntimeError: Tensors must be CUDA and dense is thrown inside the method consolidate_state_dict().

Expected behavior: Checkpointing should complete without crashing when --zero is set.
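For context, here is a minimal sketch of the usual checkpointing pattern with PyTorch's ZeroRedundancyOptimizer. It is not the tuned-lens training code. The helper names save_zero_checkpoint and run_single_process_demo, the toy Linear model, and the single-process gloo setup are all assumptions chosen so the snippet runs on CPU. As far as I can tell, the "Tensors must be CUDA and dense" message comes from the NCCL backend when a collective is handed a tensor that is not on a CUDA device, so this CPU/gloo sketch shows the intended call sequence rather than reproducing the crash.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer


def save_zero_checkpoint(path: str) -> dict:
    """Take one optimizer step with a ZeRO-sharded Adam, then checkpoint.

    Hypothetical helper for illustration only.
    """
    model = torch.nn.Linear(4, 2)  # toy model, stands in for the real one
    opt = ZeroRedundancyOptimizer(
        model.parameters(), optimizer_class=torch.optim.Adam, lr=1e-3
    )
    model(torch.randn(8, 4)).sum().backward()
    opt.step()

    # Every rank must call this; it gathers the optimizer shards onto rank `to`.
    opt.consolidate_state_dict(to=0)

    state = {}
    if dist.get_rank() == 0:
        # state_dict() is only valid on the rank that received the shards.
        state = {"model": model.state_dict(), "optimizer": opt.state_dict()}
        torch.save(state, path)
    return state


def run_single_process_demo() -> dict:
    """Run the helper in a one-process group on the CPU gloo backend (assumed setup)."""
    with tempfile.TemporaryDirectory() as tmp:
        dist.init_process_group(
            backend="gloo",
            init_method=f"file://{os.path.join(tmp, 'rendezvous')}",
            rank=0,
            world_size=1,
        )
        try:
            return save_zero_checkpoint(os.path.join(tmp, "ckpt.pt"))
        finally:
            dist.destroy_process_group()
```

Calling run_single_process_demo() returns the saved dictionary on rank 0. In a multi-GPU NCCL run the same sequence applies, but each process would typically also need its CUDA device selected before any collective runs.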

Screenshots

Screenshot 2023-05-14 at 12.01.03 p.m.