OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0
560 stars, 77 forks

Undo a deletion of detach in previous version #69

Closed. Achazwl closed this 1 year ago

Achazwl commented 1 year ago

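A previous version deleted a `detach` call in the checkpointing code. Without it, the gradients of the hidden states are kept in memory far longer than necessary during checkpointing. This PR restores the `detach`.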
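
For illustration only, here is a minimal sketch of the general pattern, written with plain PyTorch rather than BMTrain's actual code. The names `CheckpointSketch` and `checkpoint_sketch` are hypothetical. The sketch assumes a reentrant-style checkpoint that recomputes the forward pass inside `backward`. The `detach` turns the saved hidden state into a fresh leaf, so the recomputed graph and the hidden-state gradient stay local to this segment and can be released once `backward` returns.

```python
import torch


class CheckpointSketch(torch.autograd.Function):
    """Hypothetical minimal checkpoint; not BMTrain's implementation."""

    @staticmethod
    def forward(ctx, fn, hidden):
        # Keep only the segment input; intermediate activations are dropped
        # because forward of a custom Function runs without grad tracking.
        ctx.fn = fn
        ctx.save_for_backward(hidden)
        return fn(hidden)

    @staticmethod
    def backward(ctx, grad_out):
        (hidden,) = ctx.saved_tensors
        # The detach in question: cut the link to the upstream graph so the
        # recomputation below builds a small, local graph rooted at `local`.
        local = hidden.detach().requires_grad_(True)
        with torch.enable_grad():
            out = ctx.fn(local)
        torch.autograd.backward(out, grad_out)
        # The hidden-state gradient lives only on `local` and is handed back
        # to autograd here; no gradient is returned for `fn`.
        return None, local.grad


def checkpoint_sketch(fn, hidden):
    """Run `fn(hidden)` and recompute it during backward."""
    return CheckpointSketch.apply(fn, hidden)
```

A usage example under the same assumptions: `y = checkpoint_sketch(torch.tanh, x * 2)` followed by `y.sum().backward()` should give the same `x.grad` as the non-checkpointed computation.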