OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0

Can't operators added via a CUDA extension be used with BMTrain? #99

Closed. westnight closed this issue 1 year ago

westnight commented 1 year ago

I added an op as a CUDA extension. Running it under the BMTrain framework raises an OOM error, presumably because ZeRO is not taking effect. How can I solve this?

Achazwl commented 1 year ago

BMTrain only does its processing at the torch Module level.
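
One possible reading of this reply is that a custom op's parameters need to live on a torch Module for a Module-level framework to see them. The sketch below is plain PyTorch and does not come from the thread: `ScaleOp` is a hypothetical stand-in for the CUDA extension op, and `CustomOpLayer` is a hypothetical wrapper name. In BMTrain the analogous classes would presumably be `bmt.DistributedModule` and `bmt.DistributedParameter`, with the layer wrapped in `bmt.CheckpointBlock`; whether that resolves the OOM reported here is not confirmed by the thread.

```python
import torch
from torch import nn


class ScaleOp(torch.autograd.Function):
    """Hypothetical stand-in for an op implemented as a CUDA extension."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        # A real extension would call its compiled kernel here.
        return x * weight

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        grad_x = grad_out * weight
        # Reduce over every dimension except the last (feature) one.
        grad_w = (grad_out * x).reshape(-1, x.shape[-1]).sum(dim=0)
        return grad_x, grad_w


class CustomOpLayer(nn.Module):
    """Wraps the op so its weight is a registered Module parameter."""

    def __init__(self, dim):
        super().__init__()
        # Assumption: under BMTrain this would be a bmt.DistributedParameter
        # on a bmt.DistributedModule instead of a plain nn.Parameter.
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return ScaleOp.apply(x, self.weight)
```

With this layout the weight shows up in `named_parameters()`, which is where a Module-level framework would look for tensors to manage. A tensor allocated inside the extension itself would stay invisible to it.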