MegEngine / Models

Various mainstream deep learning models implemented with MegEngine

Replace multiprocess with dist.launcher #99

Closed xgbj closed 3 years ago

xgbj commented 3 years ago

https://github.com/MegEngine/Models/blob/e0908675d028f9803fed88828e1304d8db9706f4/official/vision/classification/resnet/train.py#L110

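Python's built-in multiprocessing has some problems when processes exit: training sometimes hangs instead of exiting cleanly. `distributed` provides a wrapper around it.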
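The underlying idea is that a launcher should own not just process startup but also teardown, so that one failed or stuck worker cannot leave the parent waiting forever. The stdlib sketch below illustrates that idea only; it is not MegEngine's `dist.launcher` implementation. The names `launch`, `_worker`, `timeout` and `grace` are hypothetical, and it assumes a Unix system because it uses the `fork` start method (which avoids re-importing `__main__` in the children).

```python
import multiprocessing as mp
import queue


def _worker(rank, world_size, fn, results):
    # Runs inside a child process: call the user function and report the outcome.
    try:
        fn(rank, world_size)
        results.put((rank, None))
    except Exception as exc:
        results.put((rank, repr(exc)))


def launch(fn, world_size, timeout=60.0, grace=5.0):
    """Hypothetical launcher: one process per rank, guaranteed cleanup on exit.

    Returns {rank: error_message} for ranks that failed; key -1 marks a timeout.
    An empty dict means every rank finished successfully.
    """
    ctx = mp.get_context("fork")  # assumption: Unix-only start method
    results = ctx.Queue()
    procs = [
        ctx.Process(target=_worker, args=(rank, world_size, fn, results))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()

    errors = {}
    try:
        for _ in range(world_size):
            try:
                rank, err = results.get(timeout=timeout)
            except queue.Empty:
                errors[-1] = f"no result within {timeout}s"
                break  # a worker is stuck: stop waiting
            if err is not None:
                errors[rank] = err
                break  # one rank failed: don't wait for the rest
    finally:
        for p in procs:
            if not errors:
                p.join(timeout=grace)  # normal path: let workers exit on their own
            if p.is_alive():
                p.terminate()  # kill stragglers so the parent never hangs
            p.join()
    return errors
```

For example, `launch(train_fn, 4)` (with a hypothetical `train_fn(rank, world_size)`) returns `{}` when all four ranks finish, and returns promptly with an error entry if any rank raises or stalls past `timeout`, because the `finally` block terminates whatever is still running.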

xgbj commented 3 years ago

Looks like there is already a related MR for this: https://github.com/MegEngine/Models/pull/94