apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Problem with BERT pre-training: how to train on multiple GPUs #19800

Closed: yangshuo0323 closed this issue 3 years ago

yangshuo0323 commented 3 years ago

Description


Seeking help:

Could someone provide the correct instructions or suggestions? Thanks.

github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

szha commented 3 years ago

This issue is being handled in https://github.com/dmlc/gluon-nlp/issues/1508