google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

MXNet/Gluon Implementation and Tutorial #188

Open eric-haibin-lin opened 5 years ago

eric-haibin-lin commented 5 years ago

Hello all,

We just released an MXNet/Gluon port of BERT in GluonNLP v0.5!

We converted the pre-trained TF models, and the port generates the same output as the TF implementation. The BERT model and vocabulary can be downloaded automatically with the `get_model()` API. We also added a step-by-step tutorial on fine-tuning with BERT. We have ongoing work on multi-GPU training and gradient accumulation, and we plan to include the recently released BERT models and apply them to other downstream tasks as well.

Here is the link: http://gluon-nlp.mxnet.io/model_zoo/bert/index.html
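For a quick start, loading the converted model looks roughly like this (a minimal sketch assuming the v0.5 `get_model()` API; the model and dataset names follow the model zoo page above):

```python
import mxnet as mx
import gluonnlp as nlp

# Download BERT base (12 layers, 768 hidden units, 12 heads) together with
# the matching wordpiece vocabulary; the weights were converted from the
# original TF checkpoints.
bert, vocab = nlp.model.get_model(
    'bert_12_768_12',
    dataset_name='book_corpus_wiki_en_uncased',
    pretrained=True,
    ctx=mx.cpu(),
    use_pooler=True,       # pooled [CLS] output, used for classification
    use_decoder=False,     # skip the masked-LM decoder
    use_classifier=False)  # skip the next-sentence-prediction head
```

The returned `vocab` carries BERT's wordpiece vocabulary, so tokenized inputs line up with the pre-trained embeddings.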

Thank you so much, @jacobdevlin-google, for releasing the pre-trained BERT model and code to the research community.

Haibin - GluonNLP team

rfigueror1 commented 5 years ago

Hi Eric,

I am trying to train a classifier with BERT based on the fine-tuning tutorial. I am currently using a p3.2xlarge EC2 GPU instance with the following characteristics:

rfigueror1 commented 5 years ago

@eric-haibin-lin

eric-haibin-lin commented 5 years ago

Hi @rfigueror1

Sorry for the late reply; I was busy with a deadline.

Did you run the tutorial as is, or did you make modifications (e.g., sequence length) for your own dataset? Which version of MXNet did you use? You can get that via `pip list | grep mxnet`.
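Equivalently, from inside Python:

```python
import mxnet as mx
import gluonnlp as nlp

print(mx.__version__)   # e.g. '1.3.1'
print(nlp.__version__)  # e.g. '0.5.0'
```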

I'll also try to reproduce your setup and get back to you.

rfigueror1 commented 5 years ago

Hi Eric,

Thanks for the reply. I solved the issue by decreasing my batch size.
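In case it helps anyone else hitting the same out-of-memory error: the gradient accumulation work Eric mentioned above is the usual way to keep the effective batch size after shrinking the per-step batch. A rough, untested sketch of the idea in Gluon (the toy network, loss, and data here are placeholders for the tutorial's BERT classifier and dataset):

```python
import mxnet as mx
from mxnet import autograd, gluon

# Toy stand-ins so the sketch runs end to end; in the tutorial these would be
# the BERT classifier, its loss, the trainer, and the fine-tuning DataLoader.
net = gluon.nn.Dense(2)
net.initialize()
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})
dataset = gluon.data.ArrayDataset(mx.nd.random.randn(64, 8),
                                  mx.nd.array([0, 1] * 32))
loader = gluon.data.DataLoader(dataset, batch_size=8)

accumulate = 4  # micro-batches per parameter update (assumed value)

# Tell Gluon to add new gradients to the existing ones instead of
# overwriting them on each backward pass.
params = [p for p in net.collect_params().values() if p.grad_req != 'null']
for p in params:
    p.grad_req = 'add'

for step, (data, label) in enumerate(loader, start=1):
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()
    if step % accumulate == 0:
        # Normalize by the effective batch size, then reset the gradients.
        trainer.step(accumulate * data.shape[0])
        for p in params:
            p.zero_grad()
```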

Best