dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/

FP16 and FP32 give different results since GluonNLP 0.7 #1303

Closed: davisliang closed this issue 4 years ago

davisliang commented 4 years ago

Description

A forward pass through BERT-base with the same parameters and inputs gives very different results in fp16 than in fp32. This happens on GluonNLP versions newer than 0.6.0.

Error Message

from float32:

[[[-0.14241205  0.13353725 -0.12907065 ... -0.35967964 -0.05622258
    0.36050138]
  [-0.350648    0.10419771  0.6244457  ... -0.17610289  0.48340237
    0.06443504]
  [-0.24513094 -0.15731683  0.69451946 ... -0.5654467  -0.0894002
   -0.18564378]
  [-0.82478666 -0.9119223  -0.65607107 ...  0.50742483 -0.19388783
   -0.16587636]
  [ 0.8766523   0.03524842 -0.12331446 ...  0.2720161  -0.6369005
   -0.1585012 ]]]
<NDArray 1x5x768 @gpu(0)>

from float16:

[[[-0.4473   0.03326 -0.06555 ... -0.4893  -0.1052   0.5503 ]
  [-0.9287  -0.04443  0.9863  ... -0.7188  -0.1516   0.0721 ]
  [-0.6553  -0.2798   0.6636  ... -0.526   -0.5      0.03748]
  [-0.726   -0.81    -0.05014 ...  0.2372  -0.447    0.04047]
  [-1.035   -0.578    0.5273  ... -0.4065  -0.3872   0.5005 ]]]
<NDArray 1x5x768 @gpu(0)>

To Reproduce


Steps to reproduce


# float32
import gluonnlp as nlp
import mxnet as mx

ctx = mx.gpu(0)
model, vocab = nlp.model.get_model('bert_12_768_12',
                                   dataset_name='book_corpus_wiki_en_uncased',
                                   use_classifier=False, use_decoder=False, ctx=ctx)
tokenizer = nlp.data.BERTTokenizer(vocab, lower=True)
transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=512,
                                           pair=False, pad=False)
sample = transform(['Hello world!'])

model.cast('float32')  # no-op here: the pretrained parameters are already float32

# token ids keep the default dtype; valid_len and segments are cast explicitly
words = mx.nd.array([sample[0]]).as_in_context(ctx)
valid_len = mx.nd.array([sample[1]]).as_in_context(ctx).astype('float32')
segments = mx.nd.array([sample[2]]).as_in_context(ctx).astype('float32')
seq_encoding, cls_encoding = model(words, segments, valid_len)
# float16: identical setup, but the model and the length/segment inputs
# are cast to half precision
import gluonnlp as nlp
import mxnet as mx

ctx = mx.gpu(0)
model, vocab = nlp.model.get_model('bert_12_768_12',
                                   dataset_name='book_corpus_wiki_en_uncased',
                                   use_classifier=False, use_decoder=False, ctx=ctx)
tokenizer = nlp.data.BERTTokenizer(vocab, lower=True)
transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=512,
                                           pair=False, pad=False)
sample = transform(['Hello world!'])

model.cast('float16')  # cast all model parameters to half precision

words = mx.nd.array([sample[0]]).as_in_context(ctx)
valid_len = mx.nd.array([sample[1]]).as_in_context(ctx).astype('float16')
segments = mx.nd.array([sample[2]]).as_in_context(ctx).astype('float16')
seq_encoding, cls_encoding = model(words, segments, valid_len)
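
To quantify the gap beyond eyeballing the printed arrays, the two encodings can be diffed directly. A minimal sketch, assuming the two runs above stored their outputs under the hypothetical names seq_encoding_fp32 and seq_encoding_fp16:

import numpy as np

# hypothetical names for the sequence encodings from the two runs above
a = seq_encoding_fp32.asnumpy().astype('float32')
b = seq_encoding_fp16.asnumpy().astype('float32')
print('max abs diff:', np.abs(a - b).max())
print('mean abs diff:', np.abs(a - b).mean())

Half-precision rounding alone would normally keep the difference around 1e-2 or below; the arrays printed above disagree in the first decimal place, which points at something beyond ordinary fp16 rounding.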

What have you tried to solve it?

  1. Tried various GluonNLP versions: 0.6.0 does not have this bug; 0.7.0 and later do.

Environment

Just your average EC2 machine with pip install mxnet-cu102

eric-haibin-lin commented 4 years ago

Did you try turning on safe accumulation (MXNET_SAFE_ACCUMULATION=1)?
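
This refers to MXNet's MXNET_SAFE_ACCUMULATION environment variable, which makes reductions (e.g. inside softmax and layer-norm operators) accumulate in a higher-precision type even when the inputs are fp16. A minimal sketch of enabling it, set before importing mxnet so it applies to the whole run:

import os
os.environ['MXNET_SAFE_ACCUMULATION'] = '1'  # accumulate fp16 reductions in higher precision

import mxnet as mx
import gluonnlp as nlp
# ... then run the float16 reproduction from "Steps to reproduce" above ...

Equivalently, export MXNET_SAFE_ACCUMULATION=1 in the shell before launching Python.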

davisliang commented 4 years ago

Thanks, Haibin, this solved the issue! I will add a comment to the RFC proposing to enable this by default.