dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

loading the pre-trained BERT with (pretrained=True) crashing #1238

Closed Hildweig closed 4 years ago

Hildweig commented 4 years ago

Description

I am trying to follow this tutorial https://gluon-nlp.mxnet.io/examples/sentence_embedding/bert.html and whenever I reach the 3rd cell, the kernel crashes in Colab.

Error Message

Timestamp Level Message
Jun 3, 2020, 2:28:19 PM WARNING WARNING:root:kernel ad3dbdee-4b57-493d-bf27-40be9330ee20 restarted
Jun 3, 2020, 2:28:19 PM INFO KernelRestarter: restarting kernel (1/5), keep random ports
Jun 3, 2020, 2:28:18 PM WARNING [bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::CopyFromTo(mxnet::NDArray const&, mxnet::NDArray const&, int, bool)+0x6db) [0x7faa2db0f28b]

To Reproduce

!pip install --pre --upgrade mxnet
!pip install gluonnlp

import warnings
warnings.filterwarnings('ignore')

import io
import random
import numpy as np
import mxnet as mx
import gluonnlp as nlp
from gluonnlp.calibration import BertLayerCollector

# this notebook assumes that all required scripts are already
# downloaded from the corresponding tutorial webpage on http://gluon-nlp.mxnet.io
%cd /content/sentence_embedding #this one contains the bert file of the tutorial
from bert import data

nlp.utils.check_version('0.8.1')

np.random.seed(100)
random.seed(100)
mx.random.seed(10000)

# change ctx to mx.cpu() if no GPU is available.
try:
    ctx = mx.gpu(0)
except:
    ctx = mx.cpu()

Steps to reproduce

After running the cells above, the cell that makes it crash is this one:

bert_base, vocabulary = nlp.model.get_model('bert_12_768_12', dataset_name='book_corpus_wiki_en_uncased', pretrained=True, ctx=ctx, use_pooler=True, use_decoder=False, use_classifier=False)

What have you tried to solve it?

  1. I tried replacing it with this call: bert_base, vocabulary = nlp.model.bert.get_bert_model(model_name='bert_12_768_12', dataset_name='book_corpus_wiki_en_uncased', pretrained=True, ctx=ctx, use_pooler=True, use_decoder=False, use_classifier=False)

  2. I also tried removing some parameters and found that it only crashes when pretrained=True, but I need the pre-trained weights.

Environment

----------Python Info----------
Version : 3.6.9
Compiler : GCC 8.4.0
Build : ('default', 'Apr 18 2020 01:56:04')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.3.1
Directory : /usr/local/lib/python3.6/dist-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /usr/local/lib/python3.6/dist-packages/mxnet
Num GPUs : 0
Commit Hash : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
----------System Info----------
Platform : Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
system : Linux
node : a7a11b36dda7
release : 4.19.104+
version : #1 SMP Wed Feb 19 05:26:34 PST 2020
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2300.000
BogoMIPS: 4600.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0029 sec, LOAD: 0.6395 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0029 sec, LOAD: 0.5304 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1030 sec, LOAD: 0.4462 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0337 sec, LOAD: 0.3244 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0170 sec, LOAD: 0.2023 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0310 sec, LOAD: 0.6803 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0088 sec, LOAD: 0.3790 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.010742664337158203 sec.

Hildweig commented 4 years ago

I found the error: ctx was equal to null because my mxnet install wasn't getting the GPU, so it crashed when executing. I reinstalled mxnet to match my CUDA version, which is 10.1 (pip install --upgrade mxnet-cu101 gluonnlp).
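
For anyone hitting the same crash: `mx.gpu(0)` only builds a context object and does not check that a GPU is usable, so the `try/except` in the reproduction never falls back to the CPU. Below is a minimal sketch of a safer selection, assuming `mx.context.num_gpus()` reports 0 on a CPU-only build (consistent with the "Num GPUs : 0" line in the environment dump). The helper name `choose_context` is hypothetical and not part of MXNet or GluonNLP; it takes the imported `mxnet` module as an argument so it can be exercised without a GPU.

```python
def choose_context(mx):
    """Return mx.gpu(0) only if MXNet reports at least one GPU, else mx.cpu().

    `mx` is the imported mxnet module. This helper is a hypothetical
    illustration, not an MXNet or GluonNLP API.
    """
    try:
        # Number of GPUs visible to this MXNet build (assumed 0 on CPU-only builds).
        gpu_count = mx.context.num_gpus()
    except Exception:
        # If the query itself fails, treat it as "no GPU" and fall back to CPU.
        gpu_count = 0
    return mx.gpu(0) if gpu_count > 0 else mx.cpu()
```

In the notebook this would replace the `try/except` cell with `ctx = choose_context(mx)` after `import mxnet as mx`. If it returns a CPU context on a GPU runtime, the installed mxnet package is probably a CPU-only build, and a CUDA build matching the runtime (such as mxnet-cu101 above) is needed.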