Closed MoisesHer closed 4 years ago
Job PR-1237/1 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/1/index.html
Job PR-1237/2 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/2/index.html
Job PR-1237/4 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/4/index.html
Job PR-1237/3 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/3/index.html
Job PR-1237/5 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/5/index.html
Merging #1237 into master will increase coverage by 0.03%. The diff coverage is n/a.
@@ Coverage Diff @@
## master #1237 +/- ##
==========================================
+ Coverage 87.42% 87.45% +0.03%
==========================================
Files 81 81
Lines 7346 7365 +19
==========================================
+ Hits 6422 6441 +19
Misses 924 924
Impacted Files | Coverage Δ | |
---|---|---|
src/gluonnlp/model/bert.py | 94.65% <0.00%> (+0.03%) | :arrow_up: |
src/gluonnlp/model/transformer.py | 91.71% <0.00%> (+0.05%) | :arrow_up: |
src/gluonnlp/model/language_model.py | 98.64% <0.00%> (+0.15%) | :arrow_up: |
Job PR-1237/6 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/6/index.html
Job PR-1237/7 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/7/index.html
Job PR-1237/8 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/8/index.html
Job PR-1237/9 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/9/index.html
I assume the graph pass requires an MXNet nightly build? Would it make sense to mention the minimum MXNet version required for this script in the docs?
Yes, I have added a comment in index.rst for the TrueFP16 and custom pass optimizations: "These GPU optimizations require MXNet version 1.7 or higher".
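For reference, a minimal sketch of how such a guard could look in Python (assuming the standard `mxnet.__version__` string; the actual check in the PR's script may differ):

```python
import mxnet as mx

def mxnet_supports_graph_pass(min_version=(1, 7, 0)):
    """Return True if the installed MXNet is new enough for the custom
    graph pass / TrueFP16 optimizations (MXNet 1.7 or higher)."""
    parts = []
    for tok in mx.__version__.split('.')[:3]:
        # strip any non-digit characters (e.g. '0b20200809' in nightly builds)
        digits = ''.join(ch for ch in tok if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts) >= min_version

if not mxnet_supports_graph_pass():
    raise RuntimeError('These GPU optimizations require MXNet version 1.7 or higher')
```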
Job PR-1237/10 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/10/index.html
Job PR-1237/11 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/11/index.html
I think there may be a dependency error in this code; see the error log below:
[2020-06-20T22:33:01.774Z] In file included from horovod/mxnet/mpi_ops.h:23:0,
[2020-06-20T22:33:01.774Z] from horovod/mxnet/mpi_ops.cc:20:
[2020-06-20T22:33:01.774Z] /var/lib/jenkins/workspace/gluon-nlp-gpu-py3-master@2/conda/gpu/py3-master/lib/python3.5/site-packages/mxnet/include/mxnet/ndarray.h:41:10: fatal error: mkldnn.hpp: No such file or directory
[2020-06-20T22:33:01.774Z] #include <mkldnn.hpp>
[2020-06-20T22:33:01.774Z] ^~~~~~~~~~~~
[2020-06-20T22:33:01.774Z] compilation terminated.
[2020-06-20T22:33:01.774Z] error: command 'gcc' failed with exit status 1
[2020-06-20T22:33:01.774Z] ----------------------------------------
[2020-06-20T22:33:01.774Z] ERROR: Failed building wheel for horovod
Maybe some necessary files need to be installed or included?
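If it helps to narrow it down, here is a small sketch (assuming the Horovod build compiles against the headers bundled in the MXNet wheel, located via `mxnet.libinfo.find_include_path()`) that checks whether `mkldnn.hpp` is present:

```python
import os
import mxnet

# Horovod compiles against the headers shipped inside the installed MXNet
# wheel; if mkldnn.hpp is missing there, the wheel was built without the
# MKL-DNN headers bundled (or a different wheel is being picked up).
include_dir = mxnet.libinfo.find_include_path()
candidates = [
    os.path.join(include_dir, 'mkldnn', 'mkldnn.hpp'),
    os.path.join(include_dir, 'mkldnn.hpp'),
]
print('MXNet include path:', include_dir)
for path in candidates:
    print(path, '->', 'found' if os.path.exists(path) else 'missing')
```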
Job PR-1237/12 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/12/index.html
@MoisesHer it looks like `bertpass_lib.so` is not built. Compilation is triggered here: https://github.com/dmlc/gluon-nlp/pull/1237/files#diff-fa82d34d543ff657c2fe09553bd0fa34R433
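For context, a rough sketch of what the deployment script is expected to do once the library is built, based on the MXNet 1.7 custom graph-pass API (the pass name below is a placeholder; the real one is whatever is registered in the C++ source):

```python
import os
import mxnet as mx

def apply_bert_pass(sym, args, aux, lib_path='bertpass_lib.so',
                    pass_name='custom_pass'):
    """Load the compiled graph-pass library and run it on a BERT symbol.

    `pass_name` is a placeholder; the real name is whatever bertpass_gpu.cc
    registers via REGISTER_PASS."""
    if not os.path.exists(lib_path):
        raise FileNotFoundError(
            '%s was not built; check that the compilation step in the '
            'deployment script actually ran' % lib_path)
    mx.library.load(lib_path)                       # register the custom pass
    return sym.optimize_for(pass_name, args, aux)   # requires MXNet >= 1.7
```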
It works locally on my side, so I think the problem is with conda. Maybe it is not storing the library in the expected path, or not triggering the compilation? What would be the best way to reproduce the conda environment?
Job PR-1237/13 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/13/index.html
@MoisesHer you can refer to https://github.com/dmlc/gluon-nlp/blob/master/ci/prepare_clean_env.sh regarding the conda setup
Do you know why I am getting this lint error?
TypeError: '<' not supported between instances of 'str' and 'NoneType'
I cannot reproduce it locally. I installed miniconda3 and set up the environment as @leezu suggested (https://github.com/dmlc/gluon-nlp/blob/master/ci/prepare_clean_env.sh), but the error does not appear, and the CI log does not say which line produces it.
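For what it's worth, that error usually means Python 3 is being asked to order a str against None, e.g. when sorting a sequence that unexpectedly contains a None entry; a minimal reproduction:

```python
# Python 3 refuses to order a str against None; a None sneaking into a
# sequence that gets sorted produces exactly this message.
values = [None, 'bert', 'transformer']
try:
    sorted(values)
except TypeError as err:
    print(err)  # '<' not supported between instances of 'str' and 'NoneType'
```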
@leezu thanks a lot for your help. That allowed me to make some progress.
However, it seems that the lib_api.h being included is not the one contained in the wheel I am using here (https://repo.mxnet.io/dist/python/cu100/mxnet_cu100-1.7.0b20200809-py2.py3-none-manylinux2014_x86_64.whl).
The one picked up at compilation does not include the JsonVal structure, but the one in the wheel does (https://github.com/apache/incubator-mxnet/blob/v1.7.x/include/mxnet/lib_api.h#L606).
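A quick way to confirm which lib_api.h the build is actually picking up (assuming the headers bundled in the installed wheel are exposed via `mxnet.libinfo.find_include_path()`):

```python
import os
import mxnet

# Locate the lib_api.h that ships inside the installed wheel and check
# whether it defines the JsonVal structure expected by the custom pass.
include_dir = mxnet.libinfo.find_include_path()
header = os.path.join(include_dir, 'mxnet', 'lib_api.h')
print('Header shipped with the wheel:', header)
with open(header) as f:
    print('Defines JsonVal:', 'JsonVal' in f.read())
```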
See https://github.com/dmlc/gluon-nlp/pull/1325 for doc fix
Job PR-1237/34 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/34/index.html
I am not sure about the remaining issue. Is it a timeout? If it is, can I avoid it? Thanks.
@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?
Thanks, another question: is there a way for me to trigger CI checks (without new commit)?
Sure, you can just click into Details of the check to be directed to the Jenkins page, then click the Log in button in the upper-right corner. Then click the Rerun button (it looks like an arrowed circle) in the upper-right corner.
Job PR-1237/37 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/37/index.html
I am confused, not sure why this is failing now:
MXNetError: Check failed: compileResult == NVRTC_SUCCESS (6 vs. 0) : NVRTC Compilation failed. Please set environment variable MXNET_USE_FUSION to 0.
Are all those expand_dims expected?
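In case it helps while debugging, the workaround suggested by the error message only takes effect if the variable is set before MXNet initializes; a small sketch:

```python
import os

# MXNET_USE_FUSION controls the pointwise fusion pass that is failing in
# NVRTC here; setting it before the first `import mxnet` (or exporting it
# in the shell that runs the test) is the safe way to disable fusion.
os.environ['MXNET_USE_FUSION'] = '0'

import mxnet as mx
print(mx.__version__)
```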
@MoisesHer looks like a compatibility issue. We will address this in a separate PR. Thanks for pushing this through!
Description
Includes a script to deploy BERT for QA / classification / regression / embedding tasks. It offers the possibility of using the available GPU BERT optimizations in MXNet. It reports latency and throughput, and can check accuracy.
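As an illustration only (not the script's actual interface), the core of such a latency measurement with the standard GluonNLP BERT API might look like:

```python
import time
import mxnet as mx
import gluonnlp as nlp

# Rough sketch of the kind of measurement a deployment script performs:
# load a pre-trained BERT, hybridize it, and time batched inference.
# The script in this PR adds the GPU / TrueFP16 / graph-pass
# optimizations on top of a flow like this.
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
bert, vocab = nlp.model.get_model('bert_12_768_12',
                                  dataset_name='book_corpus_wiki_en_uncased',
                                  use_pooler=False, use_decoder=False,
                                  use_classifier=False, ctx=ctx)
bert.hybridize(static_alloc=True)

batch_size, seq_len = 32, 128
tokens = mx.nd.ones((batch_size, seq_len), ctx=ctx)
token_types = mx.nd.zeros((batch_size, seq_len), ctx=ctx)
valid_len = mx.nd.full((batch_size,), seq_len, ctx=ctx)

start = time.time()
out = bert(tokens, token_types, valid_len)
out.wait_to_read()
print('latency: %.1f ms' % ((time.time() - start) * 1000))
```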
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team