dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.56k stars 538 forks source link

[MODEL] update hybrid_forward to use deferred compute #1356

Closed szha closed 3 years ago

szha commented 3 years ago

Description

update hybrid_forward to use deferred compute

Checklist

Essentials

Changes

cc @dmlc/gluon-nlp-team

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-dev.s3-accelerate.amazonaws.com/PR1356/forward/index.html

szha commented 3 years ago

At the moment this is still blocked by https://github.com/apache/incubator-mxnet/issues/19286

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-dev.s3-accelerate.amazonaws.com/PR1356/forward/index.html

sxjscience commented 3 years ago

What's the status of this PR?

szha commented 3 years ago

blocked on the issue of GPT2 as communicated to you prior offline..

sxjscience commented 3 years ago

Are there issues other than the GPT-2 ?

szha commented 3 years ago

no. all other checks have passed.

leezu commented 3 years ago

GPT2 test case to be unblocked by https://github.com/apache/incubator-mxnet/pull/19374

codecov[bot] commented 3 years ago

Codecov Report

Merging #1356 into master will decrease coverage by 0.06%. The diff coverage is 88.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1356      +/-   ##
==========================================
- Coverage   85.31%   85.24%   -0.07%     
==========================================
  Files          53       53              
  Lines        6946     6954       +8     
==========================================
+ Hits         5926     5928       +2     
- Misses       1020     1026       +6     
Impacted Files Coverage Δ
src/gluonnlp/models/electra.py 76.83% <63.33%> (-0.07%) :arrow_down:
src/gluonnlp/models/bart.py 86.25% <80.00%> (ø)
src/gluonnlp/models/albert.py 95.45% <82.60%> (+0.03%) :arrow_up:
src/gluonnlp/attention_cell.py 80.39% <84.81%> (+0.07%) :arrow_up:
src/gluonnlp/layers.py 85.47% <87.01%> (-1.46%) :arrow_down:
src/gluonnlp/models/bert.py 94.80% <91.30%> (ø)
src/gluonnlp/models/transformer_xl.py 82.44% <92.85%> (-0.08%) :arrow_down:
src/gluonnlp/models/mobilebert.py 87.82% <93.10%> (-0.04%) :arrow_down:
src/gluonnlp/op.py 95.08% <93.93%> (ø)
src/gluonnlp/models/transformer.py 98.94% <97.77%> (+<0.01%) :arrow_up:
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ccc5b9b...6a50bbc. Read the comment docs.

szha commented 3 years ago

looks like it's still failing for the same issue

leezu commented 3 years ago

looks like it's still failing for the same issue

As there are no new binaries since a few days

leezu commented 3 years ago

It appears the GPU tests still run with an outdated version of MXNet.

I noticed that the MXNet version is fixed in the container:

https://github.com/dmlc/gluon-nlp/blob/46c9d014ac0a022b208ce335cb6ee8a54771bae4/tools/docker/ubuntu18.04-gpu.Dockerfile#L35

Though @barry-jin helped point out that it should later be updated at:

https://github.com/dmlc/gluon-nlp/blob/46c9d014ac0a022b208ce335cb6ee8a54771bae4/tools/docker/gluon_nlp_job.sh#L25-L34

sxjscience commented 3 years ago

Should we add --user to the gluon_nlp_job.sh?

sxjscience commented 3 years ago

LGTM overall, we may try to merge after the pytest has passed.

szha commented 3 years ago

@leezu thanks for the help in unblocking this PR.