szha commented 3 years ago

Description

update hybrid_forward to use deferred compute

Checklist

Essentials

[x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
[x] Changes are complete (i.e. I finished coding on this PR)
[x] All changes have test coverage
[x] Code is well-documented

Changes

[x] update hybrid_forward to use deferred compute

cc @dmlc/gluon-nlp-team

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-dev.s3-accelerate.amazonaws.com/PR1356/forward/index.html

szha commented 3 years ago

At the moment this is still blocked by https://github.com/apache/incubator-mxnet/issues/19286

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-dev.s3-accelerate.amazonaws.com/PR1356/forward/index.html

sxjscience commented 3 years ago

What's the status of this PR?

szha commented 3 years ago

blocked on the issue of GPT2 as communicated to you prior offline..

sxjscience commented 3 years ago

Are there issues other than the GPT-2 ?

szha commented 3 years ago

no. all other checks have passed.

leezu commented 3 years ago

GPT2 test case to be unblocked by https://github.com/apache/incubator-mxnet/pull/19374

codecov[bot] commented 3 years ago

Codecov Report

Merging #1356 into master will decrease coverage by 0.06%. The diff coverage is 88.75%.

@@            Coverage Diff             @@
##           master    #1356      +/-   ##
==========================================
- Coverage   85.31%   85.24%   -0.07%     
==========================================
  Files          53       53              
  Lines        6946     6954       +8     
==========================================
+ Hits         5926     5928       +2     
- Misses       1020     1026       +6

Impacted Files	Coverage Δ
src/gluonnlp/models/electra.py	`76.83% <63.33%> (-0.07%)`	:arrow_down:
src/gluonnlp/models/bart.py	`86.25% <80.00%> (ø)`
src/gluonnlp/models/albert.py	`95.45% <82.60%> (+0.03%)`	:arrow_up:
src/gluonnlp/attention_cell.py	`80.39% <84.81%> (+0.07%)`	:arrow_up:
src/gluonnlp/layers.py	`85.47% <87.01%> (-1.46%)`	:arrow_down:
src/gluonnlp/models/bert.py	`94.80% <91.30%> (ø)`
src/gluonnlp/models/transformer_xl.py	`82.44% <92.85%> (-0.08%)`	:arrow_down:
src/gluonnlp/models/mobilebert.py	`87.82% <93.10%> (-0.04%)`	:arrow_down:
src/gluonnlp/op.py	`95.08% <93.93%> (ø)`
src/gluonnlp/models/transformer.py	`98.94% <97.77%> (+<0.01%)`	:arrow_up:
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ccc5b9b...6a50bbc. Read the comment docs.

szha commented 3 years ago

looks like it's still failing for the same issue

leezu commented 3 years ago

looks like it's still failing for the same issue

As there are no new binaries since a few days

leezu commented 3 years ago

It appears the GPU tests still run with an outdated version of MXNet.

I noticed that the MXNet version is fixed in the container:

https://github.com/dmlc/gluon-nlp/blob/46c9d014ac0a022b208ce335cb6ee8a54771bae4/tools/docker/ubuntu18.04-gpu.Dockerfile#L35

Though @barry-jin helped point out that it should later be updated at:

https://github.com/dmlc/gluon-nlp/blob/46c9d014ac0a022b208ce335cb6ee8a54771bae4/tools/docker/gluon_nlp_job.sh#L25-L34

sxjscience commented 3 years ago

Should we add --user to the gluon_nlp_job.sh?

sxjscience commented 3 years ago

LGTM overall, we may try to merge after the pytest has passed.

szha commented 3 years ago

@leezu thanks for the help in unblocking this PR.

dmlc / gluon-nlp

[MODEL] update hybrid_forward to use deferred compute #1356

Description

Checklist

Essentials

Changes

Codecov Report