Closed. sxjscience closed this pull request 3 years ago.
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1399/improve_test/index.html
Merging #1399 into master will increase coverage by 2.96%. The diff coverage is n/a.
```diff
@@            Coverage Diff             @@
##           master    #1399      +/-   ##
==========================================
+ Coverage   82.17%   85.14%   +2.96%
==========================================
  Files          53       53
  Lines        6946     6946
==========================================
+ Hits         5708     5914     +206
+ Misses       1238     1032     -206
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/data/loading.py | 78.11% <0.00%> (-3.02%) | :arrow_down: |
| src/gluonnlp/models/mobilebert.py | 87.85% <0.00%> (+0.25%) | :arrow_up: |
| src/gluonnlp/models/albert.py | 95.42% <0.00%> (+0.35%) | :arrow_up: |
| src/gluonnlp/models/transformer.py | 98.93% <0.00%> (+0.42%) | :arrow_up: |
| src/gluonnlp/attention_cell.py | 80.70% <0.00%> (+0.78%) | :arrow_up: |
| src/gluonnlp/data/tokenizers/yttm.py | 82.75% <0.00%> (+0.86%) | :arrow_up: |
| conftest.py | 86.25% <0.00%> (+1.25%) | :arrow_up: |
| src/gluonnlp/data/tokenizers/sentencepiece.py | 78.44% <0.00%> (+2.99%) | :arrow_up: |
| src/gluonnlp/models/bert.py | 94.80% <0.00%> (+7.26%) | :arrow_up: |
| src/gluonnlp/models/gpt2.py | 98.25% <0.00%> (+7.31%) | :arrow_up: |
| ... and 6 more | | |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c8b0c9...e9f6475. Read the comment docs.
Fixing the seed is usually not a good strategy, as upstream changes can still alter the order of events and thus the results. If necessary, change the tolerance instead.
Actually, the tolerance has already been relaxed to 1E-1 and the test can still be flaky; that's why I fixed the seed. We will need to investigate why this happens in TVM. One possibility is numerical issues caused by operator fusion.
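For context, here is a minimal sketch of the two strategies under discussion. `mxnet_forward` and `tvm_forward` are hypothetical stand-ins for the reference and the TVM-compiled forward passes; the shapes and seed value are made up:

```python
import numpy as np
import numpy.testing as npt

def test_tvm_output_matches_mxnet():
    # Strategy 1 (used in this PR): fix the seed so the random inputs,
    # and therefore any flaky cases, are reproducible.
    np.random.seed(123)
    data = np.random.normal(size=(1, 128)).astype('float32')

    ref = mxnet_forward(data)  # hypothetical MXNet reference forward pass
    out = tvm_forward(data)    # hypothetical TVM-compiled forward pass

    # Strategy 2 (suggested above): compare with a loose tolerance instead.
    # Per the discussion, even 1E-1 was not always enough here.
    npt.assert_allclose(out, ref, rtol=1e-1, atol=1e-1)
```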
This sounds like a real issue. Let's proceed as you suggested: mark it as experimental and note down the known issue.
I think TVM can still be used and need not be marked as experimental. Basically, the issue only happens for the ALBERT network and is quite flaky. We need to spend some time investigating it. I've also informed @icemelon9 about it.
Please reconsider. We need to take the following into account:
Marking it as experimental is also valid (MXNet 2.0 might itself be experimental...). Nevertheless, we should encourage our users to try out TVM. According to the current benchmark, TVM can offer a 1.2x to 2.4x speed-up over HuggingFace + TorchScript for fp32 inference, so it is a nice feature to add to our package. We can later provide an example of how to convert GluonNLP models to TVM and run inference on different AWS instances, including C4, C5, G4, and P3 (a rough sketch follows below). In addition, inference latency can benefit substantially from new features like Ansor.
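As a rough illustration of what that future example might contain, not the planned example itself: the input name `data`, the shape `(1, 128)`, and the variable names below are assumptions, and the runtime module uses the TVM ~0.7 name `graph_runtime` (renamed to `graph_executor` in later releases):

```python
import tvm
from tvm import relay
from tvm.contrib import graph_runtime  # graph_executor in newer TVM

# `model` is assumed to be a hybridized Gluon block with a single input.
shape_dict = {'data': (1, 128)}
mod, params = relay.frontend.from_mxnet(model, shape_dict)

# Compile for CPU; 'llvm' would be swapped for a GPU target on G4/P3.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target='llvm', params=params)

# Run inference through the graph runtime.
rt = graph_runtime.GraphModule(lib['default'](tvm.cpu()))
rt.set_input('data', input_ids)  # input_ids: numpy array matching shape_dict
rt.run()
logits = rt.get_output(0).asnumpy()
```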
@szha Changed the README to point out that TVM integration is experimental. We may change it back once TVM 0.8 is ready.
Description
Fix the seed for the TVM test, which can be flaky.
cc @dmlc/gluon-nlp-team