dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.55k stars 538 forks

Run GPU tests in individual subprocesses #1409

Closed: leezu closed this 3 years ago

leezu commented 3 years ago

Description

After the TVM PR, CI has been failing with out-of-memory errors. This PR tests whether skipping the TVM tests helps.

For reference, here is a CI run that ran out of memory even with an older MXNet commit: https://github.com/dmlc/gluon-nlp/actions/runs/328560026
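The PR title proposes running the GPU tests in individual subprocesses. A minimal sketch of that idea follows. It is not the PR's actual implementation. The helpers `run_tests_in_subprocesses` and `failed_files` are hypothetical, and the sketch assumes the tests live in `tests/test_*.py` and are launched with `python -m pytest`. Each file runs in a fresh interpreter, so any GPU memory its tests allocated is released when that process exits, before the next file starts.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional


def run_tests_in_subprocesses(test_dir: str = "tests",
                              pattern: str = "test_*.py",
                              cmd_prefix: Optional[List[str]] = None) -> Dict[str, int]:
    """Hypothetical helper: run every matching test file in its own process.

    Each file gets a fresh interpreter, so GPU memory held by its tests is
    released when that process exits. Returns {file path: exit code}.
    """
    if cmd_prefix is None:
        # Default (assumption): invoke pytest through the current interpreter.
        cmd_prefix = [sys.executable, "-m", "pytest"]
    results: Dict[str, int] = {}
    for path in sorted(Path(test_dir).glob(pattern)):
        # subprocess.run blocks until the child exits, so files run one at a time.
        proc = subprocess.run(cmd_prefix + [str(path)])
        results[str(path)] = proc.returncode
    return results


def failed_files(results: Dict[str, int]) -> List[str]:
    """Return files whose run failed.

    pytest exit code 5 means no tests were collected, so it is not counted
    as a failure here.
    """
    return [p for p, code in results.items() if code not in (0, 5)]
```

For example, `failed_files(run_tests_in_subprocesses())` lists the test files whose pytest run failed. Extra pytest flags can be appended to `cmd_prefix` after `"pytest"`. Running files sequentially trades some wall-clock time for a bounded peak GPU footprint per process.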

codecov[bot] commented 3 years ago

Codecov Report

Merging #1409 into master will increase coverage by 0.25%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1409      +/-   ##
==========================================
+ Coverage   85.12%   85.38%   +0.25%     
==========================================
  Files          53       53              
  Lines        6946     6946              
==========================================
+ Hits         5913     5931      +18     
+ Misses       1033     1015      -18     
Impacted Files Coverage Δ
src/gluonnlp/attention_cell.py 80.31% <0.00%> (-0.40%) :arrow_down:
src/gluonnlp/data/filtering.py 82.60% <0.00%> (+4.34%) :arrow_up:
src/gluonnlp/data/loading.py 83.39% <0.00%> (+5.28%) :arrow_up:

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7c1927f...ad5d1c9.
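The headline figures in the Coverage Diff follow from the Hits and Lines rows, because coverage = Hits / Lines. The sketch below recomputes them. It assumes the percentages are rounded down to two decimals. That assumption matches every number in the table, whereas rounding to nearest would give 85.13% and 85.39%.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def pct_down(numerator: int, denominator: int) -> Decimal:
    """numerator / denominator as a percentage, rounded down to two decimals."""
    return (Decimal(numerator) * 100 / Decimal(denominator)).quantize(CENT, rounding=ROUND_DOWN)


LINES = 6946                      # unchanged between master and #1409
MASTER_HITS, PR_HITS = 5913, 5931

master = pct_down(MASTER_HITS, LINES)            # coverage on master
pr = pct_down(PR_HITS, LINES)                    # coverage with #1409
delta = pct_down(PR_HITS - MASTER_HITS, LINES)   # change from the 18 extra hits
misses = (LINES - MASTER_HITS, LINES - PR_HITS)  # Misses = Lines - Hits
print(master, pr, delta, misses)  # 85.12 85.38 0.25 (1033, 1015)
```

Subtracting the two rounded percentages would give 0.26, so the reported +0.25% appears to be computed from the unrounded values.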

leezu commented 3 years ago

CI passed with the "old" MXNet commit. @barry-jin helped update the GitHub container, so I'll now retrigger this PR to see whether it still "fixes" the CI issue with the latest MXNet version.

leezu commented 3 years ago

It fails with the updated MXNet commit because the GPU runs out of memory:

[2020-10-29T00:29:33.285Z] tests/test_models.py ............................ssssssssssssssss        [ 57%]
[2020-10-29T00:31:03.325Z] tests/test_models_albert.py .................                            [ 58%]
[2020-10-29T00:32:20.559Z] tests/test_models_bart.py ......                                         [ 59%]
[2020-10-29T00:39:07.063Z] tests/test_models_bert.py ............                                   [ 60%]
[2020-10-29T00:41:36.402Z] tests/test_models_electra.py ........                                    [ 60%]
[2020-10-29T00:46:45.319Z] tests/test_models_gpt2.py .......F                                       [ 61%]
[2020-10-29T00:46:57.180Z] tests/test_models_mobilebert.py .....                                    [ 61%]
[2020-10-29T00:49:06.186Z] tests/test_models_roberta.py ....FF                                      [ 62%]
[2020-10-29T00:49:36.862Z] tests/test_models_transformer.py ....................................... [ 65%]
[2020-10-29T00:50:37.228Z] ........................................................................ [ 71%]
[2020-10-29T00:51:18.688Z] ..........................................FFFFF                          [ 74%]
[2020-10-29T00:51:23.502Z] tests/test_models_transformer_xl.py ......                               [ 75%]
[2020-10-29T00:53:05.532Z] tests/test_models_xlmr.py .FF                                            [ 75%]
[2020-10-29T00:53:05.752Z] tests/test_op.py ....................................................... [ 79%]
[2020-10-29T00:53:06.194Z] ........................................................................ [ 85%]
[2020-10-29T00:53:06.747Z] ....                                                                     [ 85%]
[2020-10-29T00:53:38.587Z] tests/test_optimizer.py .                                                [ 85%]
[2020-10-29T00:53:38.592Z] tests/test_pytest.py .                                                   [ 85%]
[2020-10-29T00:53:38.741Z] tests/test_sequence_sampler.py ......................................... [ 89%]
[2020-10-29T00:53:39.027Z] ........................................................................ [ 94%]
[2020-10-29T00:53:45.583Z] .......................................                                  [ 97%]
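In the log above, the failures (`F`) cluster in a few model test files. A small parser can tally them per file from pytest's progress output. Continuation lines carry no file name, so the parser attributes them to the preceding file. This is a sketch. `count_failures` is a hypothetical helper, and its regular expression assumes the timestamped line format shown in this log.

```python
import re
from collections import Counter
from typing import Optional

# Matches "[timestamp] [path.py] <outcome chars> [ NN%]". The path is absent on
# continuation lines that wrap a long file's results onto the next line.
PROGRESS_RE = re.compile(r"^\[[^\]]+\]\s+(?:(\S+\.py)\s+)?([.FEsxX]+)\s+\[\s*\d+%\]$")


def count_failures(log: str) -> Counter:
    """Hypothetical helper: count 'F' outcomes per test file in pytest progress output."""
    failures: Counter = Counter()
    current: Optional[str] = None
    for line in log.splitlines():
        match = PROGRESS_RE.match(line.strip())
        if match is None:
            continue  # not a progress line
        path, outcomes = match.groups()
        if path is not None:
            current = path  # a new test file starts on this line
        n_failed = outcomes.count("F")
        if n_failed and current is not None:
            failures[current] += n_failed
    return failures
```

Applied to the lines above, it reports failures only in `test_models_gpt2.py`, `test_models_roberta.py`, `test_models_transformer.py` and `test_models_xlmr.py`.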
github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1409/leezu-patch-1/index.html

leezu commented 3 years ago

Thanks to @barry-jin for pointing out that the status displayed on this PR does not take the changes in the yml files into account.

See https://github.com/dmlc/gluon-nlp/actions/runs/334873655/workflow for the run with passing tests.

@szha Let's merge this PR to unblock CI.