GPT2 tests mysteriously killed

barry-jin commented 3 years ago

Description

The GPT-2 tests in tests/test_models.py are mysteriously killed. This was found in the recent nightly builds (cu102-2.0.0b20210502 and cu102-2.0.0b20210504). However, the error persisted when I tried MXNet nightly builds prior to cu102-2.0.0b20210502, so I suspect that changes in other upstream packages caused this issue. This needs more investigation.
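One way to narrow down which upstream package changed is to diff `pip freeze` output from a working and a failing environment. The following is a minimal sketch that assumes each environment's output has been saved as text. `parse_freeze` and `diff_freeze` are hypothetical helpers, not part of gluon-nlp.

```python
import re
from typing import Dict, Optional, Tuple

# Matches "name==version" lines as printed by `pip freeze`; other line forms
# (editable installs, "name @ url") are skipped in this sketch.
_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==(\S+)$")


def parse_freeze(text: str) -> Dict[str, str]:
    """Map normalized package name -> pinned version."""
    pins = {}
    for line in text.splitlines():
        m = _PIN.match(line.strip())
        if m:
            # Normalize names so "Foo_Bar" and "foo-bar" compare equal.
            name = re.sub(r"[-_.]+", "-", m.group(1)).lower()
            pins[name] = m.group(2)
    return pins


def diff_freeze(good: str, bad: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (good_version, bad_version)} for every package that differs.

    A package present on only one side has None on the other side.
    """
    g, b = parse_freeze(good), parse_freeze(bad)
    return {
        name: (g.get(name), b.get(name))
        for name in sorted(set(g) | set(b))
        if g.get(name) != b.get(name)
    }
```

To use it, run `pip freeze` in the environment where the tests pass and in the one where they are killed, then pass the two outputs to `diff_freeze`. Each entry that differs is a candidate for the upstream change.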

Error Message

root@ce877b5a6d9d:/workspace/gluon-nlp# python3 -m pytest --durations=50 --device='cpu' --verbose --runslow tests/test_models.py
Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1945812666 to reproduce.
==================================================================== test session starts =====================================================================
platform linux -- Python 3.6.9, pytest-6.2.3, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/gluon-nlp, configfile: pytest.ini
plugins: cov-2.11.1, mock-3.6.0, flaky-3.7.0, env-0.6.2
collected 55 items                                                                                                                                           

tests/test_models.py::test_list_backbone_names PASSED                                                                                                  [  1%]
tests/test_models.py::test_get_backbone[ctx0-google_albert_base_v2] PASSED                                                                             [  3%]
tests/test_models.py::test_get_backbone[ctx0-google_albert_large_v2] PASSED                                                                            [  5%]
tests/test_models.py::test_get_backbone[ctx0-google_albert_xlarge_v2] PASSED                                                                           [  7%]
tests/test_models.py::test_get_backbone[ctx0-google_albert_xxlarge_v2] PASSED                                                                          [  9%]
tests/test_models.py::test_get_backbone[ctx0-gluon_en_cased_bert_base_v1] PASSED                                                                       [ 10%]
tests/test_models.py::test_get_backbone[ctx0-google_en_cased_bert_base] PASSED                                                                         [ 12%]
tests/test_models.py::test_get_backbone[ctx0-google_en_cased_bert_large] PASSED                                                                        [ 14%]
tests/test_models.py::test_get_backbone[ctx0-google_en_cased_bert_wwm_large] PASSED                                                                    [ 16%]
tests/test_models.py::test_get_backbone[ctx0-google_en_uncased_bert_base] PASSED                                                                       [ 18%]
tests/test_models.py::test_get_backbone[ctx0-google_en_uncased_bert_large] PASSED                                                                      [ 20%]
tests/test_models.py::test_get_backbone[ctx0-google_en_uncased_bert_wwm_large] PASSED                                                                  [ 21%]
tests/test_models.py::test_get_backbone[ctx0-google_multi_cased_bert_base] PASSED                                                                      [ 23%]
tests/test_models.py::test_get_backbone[ctx0-google_zh_bert_base] PASSED                                                                               [ 25%]
tests/test_models.py::test_get_backbone[ctx0-gluon_electra_small_owt] PASSED                                                                           [ 27%]
tests/test_models.py::test_get_backbone[ctx0-google_electra_base] PASSED                                                                               [ 29%]
tests/test_models.py::test_get_backbone[ctx0-google_electra_large] PASSED                                                                              [ 30%]
tests/test_models.py::test_get_backbone[ctx0-google_electra_small] PASSED                                                                              [ 32%]
tests/test_models.py::test_get_backbone[ctx0-gpt2_124M] SKIPPED (Skipping GPT-2 test)                                                                  [ 34%]
tests/test_models.py::test_get_backbone[ctx0-gpt2_1558M] SKIPPED (Skipping GPT-2 test)                                                                 [ 36%]
tests/test_models.py::test_get_backbone[ctx0-gpt2_355M] Killed

To Reproduce


Steps to reproduce

1. Run the model tests on CPU: python3 -m pytest --durations=50 --device='cpu' --verbose --runslow tests/test_models.py

What have you tried to solve it?

1. Tried MXNet nightly builds prior to cu102-2.0.0b20210502; the error was still there.

Environment

We recommend using our script to collect diagnostic information. Run the following command and paste the output below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python


sxjscience commented 3 years ago

Would you try the alpha version?

barry-jin commented 3 years ago


The alpha version appears to work well. The issue may still come from upstream (MXNet) changes; I will try to fix it.

sxjscience commented 3 years ago

This might be difficult to fix. You could check whether running the GPT-2 test alone still causes the error.
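Following the suggestion to run GPT-2 alone, here is a minimal sketch that runs only the `gpt2_355M` case in a separate process and reports whether it ended with SIGKILL. When a process receives SIGKILL, the shell prints "Killed". Whether the Linux OOM killer sent that signal here is only an assumption. The sketch helps check it by also reporting the peak memory of child processes. `run_isolated` and `gpt2_alone_cmd` are hypothetical helpers. The pytest options and test ID are copied from the command and log above.

```python
import resource
import signal
import subprocess
import sys
from typing import List, Tuple


def gpt2_alone_cmd() -> List[str]:
    """Hypothetical helper: pytest command that selects only the GPT-2 355M case."""
    return [
        sys.executable, "-m", "pytest", "--verbose", "--device=cpu", "--runslow",
        "tests/test_models.py::test_get_backbone[ctx0-gpt2_355M]",
    ]


def run_isolated(cmd: List[str]) -> Tuple[int, bool, int]:
    """Run `cmd` in a child process and report how it ended.

    Returns (returncode, killed_by_sigkill, peak_child_rss).
    On POSIX, subprocess reports death-by-signal N as return code -N,
    so SIGKILL (shown by the shell as "Killed") appears as -9.
    """
    proc = subprocess.run(cmd)
    killed = proc.returncode == -signal.SIGKILL
    # For RUSAGE_CHILDREN, ru_maxrss is the largest resident set size of any
    # child waited for so far (kilobytes on Linux, bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, killed, peak
```

Calling `run_isolated(gpt2_alone_cmd())` from the repository root runs the single case. If it still reports `killed_by_sigkill=True` together with a very large peak RSS, memory exhaustion is a likely suspect. If it passes when run alone, the problem may depend on state left behind by the earlier tests.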
