dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[Tests] It takes long time to test full suite of backbone models #1445

Closed barry-jin closed 3 years ago

barry-jin commented 3 years ago

Description

Running pytest on test_models takes about 6 hours, which blocks GluonNLP CI. The problem occurs with MXNet nightly builds 20201120 and later.

Error Message

[Screenshot: Screen Shot 2020-11-23 at 1 47 45 PM]

sxjscience commented 3 years ago

What's added in 20201120?

barry-jin commented 3 years ago

> What's added in 20201120?

On current GluonNLP master, testing the backbone models takes 17 min with MXNet nightly builds before 20201120, but 6 hours with nightly builds from 20201120 onward. I found only one commit on Nov 20: https://github.com/apache/incubator-mxnet/commit/13e9d572b3059ebe0d1d4f6d452db4f971375588

leezu commented 3 years ago

> nightly build before 20201120

What day is the build?

barry-jin commented 3 years ago

> > nightly build before 20201120
>
> What day is the build?

I tried both mxnet-cu102==2.0.0b20201118 and mxnet-cu102==2.0.0b20201119; with either build, pytest on test_models takes only 17 min.
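
As a minimal sketch, the timings reported in this thread could be tabulated to pin down the first slow nightly. The helper names `nightly_pin` and `first_slow_build` and the 60-minute threshold are hypothetical; the durations are the ones quoted above (17 min for 20201118 and 20201119, about 6 hours for 20201120).

```python
from datetime import date
from typing import Dict, Optional


def nightly_pin(day: date, package: str = "mxnet-cu102", base: str = "2.0.0b") -> str:
    """Format a pip requirement for the nightly build of a given day."""
    return f"{package}=={base}{day:%Y%m%d}"


def first_slow_build(durations_min: Dict[date, float], threshold_min: float) -> Optional[date]:
    """Return the earliest day whose test duration exceeds the threshold, if any."""
    for day in sorted(durations_min):
        if durations_min[day] > threshold_min:
            return day
    return None


# Durations in minutes, as reported in this thread.
measured = {
    date(2020, 11, 18): 17,
    date(2020, 11, 19): 17,
    date(2020, 11, 20): 6 * 60,
}

# The 60-minute threshold is an assumption chosen only for illustration.
regressed = first_slow_build(measured, threshold_min=60)
if regressed is not None:
    print("first slow nightly:", nightly_pin(regressed))
```

With more nightlies in the table, the same lookup narrows the range of MXNet commits worth inspecting.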

To Reproduce

$ cd gluon-nlp
$ python3 ./tools/batch/submit-job.py --region us-east-1 \
                                      --job-type g4dn.4x \
                                      --source-ref npx.savez \
                                      --work-dir . \
                                      --remote https://github.com/leezu/gluon-nlp \
                                      --command "pip3 uninstall -y mxnet-cu102 && python3 -m pip install -U --quiet --pre mxnet-cu102==2.0.0b20201119 -f https://dist.mxnet.io/python && python3 -m pip install pytest-forked && python3 -m pytest --forked --durations=50 --device=gpu --verbose --runslow ./tests/test_models.py" \
                                      --wait
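
To rerun the same job against a different nightly without hand-editing nested quotes, the command string could be assembled in Python. This is only a sketch: `build_test_command` is a hypothetical helper, the flags are copied from the invocation above, and `shlex.join` requires Python 3.8 or later.

```python
import shlex


def build_test_command(nightly: str, device: str = "gpu") -> str:
    """Build the remote command that installs one MXNet nightly and runs test_models."""
    steps = [
        ["pip3", "uninstall", "-y", "mxnet-cu102"],
        ["python3", "-m", "pip", "install", "-U", "--quiet", "--pre",
         f"mxnet-cu102==2.0.0b{nightly}", "-f", "https://dist.mxnet.io/python"],
        ["python3", "-m", "pip", "install", "pytest-forked"],
        ["python3", "-m", "pytest", "--forked", "--durations=50",
         f"--device={device}", "--verbose", "--runslow", "./tests/test_models.py"],
    ]
    # Quote each step separately, then chain the steps with "&&".
    return " && ".join(shlex.join(step) for step in steps)


# Arguments for submit-job.py, taken from the invocation above.
argv = [
    "python3", "./tools/batch/submit-job.py",
    "--region", "us-east-1",
    "--job-type", "g4dn.4x",
    "--source-ref", "npx.savez",
    "--work-dir", ".",
    "--remote", "https://github.com/leezu/gluon-nlp",
    "--command", build_test_command("20201120"),
    "--wait",
]

# shlex.join quotes the whole --command value as a single shell word.
full_command = shlex.join(argv)
print(full_command)
```

Passing `argv` directly to `subprocess.run` would avoid the shell entirely; the joined string is shown only because the original report uses a shell invocation.
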

leezu commented 3 years ago

Will be fixed by https://github.com/apache/incubator-mxnet/pull/19584