dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

v0.x CI is broken #1529

Open leezu opened 3 years ago

leezu commented 3 years ago

The v0.x CI passes or fails seemingly at random:

fail https://ci.gluon.ai/blue/rest/organizations/jenkins/pipelines/GluonNLP-py3-gpu-integration/branches/PR-1521/runs/11/log/?start=0&download=true/*view*

pass https://ci.gluon.ai/blue/rest/organizations/jenkins/pipelines/GluonNLP-py3-gpu-integration/branches/PR-1521/runs/10/log/?start=0&download=true/*view*

Failure log:

[2021-02-23T23:41:05.382Z]   -- Detecting C compile features - done
[2021-02-23T23:41:05.382Z]   CMake Error at /var/lib/jenkins/gluon-nlp-gpu-py3/conda/gpu/py3/lib/python3.6/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
[2021-02-23T23:41:05.382Z]     Could NOT find Mxnet (missing: Mxnet_LIBRARIES) (Required is at least
[2021-02-23T23:41:05.382Z]     version "1.4.0")
[2021-02-23T23:41:05.382Z]   Call Stack (most recent call first):
[2021-02-23T23:41:05.382Z]     /var/lib/jenkins/gluon-nlp-gpu-py3/conda/gpu/py3/lib/python3.6/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:458 (_FPHSA_FAILURE_MESSAGE)
[2021-02-23T23:41:05.382Z]     cmake/Modules/FindMxnet.cmake:54 (find_package_handle_standard_args)
[2021-02-23T23:41:05.382Z]     horovod/mxnet/CMakeLists.txt:12 (find_package)

So far all failing runs were on

ip-172-31-43-211, ip-172-31-19-212

whereas the successful run was on

ip-172-31-22-205

This may be due to a mismatch in instance configuration between the Jenkins nodes.
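To check whether failures really track specific nodes, the outcomes can be tallied per host. The following is only a rough sketch: the hostnames are the ones listed above, but the `RUNS` list is hand-entered with one record per host as an illustrative assumption, and the helper names `outcomes_by_node` and `always_failing_nodes` are hypothetical, not part of Jenkins or GluonNLP.

```python
from collections import defaultdict

# Hand-entered observations: (node hostname, outcome).
# Hostnames come from the report above; one record per host is an
# illustrative assumption, not the actual run history.
RUNS = [
    ("ip-172-31-43-211", "fail"),
    ("ip-172-31-19-212", "fail"),
    ("ip-172-31-22-205", "pass"),
]


def outcomes_by_node(runs):
    """Count pass/fail outcomes for each node."""
    counts = defaultdict(lambda: {"pass": 0, "fail": 0})
    for node, outcome in runs:
        counts[node][outcome] += 1
    return dict(counts)


def always_failing_nodes(runs):
    """Return, sorted, the nodes with at least one failure and no passes."""
    return sorted(
        node
        for node, c in outcomes_by_node(runs).items()
        if c["fail"] > 0 and c["pass"] == 0
    )


if __name__ == "__main__":
    for node, c in sorted(outcomes_by_node(RUNS).items()):
        print(f"{node}: {c['pass']} passed, {c['fail']} failed")
    print("suspect nodes:", always_failing_nodes(RUNS))
```

If a node shows up as always failing once more runs are added, that would support the instance-configuration theory; a node with mixed results would point elsewhere.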

leezu commented 3 years ago

@barry-jin would you have time to backport the GitHub Actions CI implementation to the v0.x branch? Then we can get rid of all the trouble with Jenkins.

barry-jin commented 3 years ago

@leezu Sure.