Open aaronmarkham opened 4 years ago
Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended label(s): Test, CI
The unix-gpu kvstore timeout is fixed now; next I'll look into the other timeouts.
Timeout on GPU: CMake TVM_OP OFF http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16598/2/pipeline/54
CI really needs some attention. I had three PRs yesterday, and all of them failed on one test or another due to timeouts. Tests need to be broken up or streamlined. It shouldn't take 4 hours for tests to run and then time out.
I'm flagging the 1.5 GB ImageNet model and its related tests. I think these should be moved to the nightly run. https://github.com/apache/incubator-mxnet/blob/master/cpp-package/tests/ci_test.sh#L69-L70
GPU: CUDA10.1+cuDNN7 - 3-hour timeout http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16500/15/pipeline/46
dist-kvstore tests GPU - 3-hour timeout http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16512/1/pipeline
Python2: CPU - 4-hour timeout http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16514/1/pipeline/260