Open ChaiBapchya opened 4 years ago
@mxnet-label-bot add [ci, windows]
Created an upstream issue: https://github.com/thrust/thrust/issues/1090
@vexilligera did you test whether the error also occurs on more recent versions of thrust? I suggest we try installing thrust 1.9.8 on Windows CI, which is the version that will ship with CUDA 11.
We do that on Ubuntu CI already
There is another suggested fix at https://github.com/pytorch/pytorch/issues/25393#issuecomment-619547577
cc @vexilligera
Seems to be an nvcc bug: https://github.com/thrust/thrust/issues/1090#issuecomment-626080333
This is indeed an nvcc bug. There is no known workaround at the moment, but the next release of the CUDA toolkit will contain a fix.
Ref thrust/thrust#1090.
Description
Intermittent failure seen in the compilation phase of the windows-gpu pipeline (WIN_GPU/WIN_GPU_MKLDNN).
Discovered in this PR: https://github.com/apache/incubator-mxnet/pull/17808
Related to https://github.com/pytorch/pytorch/issues/25393
Error Message
The build intermittently fails with the following error:
Errors:
Entire stack trace: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/windows-gpu/branches/PR-17808/runs/15/nodes/39/log/?start=0
To Reproduce
On the Windows AMI, clone the repo and run:
py -3 ci/build_windows.py -f WIN_GPU
What have you tried to solve it?
The only thing found to work so far is retrying the build: introduced max retries = 5.
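For reference, the retry workaround could look roughly like the sketch below. This is not the actual CI change; `run_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical names, and it assumes the limit of 5 counts total attempts and that a failed compile shows up as a non-zero exit code.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=5, delay_seconds=0.0):
    """Run cmd until it exits with code 0 or max_attempts is reached.

    Hypothetical helper, not part of the MXNet CI scripts.
    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt}/{max_attempts} failed "
            f"with exit code {result.returncode}",
            file=sys.stderr,
        )
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")
```

With the build command from the reproduction steps, the call would be `run_with_retries(["py", "-3", "ci/build_windows.py", "-f", "WIN_GPU"])`. Retrying only hides the intermittent failure; it does not address the underlying nvcc bug.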