Open perdasilva opened 5 years ago
Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Test, CI
@mxnet-label-bot add [test] @apeforest
fixed in latest run, we can close this now: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/320/pipeline
Actually, we can't close it yet: this test was fixed but started failing again after https://github.com/apache/incubator-mxnet/pull/15059. There is a similar OOM issue in https://github.com/apache/incubator-mxnet/issues/14980
Currently, both the CPU and GPU tests have been disabled due to the same memory issue. After a discussion with @access2rohit and @apeforest, we can try a few things:
We are having problems testing the above solutions on CI machines that have multiple jobs running in parallel.
The test still failed with 200G shared memory on P3.2x, so we need another approach for testing large tensors.
Description
Test Large Tensor: GPU step is failing with:
see http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/312/pipeline/144 for the full log