apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Flaky test: test_preloaded_multi_sgd #16345

Open szha opened 5 years ago

szha commented 5 years ago

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16343/1/pipeline#step-476-log-1053

Likely caused by #16122 in which the test was added.

cc @Caenorst @apeforest

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended label(s): Test, Flaky

Caenorst commented 5 years ago

I don't see any absurd difference:

Error 1.013672 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(0, 2, 0, 3), a=-0.203125, b=-0.204346

 a: array([[[[-0.624   , -0.4946  , -0.3997  ,  0.06415 ,  0.2402  ,
          -0.00757 ],
         [-0.03027 , -0.1873  , -0.284   , -0.2961  ,  0.5986  ,...

 b: array([[[[-0.6245   , -0.495    , -0.4001   ,  0.0641   ,  0.2397   ,
          -0.00769  ],
         [-0.03076  , -0.1875   , -0.2844   , -0.2966   ,  0.598    ,...

So I suggest bumping rtol to 5e-3 or 1e-2.
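For reference, the reported error can be roughly reproduced with a small NumPy sketch. It assumes the check has the usual form |a - b| <= atol + rtol * |b| and that the "Error" figure is the ratio of the difference to that allowed tolerance; `max_violation` is a hypothetical helper name, not MXNet's API. The inputs are the rounded `a` and `b` printed in the log, so the result only approximately matches 1.013672.

```python
import numpy as np

def max_violation(a, b, rtol, atol):
    """Largest ratio of |a - b| to the allowed tolerance atol + rtol * |b|.

    Assumed form of the check; a ratio above 1.0 means the comparison fails.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a - b) / (atol + rtol * np.abs(b))))

# Values at the location of maximum error, as printed in the log above.
a, b = -0.203125, -0.204346

current = max_violation(a, b, rtol=1e-3, atol=1e-3)  # tolerances from the log
bumped = max_violation(a, b, rtol=5e-3, atol=1e-3)   # proposed rtol bump

print(f"current tolerances: {current:.4f}")
print(f"rtol=5e-3:          {bumped:.4f}")
```

Under these assumptions the element fails by only about 1% with the current tolerances and passes comfortably with rtol=5e-3.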

aaronmarkham commented 5 years ago

Failed here too: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-16344/1/pipeline

ChaiBapchya commented 5 years ago

Failed here too: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16336/5/pipeline PR #16336

ChaiBapchya commented 5 years ago

@Caenorst can you try bumping it up and then running that particular test, say, 10k times on unix-gpu using this command:

MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_preloaded_multi_sgd

Thanks.

Caenorst commented 5 years ago

It turns out that values very close to 0 are the most inaccurate, so I'm bumping atol instead of rtol.
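A minimal sketch of why atol is the more effective knob here, assuming the comparison behaves like `numpy.isclose` (|a - b| <= atol + rtol * |b|). The two values are hypothetical near-zero outputs chosen for illustration, not taken from the CI log.

```python
import numpy as np

# Hypothetical pair of near-zero results (illustrative only).
a = np.array([0.0012])
b = np.array([0.0001])

# Near zero, rtol * |b| is tiny, so the allowed error is essentially atol.
baseline = bool(np.isclose(a, b, rtol=1e-3, atol=1e-3).all())
rtol_bumped = bool(np.isclose(a, b, rtol=1e-2, atol=1e-3).all())
atol_bumped = bool(np.isclose(a, b, rtol=1e-3, atol=2e-3).all())

print("baseline passes:   ", baseline)
print("rtol bumped passes:", rtol_bumped)
print("atol bumped passes:", atol_bumped)
```

With these numbers, a 10x increase in rtol adds less than 1e-6 to the allowed error, while doubling atol adds 1e-3, which is why only the atol change lets the near-zero element pass.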