Open szha opened 5 years ago
Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended label(s): Test, Flaky
I don't see any unreasonably large difference:
Error 1.013672 exceeds tolerance rtol=0.001000, atol=0.001000. Location of maximum error:(0, 2, 0, 3), a=-0.203125, b=-0.204346
a: array([[[[-0.624 , -0.4946 , -0.3997 , 0.06415 , 0.2402 ,
-0.00757 ],
[-0.03027 , -0.1873 , -0.284 , -0.2961 , 0.5986 ,...
b: array([[[[-0.6245 , -0.495 , -0.4001 , 0.0641 , 0.2397 ,
-0.00769 ],
[-0.03076 , -0.1875 , -0.2844 , -0.2966 , 0.598 ,...
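To see why this counts as a failure, the reported "Error 1.013672" appears to be the NumPy-style combined-tolerance ratio |a - b| / (atol + rtol * |b|): an element fails when that ratio exceeds 1. A minimal sketch using the values from the log above (the exact formula used by assert_almost_equal is an assumption here):

```python
# Reproduce the reported error metric from the worst element in the log.
# Assumption: the test harness uses the NumPy-style criterion
#   |a - b| <= atol + rtol * |b|
# and reports the ratio of the two sides as "Error".
a, b = -0.203125, -0.204346
rtol, atol = 1e-3, 1e-3

err = abs(a - b) / (atol + rtol * abs(b))
print(err)  # slightly above 1.0, matching the ~1.01 in the log
```

So the two values differ by about 0.0012 while the allowed tolerance is about 0.0012 as well; the test fails by a hair, which is typical of a flaky numerical test rather than a real bug.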
so I'm suggesting we bump rtol to 5e-3 or 1e-2.
@Caenorst can you try bumping it up and then running that particular test say 10k times for unix-gpu using this command:
MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_preloaded_multi_sgd
Thanks.
It turned out that values very close to 0 are the most inaccurate, so I'm bumping atol instead of rtol.
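This makes sense because in the criterion |a - b| <= atol + rtol * |b|, the rtol term scales with |b| and so contributes almost nothing when b is near zero; atol alone decides pass/fail there. A small illustration with made-up values (not taken from the failing test):

```python
import numpy as np

# Near zero, rtol * |b| is tiny, so atol dominates the allowed tolerance.
# Hypothetical values, chosen only to illustrate the effect:
a, b = 2e-3, -2e-3   # small values of opposite sign, |a - b| = 4e-3

# Even a generous rtol does not help: rtol * |b| = 1e-2 * 2e-3 = 2e-5.
fails = np.allclose(a, b, rtol=1e-2, atol=1e-3)   # tolerance ~1.02e-3 < 4e-3
passes = np.allclose(a, b, rtol=1e-2, atol=1e-2)  # tolerance ~1.002e-2 > 4e-3
print(fails, passes)  # False True
```

With bumped atol the same near-zero pair is accepted, while values of larger magnitude are still held to the relative tolerance.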
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16343/1/pipeline#step-476-log-1053
Likely caused by #16122 in which the test was added.
cc @Caenorst @apeforest