apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Flaky test_operator.test_reduce #9845

Open marcoabreu opened 6 years ago

marcoabreu commented 6 years ago

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9841/1/pipeline (test unrelated to change)

======================================================================
FAIL: test_operator.test_reduce
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py2\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 155, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_operator.py", line 1704, in test_reduce
    mx.symbol.max)
  File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_operator.py", line 1679, in test_reduce_inner
    assert equal_backward
AssertionError: 
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=297297780 to reproduce.
--------------------- >> end captured logging << ---------------------
anirudhacharya commented 6 years ago

Got the same test failure. Unrelated to the changes in my PR.

Unit test was running on Python2 MKL-DNN CPU

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/PR-9963/runs/56/nodes/535/log/?start=0

======================================================================
FAIL: test_operator.test_reduce
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_operator.py", line 1735, in test_reduce
    mx.symbol.max)
  File "/work/mxnet/tests/python/unittest/test_operator.py", line 1710, in test_reduce_inner
    assert equal_backward
AssertionError:

marcoabreu commented 6 years ago

Still recent: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/592/pipeline

marcoabreu commented 6 years ago
======================================================================
FAIL: test_operator.test_reduce
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ubuntu/experimentals/spidyDev_mxnet/tests/python/unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/experimentals/spidyDev_mxnet/tests/python/unittest/test_operator.py", line 1826, in test_reduce
    mx.symbol.max, test_none_axis=test_none)
  File "/home/ubuntu/experimentals/spidyDev_mxnet/tests/python/unittest/test_operator.py", line 1799, in test_reduce_inner
    assert equal_backward
AssertionError:
-------------------- >> begin captured logging << --------------------
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=754316987 to reproduce.
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1117986172 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 22.731s

FAILED (failures=1)

from https://github.com/apache/incubator-mxnet/issues/10476

ThomasDelteil commented 6 years ago

Happened to me too: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10628/4/pipeline/656

zheng-da commented 6 years ago

Here as well. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10706/2/pipeline/677/

This bug can be reproduced with specific random seeds.
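As a minimal sketch of what reproducing with a specific seed could look like: the captured logs above say to set `MXNET_TEST_SEED`, and the tracebacks show the test being run through nose. The helper name `build_repro` and the default test selector are hypothetical, chosen only for illustration; the selector assumes the command is run from the repository root.

```python
import os


def build_repro(seed, test="tests/python/unittest/test_operator.py:test_reduce"):
    """Return (env, cmd) for re-running one test under a fixed seed.

    build_repro is a hypothetical helper, not part of MXNet. The
    MXNET_TEST_SEED variable name is taken from the captured logging
    in this thread; the test path mirrors the tracebacks.
    """
    # Copy the current environment and pin the seed reported by the failing run.
    env = dict(os.environ, MXNET_TEST_SEED=str(seed))
    # nose accepts a "file.py:function" selector to run a single test.
    cmd = ["nosetests", "--verbose", test]
    return env, cmd


# Seed from the first failure reported in this thread.
env, cmd = build_repro(297297780)
```

The returned pair can be handed to `subprocess.run(cmd, env=env)` on a machine where MXNet and nose are installed. Nothing is executed here, so the sketch runs without either.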

nswamy commented 6 years ago

Ran the test a few thousand times on Ubuntu-CPU and it does not seem to fail. The attached PR also seems to have fixed the issue; I verified with @marcoabreu that he has not seen this failure recently. Closing this issue as fixed.

haojin2 commented 4 years ago

Occurred again: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16786/4/pipeline/294/

======================================================================
FAIL: test_ndarray.test_reduce
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_ndarray.py", line 647, in test_reduce
    mx.nd.sum, True, allow_almost_equal=True)
  File "/work/mxnet/tests/python/unittest/test_ndarray.py", line 643, in test_reduce_inner
    assert_array_almost_equal(ndarray_ret, numpy_ret, decimal=decimal)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/_private/utils.py", line 1017, in assert_array_almost_equal
    precision=decimal)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/_private/utils.py", line 829, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 5 decimals
Mismatch: 100%
Max absolute difference: 1.5258789e-05
Max relative difference: 2.557863e-07
 x: array([[[[[-59.65446]]]]], dtype=float32)
 y: array([[[[[-59.65444]]]]], dtype=float32)
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=340881315 to reproduce.
--------------------- >> end captured logging << ---------------------
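To put the numbers in this log in context, here is a rough sketch of how the reported difference compares with float32 spacing and with the `decimal=5` check. It assumes the documented criterion of `assert_array_almost_equal`, `abs(desired - actual) < 1.5 * 10**(-decimal)`, and uses only the values printed above. The helper `float32_ulp` is hypothetical and written for this illustration. This is one reading of the log, not a confirmed diagnosis.

```python
import math


def float32_ulp(x):
    """Spacing between adjacent float32 values near x (normal range only).

    Hypothetical helper for illustration; float32 has a 23-bit fraction,
    so for 2**e <= |x| < 2**(e + 1) the spacing is 2**(e - 23).
    """
    exponent = math.frexp(abs(x))[1] - 1
    return 2.0 ** (exponent - 23)


# Values copied from the failure log above.
max_abs_diff = 1.5258789e-05
value = -59.65446
decimal = 5

ulp = float32_ulp(value)
diff_in_ulps = max_abs_diff / ulp

# Criterion as documented for numpy.testing.assert_array_almost_equal.
threshold = 1.5 * 10 ** (-decimal)
passes = max_abs_diff < threshold

print("float32 spacing near value:", ulp)
print("difference in float32 steps:", round(diff_in_ulps))
print("threshold:", threshold, "passes:", passes)
```

Under these assumptions the difference is about four float32 steps at this magnitude and sits just above the absolute threshold, which matches the small relative difference (2.557863e-07) in the log. That would be consistent with the MXNet and NumPy reductions accumulating in a different order, although the thread does not confirm the cause.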