apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

windows-cpu flaky tests #20389

Open josephevans opened 3 years ago

josephevans commented 3 years ago

I keep getting the same test failures on windows-cpu. I noticed them on the v1.9.x and v1.x branches, but haven't checked master.

Tests failing (only on windows-cpu):

They are all related to random number generation.

======================================================================
[2021-06-25T22:04:20.216Z] ERROR: test_numpy_op.test_np_rand
[2021-06-25T22:04:20.216Z] ----------------------------------------------------------------------
[2021-06-25T22:04:20.216Z] Traceback (most recent call last):
[2021-06-25T22:04:20.216Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:20.216Z]     self.test(*self.arg)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:20.483Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\util.py", line 297, in _with_np_shape
[2021-06-25T22:04:20.483Z]     return func(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\util.py", line 481, in _with_np_array
[2021-06-25T22:04:20.483Z]     return func(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_numpy_op.py", line 7300, in test_np_rand
[2021-06-25T22:04:20.483Z]     probs=probs, nsamples=samples, nrepeat=trials)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:20.483Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:20.483Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:20.483Z]     lambda_="pearson")
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:20.483Z]     raise ValueError(msg)
[2021-06-25T22:04:20.483Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:20.483Z] 0.00022405153185232603
[2021-06-25T22:04:20.483Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:20.483Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=375527706 to reproduce.
[2021-06-25T22:04:20.483Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:20.483Z] 
[2021-06-25T22:04:20.483Z] ======================================================================
[2021-06-25T22:04:20.483Z] ERROR: test_numpy_op.test_np_randint
[2021-06-25T22:04:20.483Z] ----------------------------------------------------------------------
[2021-06-25T22:04:20.483Z] Traceback (most recent call last):
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:20.483Z]     self.test(*self.arg)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:20.483Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\util.py", line 297, in _with_np_shape
[2021-06-25T22:04:20.483Z]     return func(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\util.py", line 481, in _with_np_array
[2021-06-25T22:04:20.483Z]     return func(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_numpy_op.py", line 3531, in test_np_randint
[2021-06-25T22:04:20.483Z]     verify_generator(generator=generator_mx, buckets=buckets, probs=probs, nrepeat=100)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:20.483Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:20.483Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:20.483Z]     lambda_="pearson")
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:20.483Z]     raise ValueError(msg)
[2021-06-25T22:04:20.483Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:20.483Z] 1.000001000001e-06
[2021-06-25T22:04:20.483Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:20.483Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=1166992666 to reproduce.
[2021-06-25T22:04:20.483Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:20.483Z] 
[2021-06-25T22:04:20.483Z] ======================================================================
[2021-06-25T22:04:20.483Z] ERROR: test_random.test_normal_generator
[2021-06-25T22:04:20.483Z] ----------------------------------------------------------------------
[2021-06-25T22:04:20.483Z] Traceback (most recent call last):
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:20.483Z]     self.test(*self.arg)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:20.483Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_random.py", line 621, in test_normal_generator
[2021-06-25T22:04:20.483Z]     nsamples=samples, nrepeat=trials)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:20.483Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:20.483Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:20.483Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:20.483Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:20.483Z]     lambda_="pearson")
[2021-06-25T22:04:20.750Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:20.750Z]     raise ValueError(msg)
[2021-06-25T22:04:20.751Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:20.751Z] 4.000016000064e-06
[2021-06-25T22:04:20.751Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:20.751Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=576113501 to reproduce.
[2021-06-25T22:04:20.751Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:20.751Z] 
[2021-06-25T22:04:20.751Z] ======================================================================
[2021-06-25T22:04:20.751Z] ERROR: test_random.test_uniform_generator
[2021-06-25T22:04:20.751Z] ----------------------------------------------------------------------
[2021-06-25T22:04:20.751Z] Traceback (most recent call last):
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:20.751Z]     self.test(*self.arg)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:20.751Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_random.py", line 640, in test_uniform_generator
[2021-06-25T22:04:20.751Z]     verify_generator(generator=generator_mx, buckets=buckets, probs=probs)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:20.751Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:20.751Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:20.751Z]     lambda_="pearson")
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:20.751Z]     raise ValueError(msg)
[2021-06-25T22:04:20.751Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:20.751Z] 0.0001320175583352586
[2021-06-25T22:04:20.751Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:20.751Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=2052855110 to reproduce.
[2021-06-25T22:04:20.751Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:20.751Z] 
[2021-06-25T22:04:20.751Z] ======================================================================
[2021-06-25T22:04:20.751Z] ERROR: test_random.test_poisson_generator
[2021-06-25T22:04:20.751Z] ----------------------------------------------------------------------
[2021-06-25T22:04:20.751Z] Traceback (most recent call last):
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:20.751Z]     self.test(*self.arg)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:20.751Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_random.py", line 684, in test_poisson_generator
[2021-06-25T22:04:20.751Z]     verify_generator(generator=generator_mx, buckets=buckets, probs=probs)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:20.751Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:20.751Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:20.751Z]     lambda_="pearson")
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:20.751Z]     raise ValueError(msg)
[2021-06-25T22:04:20.751Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:20.751Z] 1.000001000001e-06
[2021-06-25T22:04:20.751Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:20.751Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=563978758 to reproduce.
[2021-06-25T22:04:20.751Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:20.751Z] 
[2021-06-25T22:04:20.751Z] ======================================================================
[2021-06-25T22:04:20.751Z] ERROR: test_random.test_negative_binomial_generator
[2021-06-25T22:04:20.751Z] ----------------------------------------------------------------------
[2021-06-25T22:04:20.751Z] Traceback (most recent call last):
[2021-06-25T22:04:20.751Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:20.751Z]     self.test(*self.arg)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:20.751Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:20.751Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_random.py", line 702, in test_negative_binomial_generator
[2021-06-25T22:04:20.751Z]     verify_generator(generator=generator_mx, buckets=buckets, probs=probs)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:21.019Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:21.019Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:21.019Z]     lambda_="pearson")
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:21.019Z]     raise ValueError(msg)
[2021-06-25T22:04:21.019Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:21.019Z] 1.000001000001e-06
[2021-06-25T22:04:21.019Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:21.019Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=1930122576 to reproduce.
[2021-06-25T22:04:21.019Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:21.019Z] 
[2021-06-25T22:04:21.019Z] ======================================================================
[2021-06-25T22:04:21.019Z] ERROR: test_random.test_multinomial_generator
[2021-06-25T22:04:21.019Z] ----------------------------------------------------------------------
[2021-06-25T22:04:21.019Z] Traceback (most recent call last):
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:21.019Z]     self.test(*self.arg)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:21.019Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_random.py", line 754, in test_multinomial_generator
[2021-06-25T22:04:21.019Z]     nsamples=samples, nrepeat=trials, success_rate=0.20)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:21.019Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:21.019Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:21.019Z]     lambda_="pearson")
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:21.019Z]     raise ValueError(msg)
[2021-06-25T22:04:21.019Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:21.019Z] 3.000009000027e-06
[2021-06-25T22:04:21.019Z] -------------------- >> begin captured logging << --------------------
[2021-06-25T22:04:21.019Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=413498244 to reproduce.
[2021-06-25T22:04:21.019Z] --------------------- >> end captured logging << ---------------------
[2021-06-25T22:04:21.019Z] 
[2021-06-25T22:04:21.019Z] ======================================================================
[2021-06-25T22:04:21.019Z] ERROR: test_random.test_randint_generator
[2021-06-25T22:04:21.019Z] ----------------------------------------------------------------------
[2021-06-25T22:04:21.019Z] Traceback (most recent call last):
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
[2021-06-25T22:04:21.019Z]     self.test(*self.arg)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\common.py", line 218, in test_new
[2021-06-25T22:04:21.019Z]     orig_test(*args, **kwargs)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_random.py", line 1013, in test_randint_generator
[2021-06-25T22:04:21.019Z]     verify_generator(generator=generator_mx, buckets=buckets, probs=probs, nrepeat=100)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2364, in verify_generator
[2021-06-25T22:04:21.019Z]     probs=probs, nsamples=nsamples)
[2021-06-25T22:04:21.019Z]   File "C:\jenkins_slave\workspace\ut-python-cpu\windows_package\python\mxnet\test_utils.py", line 2325, in chi_square_check
[2021-06-25T22:04:21.019Z]     _, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq)
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6853, in chisquare
[2021-06-25T22:04:21.019Z]     lambda_="pearson")
[2021-06-25T22:04:21.019Z]   File "C:\Python37\lib\site-packages\scipy\stats\stats.py", line 6694, in power_divergence
[2021-06-25T22:04:21.019Z]     raise ValueError(msg)
[2021-06-25T22:04:21.019Z] ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
[2021-06-25T22:04:21.019Z] 1.000001000001e-06

josephevans commented 3 years ago

@TristonC mentioned seeing test failures with random number generation when updating to scipy 1.7.0. He reverted to 1.6.3 and the tests were passing.
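For anyone trying to understand the error itself: the message in the log says the observed and expected frequency totals must agree to a relative tolerance of 1e-08. Several of the reported differences (for example 1.000001000001e-06, which equals 1/999999) would be consistent with the two totals differing by a single count out of about a million, though that is an inference from the numbers and not something confirmed in this thread. Below is a rough stdlib-only sketch of that kind of check and of rescaling the expected counts so the totals match. The helper names `sums_agree` and `rescale_expected` and the sample counts are made up for illustration; this is not MXNet's or SciPy's code, and it is not the fix that was applied here.

```python
import math

def sums_agree(obs_freq, expected_freq, rtol=1e-8):
    """Return True if the two totals match within a relative tolerance.

    This only mimics the condition described in the error message;
    it is not SciPy's actual implementation.
    """
    return math.isclose(math.fsum(obs_freq), math.fsum(expected_freq), rel_tol=rtol)

def rescale_expected(obs_freq, expected_freq):
    """Scale the expected counts so that their total equals the observed total."""
    total_obs = math.fsum(obs_freq)
    total_exp = math.fsum(expected_freq)
    if total_exp == 0:
        raise ValueError("expected frequencies sum to zero")
    scale = total_obs / total_exp
    return [e * scale for e in expected_freq]

# Hypothetical counts: the observed total is 1,000,000, the expected total is 999,999.
obs = [250_100, 249_900, 250_050, 249_950]
exp = [250_000, 250_000, 250_000, 249_999]

print(sums_agree(obs, exp))                         # False: the totals differ by one count
print(sums_agree(obs, rescale_expected(obs, exp)))  # True: the totals now match
```

Whether rescaling is appropriate depends on why the totals differ in the first place, so treat it as a diagnostic aid only.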

SciPy 1.7.0 was released recently, and on Windows CI we install the Python dependencies from tests/requirements.txt, which does not pin a version (so it was installing the latest, 1.7.0). I have changed this to scipy<1.7.0 in https://github.com/apache/incubator-mxnet/pull/20428 so that scipy 1.6.3 is used instead.

For the Linux CI pipelines, we hardcode scipy==1.2.1 in ci/docker/install/requirements, but ci/docker/install/ubuntu_caffe.sh installs requirements (one at a time) from the Caffe repository, which inadvertently upgrades scipy to 1.7.0. To prevent this, I have also disabled installing the Caffe requirements file in https://github.com/apache/incubator-mxnet/pull/20428.
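Since the upgrade on Linux happened silently, a small guard that compares the installed version string against the intended bound might help catch this earlier. The following is only a sketch: `parse_release` and `satisfies_pin` are hypothetical helpers, not part of the MXNet CI scripts, and the parsing is deliberately simplified (it looks only at the leading numeric components and ignores suffixes such as release-candidate tags).

```python
import re

def parse_release(version):
    """Extract (major, minor, patch) from a version string such as '1.6.3'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '1.7') is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())

def satisfies_pin(version, upper_bound=(1, 7, 0)):
    """Return True if version is strictly below upper_bound, mirroring 'scipy<1.7.0'."""
    return parse_release(version) < upper_bound

print(satisfies_pin("1.6.3"))  # True
print(satisfies_pin("1.2.1"))  # True
print(satisfies_pin("1.7.0"))  # False
```

In a CI step, the string passed in would come from the installed package (for example `scipy.__version__`), and a failing check could abort the job before the tests run.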