Debian performs regular CI tests on the amd64, arm64, and ppc64el (PowerPC) architectures. We've seen three tests fail on the latter two; analysis shows that the failures stem from slight deviations of the actual values from the expected ones.
I've worked around this in Debian by increasing the tests' relative tolerances (see patch), but I wanted to report it in case you see it as something worth investigating.
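For context, the workaround boils down to passing an explicit relative tolerance to np.allclose instead of relying on its default (rtol=1e-05). The rtol value below is illustrative, not necessarily what the patch uses:

# before: default rtol=1e-05, too strict on arm64/ppc64el
assert np.allclose(known_score, score)
# after: loosened relative tolerance (illustrative; 3% covers the worst deviation seen)
assert np.allclose(known_score, score, rtol=0.03)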
Specifically, these are the failures (note the actual and expected values):
======================================================================
FAIL: Assert that the StackingEstimator worked as expected in scikit-learn pipeline in regression.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/ckk/tpot-0.11.1+dfsg2/tests/stacking_estimator_tests.py", line 113, in test_StackingEstimator_4
assert np.allclose(known_cv_score, cv_score)
AssertionError:
>> assert <module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>.allclose(0.8216045257587923, 0.8207525232725118)
======================================================================
FAIL: Assert that the TPOTRegressor score function outputs a known score for a fixed pipeline.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/ckk/tpot-0.11.1+dfsg2/tests/tpot_tests.py", line 580, in test_score_3
assert np.allclose(known_score, score)
AssertionError:
-------------------- >> begin captured stdout << ---------------------
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
--------------------- >> end captured stdout << ----------------------
>> assert <module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>.allclose(-11.708199875921563, -11.96209223601317)
======================================================================
FAIL: Assert that the TPOTRegressor score function outputs a known score for a fixed pipeline with sample weights.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/ckk/tpot-0.11.1+dfsg2/tests/tpot_tests.py", line 633, in test_sample_weight_func
assert np.allclose(known_score, score)
AssertionError:
-------------------- >> begin captured stdout << ---------------------
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.
--------------------- >> end captured stdout << ----------------------
>> assert <module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>.allclose(-11.586816877933911, -11.583413861528234)
Two of the actual values are within 1% of their expected values; the third is within 3%.
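As a quick sanity check, those relative deviations can be recomputed from the values in the tracebacks above:

import numpy as np

# (known, actual) pairs taken from the three failing assertions
pairs = [
    (0.8216045257587923, 0.8207525232725118),    # test_StackingEstimator_4
    (-11.708199875921563, -11.96209223601317),   # test_score_3
    (-11.586816877933911, -11.583413861528234),  # test_sample_weight_func
]
for known, actual in pairs:
    print(abs(actual - known) / abs(known))
# prints roughly 0.0010, 0.0217, 0.0003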
Would you consider tolerances of that order acceptable, or should these tests produce exactly the expected values and nothing else?