EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

3 failing tests on non-amd64 platforms (precision-related) #1047

Closed: ckastner closed this issue 1 year ago

ckastner commented 4 years ago

Debian performs regular CI tests on the amd64, arm64, and ppc64el (PowerPC) architectures. We've seen three tests fail on the latter two architectures; analysis shows that the failures stem from a slight deviation of the actual value from the expected value.

I've worked around this in Debian by increasing the tests' relative tolerances (see patch), but I wanted to report it in case you consider this worth investigating.
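For illustration, the workaround amounts to passing a wider rtol to np.allclose in the affected assertions. This is only a sketch using values from one of the failures below; the 3% tolerance is an assumed figure chosen to cover the largest observed deviation, not necessarily what the Debian patch uses:

import numpy as np

# Values from the second failing test (tpot_tests.py::test_score_3).
known_score, score = -11.708199875921563, -11.96209223601317

# np.allclose defaults to rtol=1e-5, far tighter than this
# cross-platform deviation, so the stock assertion fails.
assert not np.allclose(known_score, score)

# Relaxed relative tolerance in the spirit of the workaround
# (3% is illustrative, not the exact patched value).
assert np.allclose(known_score, score, rtol=0.03)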

Specifically, these are the failures (note the actual and expected values):

======================================================================
FAIL: Assert that the StackingEstimator worked as expected in scikit-learn pipeline in regression.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ckk/tpot-0.11.1+dfsg2/tests/stacking_estimator_tests.py", line 113, in test_StackingEstimator_4
    assert np.allclose(known_cv_score, cv_score)
AssertionError: 
>>  assert <module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>.allclose(0.8216045257587923, 0.8207525232725118)

======================================================================
FAIL: Assert that the TPOTRegressor score function outputs a known score for a fixed pipeline.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ckk/tpot-0.11.1+dfsg2/tests/tpot_tests.py", line 580, in test_score_3
    assert np.allclose(known_score, score)
AssertionError: 
-------------------- >> begin captured stdout << ---------------------
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.

--------------------- >> end captured stdout << ----------------------
>>  assert <module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>.allclose(-11.708199875921563, -11.96209223601317)

======================================================================
FAIL: Assert that the TPOTRegressor score function outputs a known score for a fixed pipeline with sample weights.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ckk/tpot-0.11.1+dfsg2/tests/tpot_tests.py", line 633, in test_sample_weight_func
    assert np.allclose(known_score, score)
AssertionError: 
-------------------- >> begin captured stdout << ---------------------
Warning: xgboost.XGBRegressor is not available and will not be used by TPOT.

--------------------- >> end captured stdout << ----------------------
>>  assert <module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>.allclose(-11.586816877933911, -11.583413861528234)

Two are within 1% of the expected value; the third is within 3%.
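A quick check of the pairs reported above (the script and its output formatting are mine, not part of the test suite) confirms these magnitudes and why the default np.allclose tolerance rejects them:

import numpy as np

# (expected, actual) pairs taken from the failing assertions above.
pairs = [
    (0.8216045257587923, 0.8207525232725118),
    (-11.708199875921563, -11.96209223601317),
    (-11.586816877933911, -11.583413861528234),
]

for expected, actual in pairs:
    rel = abs(actual - expected) / abs(expected)
    # Default rtol is 1e-5, so deviations of ~0.1% to ~2% all fail.
    print(f"relative deviation: {rel:.4%}, "
          f"allclose(default): {np.allclose(expected, actual)}, "
          f"allclose(rtol=0.03): {np.allclose(expected, actual, rtol=0.03)}")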

Would you consider these tolerances acceptable, or should these tests produce exactly the expected value and nothing else?

weixuanfu commented 4 years ago

Thank you for digging into this issue. Please submit a PR that increases the relative tolerances for Debian arm64 and ppc64el (as in your patch).
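One way such a PR could scope the change is to widen the tolerance only on the architectures where the drift was observed. This is a hypothetical sketch; the helper name, the architecture list, and the 3% tolerance are assumptions, not TPOT's actual test code:

import platform
import numpy as np

def assert_score_close(known_score, score):
    # Hypothetical helper: relax rtol only on arm64/ppc64el
    # (reported by platform.machine() as aarch64/ppc64le).
    loose = platform.machine() in ("aarch64", "ppc64le")
    rtol = 0.03 if loose else 1e-5  # 1e-5 is np.allclose's default rtol
    assert np.allclose(known_score, score, rtol=rtol)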

ckastner commented 1 year ago

The fix for this was merged a while ago, so I'm closing the issue.