aws / sagemaker-distribution

A set of Docker images that include popular frameworks for machine learning, data science and visualization.
Apache License 2.0

pandas test failures #110

Open just4brown opened 8 months ago

just4brown commented 8 months ago

We're seeing pandas test failures for versions 2.1.1 and newer. Many of them were resolved by fixing the test environment in the Dockerfile; see https://github.com/aws/sagemaker-distribution/pull/107

However, three failures remain under investigation:

FAILED pandas/tests/frame/test_arithmetic.py::TestFrameFlexArithmetic::test_floordiv_axis0_numexpr_path[python-pow] - AssertionError: DataFrame.iloc[:, 0] (column name="0") are different
FAILED pandas/tests/io/excel/test_openpyxl.py::test_engine_kwargs_append_data_only[True-0-.xlsx] - AssertionError: assert None == 0
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriterEngineTests::test_ExcelWriter_dispatch[OpenpyxlWriter-.xlsx] - IndexError: At least one sheet must be visible
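To reproduce only these three failures, one option is to rerun them by node ID. This is a minimal sketch, not part of the repo's test tooling: the names FAILING_NODE_IDS, build_rerun_cmd and rerun are hypothetical, and it assumes the working directory is the site-packages directory that contains pandas/, since the IDs above appear to be relative to it. Passing the IDs as an argv list rather than a shell string avoids having to quote the [...] parametrization suffixes.

```python
import subprocess
import sys

# Failing node IDs copied from the summary above. The [...] parametrization
# suffixes would need quoting in a shell, so they are passed as argv items.
FAILING_NODE_IDS = [
    "pandas/tests/frame/test_arithmetic.py::TestFrameFlexArithmetic"
    "::test_floordiv_axis0_numexpr_path[python-pow]",
    "pandas/tests/io/excel/test_openpyxl.py"
    "::test_engine_kwargs_append_data_only[True-0-.xlsx]",
    "pandas/tests/io/excel/test_writers.py::TestExcelWriterEngineTests"
    "::test_ExcelWriter_dispatch[OpenpyxlWriter-.xlsx]",
]


def build_rerun_cmd(node_ids):
    """Return an argv list that reruns only the given tests, verbosely."""
    return [sys.executable, "-m", "pytest", "-v", *node_ids]


def rerun(node_ids, cwd):
    """Run the tests from `cwd` (assumed: the site-packages dir holding pandas/).

    Returns pytest's exit code (0 means every selected test passed).
    """
    return subprocess.run(build_rerun_cmd(node_ids), cwd=cwd, check=False).returncode
```

Inside the image this would be called as rerun(FAILING_NODE_IDS, "/opt/conda/lib/python3.10/site-packages"), with that directory inferred from the pandas path in the log below.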

just4brown commented 7 months ago

Newest pandas issue, encountered with every sagemaker-distribution version that ships pandas v2+:

Will start running test for: pandas.test.Dockerfile against: localhost/sagemaker-distribution:1.1.0-gpu
Built a test image: sha256:9906e3c5cd50fafc3174de6e6d0606963bceff581d2750b3a3bae8fd192e8a03, will now execute its default CMD.
Found Nvidia driver version: 525.85.12
running: pytest -m (not slow and not network and not db) -k (not test_network and not s3 and not test_plain_axes) --no-strict-data-files --ignore pandas/tests/frame/test_arithmetic.py::TestFrameFlexArithmetic::test_floordiv_axis0_numexpr_path /opt/conda/lib/python3.10/site-packages/pandas
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
rootdir: /opt/conda/lib/python3.10/site-packages/pandas
configfile: pyproject.toml
plugins: anyio-3.7.1, dash-2.14.1, hypothesis-6.88.1, asyncio-0.22.0
asyncio: mode=strict
collected 0 items
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/python.py", line 617, in _importtestmodule
INTERNALERROR>     mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/pathlib.py", line 567, in import_path
INTERNALERROR>     importlib.import_module(module_name)
INTERNALERROR>   File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
INTERNALERROR>     return _bootstrap._gcd_import(name[level:], package, level)
INTERNALERROR>   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
INTERNALERROR>   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
INTERNALERROR>   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
INTERNALERROR>   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
INTERNALERROR>   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
INTERNALERROR>   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pandas/core/_numba/kernels/__init__.py", line 1, in <module>
INTERNALERROR>     from pandas.core._numba.kernels.mean_ import (
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pandas/core/_numba/kernels/mean_.py", line 13, in <module>
INTERNALERROR>     import numba
INTERNALERROR> ModuleNotFoundError: No module named 'numba'
INTERNALERROR>
INTERNALERROR> The above exception was the direct cause of the following exception:
INTERNALERROR>
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/main.py", line 271, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/main.py", line 324, in _main
INTERNALERROR>     config.hook.pytest_collection(session=session)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_hooks.py", line 493, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_manager.py", line 115, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_callers.py", line 152, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_result.py", line 114, in get_result
INTERNALERROR>     raise exc.with_traceback(exc.__traceback__)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_callers.py", line 77, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/main.py", line 335, in pytest_collection
INTERNALERROR>     session.perform_collect()
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/main.py", line 675, in perform_collect
INTERNALERROR>     self.items.extend(self.genitems(node))
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/main.py", line 842, in genitems
INTERNALERROR>     rep = collect_one_node(node)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/runner.py", line 546, in collect_one_node
INTERNALERROR>     ihook.pytest_collectstart(collector=collector)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_hooks.py", line 493, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_manager.py", line 115, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_callers.py", line 113, in _multicall
INTERNALERROR>     raise exception.with_traceback(exception.__traceback__)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pluggy/_callers.py", line 77, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/pytest_asyncio/plugin.py", line 552, in pytest_collectstart
INTERNALERROR>     marks = get_unpacked_marks(collector.obj, consider_mro=True)
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/python.py", line 310, in obj
INTERNALERROR>     self._obj = obj = self._getobj()
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/python.py", line 528, in _getobj
INTERNALERROR>     return self._importtestmodule()
INTERNALERROR>   File "/opt/conda/lib/python3.10/site-packages/_pytest/python.py", line 642, in _importtestmodule
INTERNALERROR>     raise self.CollectError(
INTERNALERROR> _pytest.nodes.Collector.CollectError: ImportError while importing test module '/opt/conda/lib/python3.10/site-packages/pandas/core/_numba/kernels/__init__.py'.
INTERNALERROR> Hint: make sure your test modules/packages have valid Python names.
INTERNALERROR> Traceback:
INTERNALERROR> ../importlib/__init__.py:126: in import_module
INTERNALERROR>     return _bootstrap._gcd_import(name[level:], package, level)
INTERNALERROR> pandas/core/_numba/kernels/__init__.py:1: in <module>
INTERNALERROR>     from pandas.core._numba.kernels.mean_ import (
INTERNALERROR> pandas/core/_numba/kernels/mean_.py:13: in <module>
INTERNALERROR>     import numba
INTERNALERROR> E   ModuleNotFoundError: No module named 'numba'
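The traceback shows the chain: pytest-asyncio's pytest_collectstart hook reads collector.obj, which imports pandas/core/_numba/kernels/__init__.py, which runs import numba at module level. A minimal pre-flight sketch, assuming the test image can run Python before invoking pytest, checks for numba without importing it. It also rebuilds the logged command with --deselect in place of --ignore: pytest's --ignore takes a filesystem path, so an argument containing "::" excludes nothing, whereas --deselect takes node ID prefixes. The helper names (has_module, build_test_cmd) are hypothetical, and whether this was the intent behind the original --ignore flag is an assumption.

```python
import importlib.util
import sys

# Path taken from the log above; adjust for other images.
PANDAS_DIR = "/opt/conda/lib/python3.10/site-packages/pandas"


def has_module(name):
    """True if the top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def build_test_cmd(pandas_dir, deselect_ids):
    """Rebuild the logged pytest invocation, excluding tests by node ID.

    --ignore expects a path, so a node ID containing "::" matches nothing;
    --deselect expects node ID prefixes and drops matching items instead.
    """
    cmd = [
        sys.executable, "-m", "pytest",
        "-m", "not slow and not network and not db",
        "-k", "not test_network and not s3 and not test_plain_axes",
        "--no-strict-data-files",
    ]
    for node_id in deselect_ids:
        cmd += ["--deselect", node_id]
    cmd.append(pandas_dir)
    return cmd


if __name__ == "__main__":
    if not has_module("numba"):
        # pandas/core/_numba/kernels imports numba at module level.
        print("warning: numba is not installed; collection may fail as in the log")
    print(build_test_cmd(PANDAS_DIR, []))
```

The strings passed to --deselect must match the node IDs pytest itself assigns under the rootdir shown in the log; running pytest --collect-only -q against the same path lists them.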

claytonparnell commented 5 months ago

Currently failing tests:

=========================== short test summary info ============================
FAILED pandas/tests/test_common.py::test_serializable[obj102] - AttributeErro...
FAILED pandas/tests/io/excel/test_openpyxl.py::test_engine_kwargs_append_data_only[True-0-.xlsx]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriterEngineTests::test_ExcelWriter_dispatch[OpenpyxlWriter-.xlsx]
= 3 failed, 213066 passed, 8166 skipped, 6152 deselected, 2425 xfailed, 86 xpassed, 26 warnings in 2468.66s (0:41:08) =

99.99% success, yet still a failure 🥲
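For the record, the arithmetic behind "99.99%": a small sketch that parses the summary line above and computes passed / (passed + failed). Leaving skipped, deselected, xfailed and xpassed tests out of the denominator is a choice made here, not a figure pytest reports, and parse_counts / pass_rate are illustrative names.

```python
import re

# Summary counts copied from the run above.
SUMMARY = (
    "3 failed, 213066 passed, 8166 skipped, 6152 deselected, "
    "2425 xfailed, 86 xpassed, 26 warnings in 2468.66s"
)


def parse_counts(summary):
    """Map each outcome word (failed, passed, ...) to its count."""
    return {word: int(n) for n, word in re.findall(r"(\d+) (\w+)", summary)}


def pass_rate(counts):
    """Percentage of executed (passed + failed) tests that passed."""
    passed = counts.get("passed", 0)
    failed = counts.get("failed", 0)
    return 100.0 * passed / (passed + failed)


counts = parse_counts(SUMMARY)
print(f"{pass_rate(counts):.4f}% of executed tests passed")  # 99.9986%
```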

claytonparnell commented 3 days ago

Just one failure in 1.9.0-cpu:

============================================================================================== short test summary info ===============================================================================================
FAILED pandas/tests/io/test_parquet.py::TestParquetPyArrow::test_filter_row_groups - FutureWarning: Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 and will be removed in a future version.
================================================ 1 failed, 201762 passed, 19386 skipped, 6152 deselected, 2461 xfailed, 86 xpassed, 28 warnings in 2599.19s (0:43:19) ================================================

This is due to the FutureWarning "Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 and will be removed in a future version." (reported by pytest as "E FutureWarning: ..."). We will need to address this now that 1.9.0 ships pyarrow 15.0.
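The run reports this FutureWarning as the failure itself, so the suite appears to be treating it as an error. One possible stopgap, an assumption rather than the planned update, is to ignore just this warning while leaving other FutureWarnings visible. The sketch uses Python's warnings module; pytest accepts the same action:message:category filter syntax through its -W option and its filterwarnings ini setting. The function name and the second warning are illustrative only.

```python
import warnings

# Full message from the failing test above.
LEGACY_MSG = (
    "Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 "
    "and will be removed in a future version."
)


def surviving_warnings(ignore_legacy):
    """Emit two FutureWarnings and return the messages that were not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if ignore_legacy:
            # filterwarnings() inserts at the front, so this rule is checked first.
            # `message` is a regex matched against the start of the warning text.
            warnings.filterwarnings(
                "ignore",
                message=r"Passing 'use_legacy_dataset' is deprecated",
                category=FutureWarning,
            )
        warnings.warn(LEGACY_MSG, FutureWarning)
        warnings.warn("an unrelated future change", FutureWarning)
    return [str(w.message) for w in caught]


print(surviving_warnings(ignore_legacy=False))  # both messages
print(surviving_warnings(ignore_legacy=True))   # only the unrelated one
```

Suppressing the warning only hides the deprecation; since pyarrow says the argument will be removed, the lasting fix is presumably code that stops passing use_legacy_dataset.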