apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Tracking Issue] Enabling Testing in AArch64 #10673

Open Mousius opened 2 years ago

Mousius commented 2 years ago

This issue tracks progress on enabling tests on AArch64.

As part of enabling more tests in the AArch64 container, a number of tests had to be skipped and need to be fixed.

See also: https://github.com/apache/tvm/pull/10677 / https://github.com/apache/tvm/pull/10564
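For readers unfamiliar with how a test gets skipped on a single architecture, here is a minimal sketch. It is not taken from the linked PRs: the names `is_aarch64` and `skip_on_aarch64` are hypothetical, and treating `arm64` as equivalent to `aarch64` is an assumption. It uses only the standard library; in a pytest suite the same condition could be passed to `pytest.mark.skipif(condition, reason=...)`.

```python
import platform
import unittest


def is_aarch64(machine=None):
    """Return True when the machine string names a 64-bit Arm target.

    `machine` defaults to platform.machine(); accepting "arm64" as well
    is an assumption (some operating systems report that spelling).
    """
    machine = platform.machine() if machine is None else machine
    return machine.lower() in ("aarch64", "arm64")


# Hypothetical reusable decorator: skips the decorated test on AArch64 hosts.
skip_on_aarch64 = unittest.skipIf(
    is_aarch64(), "Currently failing on AArch64, see tracking issue #10673"
)


@skip_on_aarch64
def test_placeholder():
    # Stand-in body; a real test would exercise the failing schedule or frontend.
    assert 1 + 1 == 2
```

Keeping the condition in one helper means the skips can be found and removed in one pass once the underlying failures are fixed.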

Potential Schedule Issues

xgboost issues

E           xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
E           Likely causes:
E             * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
E             * You are running 32-bit Python on a 64-bit OS
E           Error message(s): ['/usr/local/lib/python3.7/dist-packages/xgboost/lib/../../xgboost.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block']
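A rough way to check which library triggers this failure is to import each candidate in a fresh interpreter, so that libraries already loaded by the test session do not affect the result. The sketch below is only an illustration: `probe_import` and `STATIC_TLS_MARKER` are hypothetical names, and the check simply looks for the message text quoted above in the child process's stderr.

```python
import subprocess
import sys

# Message fragment copied from the error above.
STATIC_TLS_MARKER = "cannot allocate memory in static TLS block"


def probe_import(module_name, timeout=120):
    """Import `module_name` in a fresh interpreter.

    Returns (ok, hit_static_tls): `ok` is True when the import succeeded,
    `hit_static_tls` is True when it failed and stderr contains the
    static TLS message.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    ok = result.returncode == 0
    hit_static_tls = (not ok) and STATIC_TLS_MARKER in result.stderr
    return ok, hit_static_tls
```

Calling `probe_import("xgboost")` inside the AArch64 container would show whether the failure reproduces in isolation or only appears after other libraries have been loaded.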

Unsure

masahi commented 2 years ago

test_topi_conv2d_int8.py::verify_conv2d_NCHWc_int8 was fixed in https://github.com/apache/tvm/pull/10839

leandron commented 2 years ago

When enabling PyTorch and ONNX, I spotted a few more instances of these libgomp-related issues, so I'm adding new tests to the list of tests skipped on AArch64 for further investigation. In the meantime, this guarantees that the other tests don't regress.

The error message looks like this:

xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
Likely causes:
  * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
  * You are running 32-bit Python on a 64-bit OS
Error message(s): ['/usr/local/lib/python3.7/dist-packages/xgboost/lib/../../xgboost.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block']

Another variant, raised when importing torch:

def test_guess_frontend_pytorch():
        # some CI environments wont offer pytorch, so skip in case it is not present
>       pytest.importorskip("torch")

tests/python/driver/tvmc/test_frontends.py:79: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.7/dist-packages/torch/__init__.py:198: in <module>
    _load_global_deps()
/usr/local/lib/python3.7/dist-packages/torch/__init__.py:151: in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <CDLL '/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_global_deps.so', handle 0 at 0xffff1c3e0bd0>
name = '/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_global_deps.so'
mode = 256, handle = None, use_errno = False, use_last_error = False

    def __init__(self, name, mode=DEFAULT_MODE, handle=None,
                 use_errno=False,
                 use_last_error=False):
        self._name = name
        flags = self._func_flags_
        if use_errno:
            flags |= _FUNCFLAG_USE_ERRNO
        if use_last_error:
            flags |= _FUNCFLAG_USE_LASTERROR
        if _sys.platform.startswith("aix"):
            """When the name contains ".a(" and ends with ")",
               e.g., "libFOO.a(libFOO.so)" - this is taken to be an
               archive(member) syntax for dlopen(), and the mode is adjusted.
               Otherwise, name is presented to dlopen() as a file argument.
            """
            if name and name.endswith(")") and ".a(" in name:
                mode |= ( _os.RTLD_MEMBER | _os.RTLD_NOW )

        class _FuncPtr(_CFuncPtr):
            _flags_ = flags
            _restype_ = self._func_restype_
        self._FuncPtr = _FuncPtr

        if handle is None:
>           self._handle = _dlopen(self._name, mode)
E           OSError: /usr/local/lib/python3.7/dist-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block

/usr/lib/python3.7/ctypes/__init__.py:364: OSError
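In the traceback above, the failure surfaces as an `OSError` raised from inside `pytest.importorskip("torch")`, so the test errors out instead of being skipped. A minimal sketch of a guard that treats this specific failure like a missing package follows. The helper names are hypothetical and this is not what the TVM test suite does; it only illustrates the idea under the assumption that matching on the message text is acceptable.

```python
import importlib

# Message fragment copied from the traceback above.
STATIC_TLS_MARKER = "cannot allocate memory in static TLS block"


def is_static_tls_error(exc):
    """Return True if the exception text carries the static TLS failure."""
    return STATIC_TLS_MARKER in str(exc)


def import_or_none(name, importer=importlib.import_module):
    """Import `name`, returning None if it is absent or hits the TLS failure.

    `importer` is injectable so the behaviour can be exercised without
    the real library; any other exception is re-raised unchanged.
    """
    try:
        return importer(name)
    except ImportError:
        # Package not installed (ModuleNotFoundError is a subclass).
        return None
    except Exception as exc:
        if is_static_tls_error(exc):
            return None
        raise
```

A test could then skip itself when `import_or_none("torch")` returns `None`, while unrelated load errors would still fail loudly.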

While investigating these, I realised that no environment with torch installed is running the integration tests (see https://github.com/apache/tvm/issues/12529). That is a separate concern and should also be fixed.

leandron commented 2 years ago

I just submitted https://github.com/apache/tvm/pull/12554 with the new tests that need skipping, now that I'm testing the environments with Torch installed.