apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

TVMOp doesn't work well with GPU builds #17840

Open apeforest opened 4 years ago

apeforest commented 4 years ago

Description

A few recent PRs failed at the same place, in code related to TVMOp:

https://github.com/apache/incubator-mxnet/pull/17835 https://github.com/apache/incubator-mxnet/pull/17795 https://github.com/apache/incubator-mxnet/pull/17531

@yzhliu

leezu commented 4 years ago

@apeforest do you mean this error:

[2020-03-18T08:37:14.025Z] TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: Cannot find function less_scalar_gpufloat32_2bool_2_kernel0 in the imported modules or global registry

apeforest commented 4 years ago

Yes, but it seems to be fixed.

leezu commented 4 years ago

Why would it be fixed? I got this error at 2020-03-18T08:37:14.025Z (UTC).

ChaiBapchya commented 4 years ago

The issue persists when running the unix-gpu build on G4 in the CI dev account: http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/mxnet-validation-bapac%2Funix-gpu/detail/update_gpu_toolchain/8/pipeline/414

All 3 failed tests fail in a similar fashion: they fail at 7 tests with the following error:

TVMError: Check failed: ret == 0 (-1 vs. 0) : Check failed: f != nullptr: Cannot find function <x> in the imported modules or global registry

Internal functions that can't be found

greater_equal_gpufloat32_0float32_0bool_0_kernel0 (x2)
logical_and_gpufloat32_1float32_1bool_1_kernel0 (x2)
equal_gpufloat32_2float32_2bool_2_kernel0 (x2)
sum_gpureduce1st_dim_1req_kWriteTobool_5float32_2float32_2_kernel0 (x3)
cuda_rad2degfloat32_2float32_2_kernel0 (x2)
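A tally like the one above can be produced from a saved CI log with a few lines of Python. This is only a minimal sketch: the regular expression is derived from the TVMError text quoted in this thread, while the helper name `count_missing_kernels` and the idea of scanning a saved log as a string are assumptions made for illustration, not part of MXNet or TVM.

```python
import re
from collections import Counter

# Pattern based on the TVMError message quoted in this thread.
# The helper below is hypothetical and not part of MXNet or TVM.
MISSING_FN = re.compile(
    r"Cannot find function\s+(\S+)\s+in the imported modules"
)


def count_missing_kernels(log_text):
    """Map each missing TVM kernel name to how often it appears in the log."""
    return Counter(MISSING_FN.findall(log_text))


# Sample line copied from the error reported earlier in this issue.
sample = (
    "[2020-03-18T08:37:14.025Z] TVMError: Check failed: ret == 0 (-1 vs. 0) : "
    "Check failed: f != nullptr: Cannot find function "
    "less_scalar_gpufloat32_2bool_2_kernel0 in the imported modules "
    "or global registry\n"
)

counts = count_missing_kernels(sample)
for name, n in counts.most_common():
    print(f"{name} (x{n})")
```

Running the same helper over the full console log of a failing job should give the per-kernel counts without manual searching.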

7 Tests that fail as a result

tests/python/unittest/test_numpy_interoperability.py:test_np_array_function_protocol
tests/python/unittest/test_numpy_interoperability.py:test_np_array_ufunc_protocol

tests/python/unittest/test_numpy_ndarray.py:test_np_ndarray_binary_element_wise_ops

tests/python/unittest/test_numpy_op.py:test_np_sum
tests/python/unittest/test_numpy_op.py:test_np_mean
tests/python/unittest/test_numpy_op.py:test_np_unary_funcs
tests/python/unittest/test_numpy_op.py:test_np_binary_funcs

leezu commented 4 years ago

Reproducer: compile MXNet with USE_TVMOP=1, then run:

import mxnet as mx

x = mx.np.array([[0, 1], [1, 1], [2, 2]], ctx=mx.gpu())
idx = x < 2
x[idx]

leezu commented 4 years ago

TVMOp has been disabled on CI: https://github.com/apache/incubator-mxnet/pull/18204

Let's track fixing TVMOp in this issue?

yzhliu commented 4 years ago

@JinboCi will be helping