ceruleangu / Block-Sparse-Benchmark

Benchmark for matrix multiplications between dense and block-sparse (BSR) matrices in TVM, blocksparse (Gray et al.), and cuSparse.

autoTVM failed and unknown backtrace #1

Open · huangteng opened this issue 3 years ago

huangteng commented 3 years ago

Hi, I tried to run your example on an x86 platform (I simplified your example and changed the target to "llvm"). However, the autoTVM run failed and popped out a low-level backtrace like the one below:

  99: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3559
  98: do_call_core
        at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:5034
  97: PyVectorcall_Call
        at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:199
  96: _PyFunction_Vectorcall
        at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
  95: function_code_fastcall
        at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:283
  94: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:3486
  93: call_function
        at /tmp/build/80754af9/python_1585235154784/work/Python/ceval.c:4987
  92: _PyObject_Vectorcall
        at /tmp/build/80754af9/python_1585235154784/work/Include/cpython/abstract.h:127
  91: _PyFunction_Vectorcall
        at /tmp/build/80754af9/python_1585235154784/work/Objects/call.c:410
  90: function_code_fastcall

I would really appreciate it if you could share some detailed information or any debugging tips:

  1. which version of tvm (commit id)
  2. which version of xgboost

As for what I have tried: the small piece of code below can trigger the above issue. It seems the problem happens when a value read from another tensor is used as an index into this tensor.

    def _compute_basic(*indices):
        n, oc, oh, ow, p = indices
        stride_index = oc * (pattern_set_size + 1) + p
        # here `stride` is a value tensor
        a = stride[0]
        # using `a` as an index triggers the problem
        return data[n, a, oh, ow]
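
A minimal self-contained sketch of the same pattern (hypothetical shapes and names, written against the te API of that TVM era; the actual repro used for tuning is in the Discuss post linked in a later comment) could look like this:

    import tvm
    from tvm import te

    # Hypothetical shapes, only meant to illustrate the failing pattern.
    N, C, H, W = 1, 16, 8, 8
    pattern_set_size = 3

    data = te.placeholder((N, C, H, W), name="data", dtype="float32")
    # A "value tensor" whose entries are later used as indices into `data`.
    stride = te.placeholder((C * (pattern_set_size + 1),), name="stride", dtype="int32")

    def _compute_basic(n, c, h, w):
        # Read a value out of `stride` and use it as an index into `data`;
        # this dynamic indexing is what trips up autoTVM's measurement step.
        a = stride[0]
        return data[n, a, h, w]

    out = te.compute((N, C, H, W), _compute_basic, name="out")
    s = te.create_schedule(out.op)
    func = tvm.build(s, [data, stride, out], target="llvm")  # compiles fine outside of autoTVM
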
ceruleangu commented 3 years ago

Hi, thank you for your message! It's TVM b3b2705 and xgboost 1.4.0.

Please let me know if that helps :)

huangteng commented 3 years ago

Hi, I have been able to reproduce this issue with a minimal demo. Could you please take a look at this post on the TVM Discuss forum and run the sample code I uploaded, just to make sure we reproduce the issue the same way and have the same understanding? https://discuss.tvm.apache.org/t/backtrace-really-basic-code-triggers-autotvm-exception/9750

huangteng commented 3 years ago

Sorry for closing this by mistake; I will also try the versions you mentioned (it may take some time). Just to mention: I would like to test whether this dynamic indexing is supported during autoTVM tuning on the LLVM x86 platform rather than CUDA. I would really appreciate it if you could try running the simple example in the above post.

huangteng commented 3 years ago

> Hi, thank you for your message! It's TVM b3b2705 and xgboost 1.4.0.
>
> Please let me know if that helps :)

I can still reproduce the issue with the above versions, using the simple code from the TVM Discuss post above.

ceruleangu commented 3 years ago

Hi! Sorry for the delayed reply; I was looking into this problem. It happens because autotvm uses random inputs when auto-tuning. When a randomly filled tensor is used as indices, it causes out-of-bound accesses for your task. You need to add your task here and set a customized input to prevent this issue:

https://github.com/apache/tvm/blob/main/python/tvm/autotvm/measure/measure_methods.py#L584-L587
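
For orientation, the check at those lines (reconstructed from this thread, i.e. the "scatter" condition and the random_fill loop mentioned in the replies below; the exact code differs between TVM versions) is roughly:

    # Non-"scatter" tasks get every measurement argument filled with random
    # values; when one of those arguments is later used as an index tensor,
    # this is what produces the out-of-bound accesses.
    if "scatter" not in measure_input.task.name:
        for arg in args:
            random_fill(arg)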

huangteng commented 3 years ago

Thanks for the hint. Indeed, the logic enters the branch where "scatter" is not in measure_input.task.name. But how do I set the customized input in the tuning task? I searched for "scatter" in the docs, but it is not obvious. Could you please share a sample code snippet? Thanks a lot.

ceruleangu commented 3 years ago

It seems this feature is missing from the API... could you please try to modify that "if scatter" statement?

huangteng commented 3 years ago

> It seems this feature is missing from the API... could you please try to modify that "if scatter" statement?

Yes, commenting out that random_fill part works around the issue... But this raises two problems:

  1. How did your block-sparse implementation avoid this kind of problem? It does not seem to be CPU/GPU-specific; sparse-like computations have to rely on this kind of "dynamic indexing", so the random fill should trigger the same issue, right?
  2. With random fill commented out, the arguments are all zero by default; even if the tuner runs successfully, that does not represent the real running case. So I think there should be a way to pass the real tensor values to the tuner for tensor initialization, so that it measures the correct time for the current sparsity. Is there a way to customize the tensor values during tuning?

ceruleangu commented 3 years ago

In your case, args[1] is the index tensor. Instead of random fill, you need to generate a random int32 tensor (using numpy.random.randint) and use the shape in build_result.arg_info as the upper bound of the random numbers so that they do not cause out-of-bound accesses. I don't quite remember my own case, but I think I did the same thing.
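
A hedged sketch of that bounded fill, meant to sit in measure_methods.py in place of the plain random_fill of the index tensor (the positions of the index and data tensors in args / build_result.arg_info, and the choice of which axis bounds the indices, are assumptions about this particular task):

    import numpy as np

    # build_result.arg_info holds (shape, dtype) pairs for the measured arguments.
    idx_shape, idx_dtype = build_result.arg_info[1]   # assumption: args[1] is the index tensor
    data_shape, _ = build_result.arg_info[0]          # assumption: args[0] is the tensor being indexed
    upper_bound = data_shape[1]                       # assumption: the indices address axis 1 of the data

    # Fill the index tensor with valid, in-range integers instead of random values.
    host_idx = np.random.randint(0, upper_bound, size=idx_shape).astype(idx_dtype)
    args[1].copyfrom(host_idx)

    # Keep the default random fill for the remaining arguments.
    for i, arg in enumerate(args):
        if i != 1:
            random_fill(arg)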

huangteng commented 3 years ago

> In your case, args[1] is the index tensor. Instead of random fill, you need to generate a random int32 tensor (using numpy.random.randint) and use the shape in build_result.arg_info as the upper bound of the random numbers so that they do not cause out-of-bound accesses. I don't quite remember my own case, but I think I did the same thing.

By the way, my simple code is only used to reproduce the error; what I am actually doing is auto-tuning a sparse-like computation.

  1. You mean inside that measure_methods.py?
  2. I think random filling does not represent the actual sparsity, and for a sparse compute case the different schedule templates should be tuned while keeping the same sparsity, right? (Otherwise the amount of computation would be totally different.)

ceruleangu commented 3 years ago
  1. Yes.
  2. Right, it should be tuned with the same sparsity. The sparsity is actually implied by the shape of the index tensor, so when you specify the workload for tuning, it sets a fixed sparsity (see the sketch below).
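
To illustrate that point (a hypothetical BSR layout, not code from this repo): once the workload fixes the shapes of the BSR data/indices/indptr tensors, the block density is fixed too, regardless of the values the tuner fills in.

    import numpy as np

    # Hypothetical (M, K) block-sparse weight in BSR form with (BS_R, BS_C) blocks.
    M, K, BS_R, BS_C = 1024, 1024, 16, 16
    density = 0.1
    total_blocks = (M // BS_R) * (K // BS_C)
    num_blocks = int(total_blocks * density)

    # The three BSR tensors; their shapes (not their values) encode the sparsity.
    w_data = np.zeros((num_blocks, BS_R, BS_C), dtype="float32")
    w_indices = np.zeros((num_blocks,), dtype="int32")
    w_indptr = np.zeros((M // BS_R + 1,), dtype="int32")

    # Any workload built from these shapes is tuned at this block density:
    print(num_blocks / total_blocks)  # -> 0.1
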
huangteng commented 3 years ago

OK, but the shape of the index tensor might not always be enough, especially when more than one tensor is involved in the dynamic compute (e.g., my sparse convolution case). Is there a way to pass pre-customized tensor values to the tuner task? (Or any other way to pass the values from the higher API level?)

ceruleangu commented 3 years ago

I don't think there is such an API in autotvm. You can save the precomputed tensors to a file and load them in measure_methods.py. Something similar is done in the auto-scheduler, but it is missing in autotvm :(
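
A hedged sketch of that save/load workaround (the file name, the npz key names, and which entries of args they map to are all hypothetical):

    import numpy as np

    # In the tuning script: save the real tensors once, before tuning starts.
    # `real_data_np` and `real_stride_np` are hypothetical numpy arrays.
    np.savez("tune_inputs.npz", data=real_data_np, stride=real_stride_np)

    # Inside measure_methods.py, for this task, instead of random_fill:
    loaded = np.load("tune_inputs.npz")
    args[0].copyfrom(loaded["data"])    # assumption: args[0] is the data tensor
    args[1].copyfrom(loaded["stride"])  # assumption: args[1] is the stride/index tensor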