apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] [Relax] Segfault when parsing the Relax IR #17239

Open Cookiee235 opened 2 months ago

Cookiee235 commented 2 months ago

Actual behavior

[16:49:04] /software/tvm-lunder/src/runtime/logging.cc:390: TVM_LOG_DEBUG enables VLOG statements in 'ir/transform.cc' up to level 1
[16:49:04] /software/tvm-lunder/src/runtime/logging.cc:390: TVM_LOG_DEBUG enables VLOG statements in 'relay/ir/transform.cc' up to level 1
Segmentation fault (core dumped)

Steps to reproduce

from tvm.script import ir as I
from tvm.script import tir as T
from tvm.script import relax as R

@I.ir_module
class Module:
    @T.prim_func(private=True)
    def multiply_by_two(A: T.Buffer((16,), "float32")):
        for i in range(16):
            A[i] = A[i] * T.float32(2)

    @R.function
    def main(A: R.Tensor((16,), dtype="float32")) -> R.Tensor((16,), dtype="float32"):
        cls = Module
        args: R.Tuple(R.Tensor((16,), dtype="float32")) = (A,)
        gv1: R.Tensor((16,), dtype="float32") = R.call_tir_inplace(cls.multiply_by_two, args, out_sinfo=R.Tensor((16,), dtype="float32"), inplace_indices=[0])
        return gv1
m = Module

cc @Lunderberg @junrushao

Lunderberg commented 2 months ago

Looks like this is a combination of a couple of factors.

  1. Like R.call_tir, the arguments to R.call_tir_inplace must be an in-line relax::Tuple (see https://github.com/apache/tvm/pull/15916 for discussion of this requirement).
  2. The argument provided to R.call_tir_inplace is wrapped into an in-line tuple if it is not already one. While the normalization of R.call_tir_inplace would handle this case when the expression is generated through C++, the TVMScript wrapping instead produces a tuple containing the tuple variable (R.tuple(args)), circumventing the normalization.
  3. The error-checking in R.call_tir_inplace can be triggered by several conditions (the argument is not a tensor, the argument doesn't have a known shape, or the argument's known shape doesn't match the output shape), but the error message unconditionally accesses the argument's known shape, triggering a segfault when that shape doesn't exist.
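The pitfall in factor (3) can be sketched in plain Python, with no TVM required. The class and function names below are illustrative stand-ins for the C++ error-reporting code, not the actual TVM implementation: one branch covers several failure modes, but the message assumes the shape exists.

```python
class Arg:
    """Stand-in for a Relax argument; shape=None models an unknown shape."""
    def __init__(self, shape=None):
        self.shape = shape

def check_arg_unsafe(arg, out_shape):
    """Mirrors the buggy pattern: a single check for several failure modes,
    with an error message that dereferences the shape unconditionally."""
    if arg.shape is None or arg.shape != out_shape:
        # Crashes (TypeError here; segfault in the C++ analogue) when the
        # check fired precisely because arg.shape does not exist.
        raise ValueError("shape mismatch: got " + str(list(arg.shape)))

def check_arg_safe(arg, out_shape):
    """Checks each condition separately and only reports what exists."""
    if arg.shape is None:
        raise ValueError("argument has no known shape")
    if arg.shape != out_shape:
        raise ValueError(f"shape mismatch: got {arg.shape}, expected {out_shape}")
```

With an argument of unknown shape, `check_arg_unsafe` crashes while building its own error message, whereas `check_arg_safe` reports the actual problem.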

Lunderberg commented 2 months ago

I've submitted a separate PR, https://github.com/apache/tvm/pull/17242, which should provide a better error message (instead of a segfault) when this occurs.

For (2), we may be able to improve it by checking isinstance(args.struct_info, TupleStructInfo) rather than isinstance(args, relax.Tuple). That way, a tuple that was defined earlier in the function wouldn't be re-wrapped, and would produce an error message at an earlier point. The normalization (which would produce the required in-line Tuple) is suppressed during TVMScript parsing, since TVMScript is frequently used to write test cases that violate Relax assumptions. Maybe we should tie the normalization to the existing check_well_formed flag, since those tests usually disable the well-formedness checks as well.
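The suggested check can be sketched in plain Python. The class names below mirror Relax concepts (TupleStructInfo, Var, Tuple) but are hypothetical stand-ins, not the TVM API; the point is that inspecting struct_info catches both an in-line tuple and a variable bound to a tuple, so a tuple-typed variable is left alone instead of being nested as R.tuple(args):

```python
class TensorStructInfo:
    """Stand-in for a tensor's struct info."""

class TupleStructInfo:
    """Stand-in for a tuple's struct info."""

class Var:
    """A variable bound earlier in the function, e.g. `args = (A,)`."""
    def __init__(self, name, struct_info):
        self.name = name
        self.struct_info = struct_info

class Tuple:
    """An in-line tuple expression."""
    def __init__(self, fields):
        self.fields = fields
        self.struct_info = TupleStructInfo()

def wrap_call_tir_args(args):
    # Checking struct_info (rather than isinstance(args, Tuple)) treats a
    # tuple-typed Var the same as an in-line Tuple: neither gets re-wrapped,
    # so the var-to-tuple case surfaces as an early error downstream instead
    # of silently becoming a nested tuple.
    if isinstance(args.struct_info, TupleStructInfo):
        return args
    return Tuple([args])
```

Under this scheme, a lone tensor argument still gets wrapped into an in-line tuple, while a tuple-typed variable passes through unchanged.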

Lunderberg commented 2 months ago

And #17243 should address (2) by improving the error message.

Cookiee235 commented 2 months ago

@Lunderberg Thanks for your fixes. A segfault is dangerous behavior, so replacing it with an error message is a good strategy. Also, thanks for your efforts in improving the well-formedness checking for the Relax IR, which is also very meaningful.