apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] StringImm Object Can't Be Pass to C++ Side from Python Side #15716

Open Johnson9009 opened 1 year ago

Johnson9009 commented 1 year ago

I found a very strange bug with StringImm. The simple code below reproduces the error.

terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [11:30:45] xxx/src/runtime/object.cc:150: InternalError: Check failed: (tindex < type_table_.size() && type_table_[tindex].allocated_slots != 0) is false: Unknown type index 8
Stack trace:
  0: ffi_call
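
The failed check in this message is a range-and-allocation test on a type table. A minimal Python sketch of that condition, assuming a simplified table (the `TypeInfo` and `TypeTable` classes and the registered names are hypothetical stand-ins, not TVM's implementation), shows how an index that is in range but was never registered still fails:

```python
class TypeInfo:
    """Hypothetical table entry; allocated_slots == 0 marks an unused slot."""

    def __init__(self, name="", allocated_slots=0):
        self.name = name
        self.allocated_slots = allocated_slots


class TypeTable:
    """Toy model of a runtime type table, for illustration only."""

    def __init__(self, size):
        self.entries = [TypeInfo() for _ in range(size)]

    def register(self, tindex, name):
        self.entries[tindex] = TypeInfo(name, allocated_slots=1)

    def lookup(self, tindex):
        # Same condition as the one quoted in the error message.
        ok = tindex < len(self.entries) and self.entries[tindex].allocated_slots != 0
        if not ok:
            raise LookupError(f"Unknown type index {tindex}")
        return self.entries[tindex].name


table = TypeTable(size=12)
table.register(7, "example.TypeA")  # hypothetical names
table.register(9, "example.TypeB")
```

In this toy table, `table.lookup(8)` raises even though 8 is within bounds, because nothing was ever registered in that slot.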

Steps to reproduce

from tvm import ir, tir

a = tir.StringImm("global")
b = tir.Var(a, "int32")
Johnson9009 commented 1 year ago

@Hzfengsy @junrushao This bug is very strange. Are you interested in looking into it? Thanks.

junrushao commented 1 year ago

This is a very interesting bug...

junrushao commented 1 year ago

Logging each call to GetOrAllocRuntimeTypeIndex gives the following output:

[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.ADT, static_tindex = 10, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = Map, static_tindex = 5, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.Closure, static_tindex = 9, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.ShapeTuple, static_tindex = 6, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.ShardLoader, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.DRef, static_tindex = 8, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.Session, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.ThreadedSession, static_tindex = 11, parent_tindex = 774, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = metadata.MetadataArrayNode, static_tindex = 11, parent_tindex = 512, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.Module, static_tindex = 1, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.NDArray, static_tindex = 2, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.PackedFunc, static_tindex = 7, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = TimerNode, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = DefaultTimerNode, static_tindex = 11, parent_tindex = 777, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = CPUTimerNode, static_tindex = 11, parent_tindex = 777, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.profiling.DeviceWrapper, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.profiling.MetricCollector, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = relax.vm.AttentionKVCache, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = relax.vm.Closure, static_tindex = 11, parent_tindex = 9, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = vm.Closure, static_tindex = 11, parent_tindex = 9, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = CUDATimerNode, static_tindex = 11, parent_tindex = 777, num_child_slots = 0, child_slots_can_overflow = 1
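
A dump like this is easier to scan with a small parser. The sketch below is only an illustration: the regular expression and the helper names `parse_dump` and `keys_with_index` are assumptions made for this example, and the sample text is four lines copied from the output above. It pulls out the key and indices and lists which keys report a given static index:

```python
import re

# Matches the "key = ..., static_tindex = ..., parent_tindex = ..." part of a line.
LINE_RE = re.compile(
    r"key = (?P<key>[^,]+), static_tindex = (?P<static>\d+), parent_tindex = (?P<parent>\d+)"
)

SAMPLE = """\
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.Closure, static_tindex = 9, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.DRef, static_tindex = 8, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.PackedFunc, static_tindex = 7, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = vm.Closure, static_tindex = 11, parent_tindex = 9, num_child_slots = 0, child_slots_can_overflow = 1
"""


def parse_dump(text):
    """Return (key, static_tindex, parent_tindex) for every matching line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            entries.append((match["key"], int(match["static"]), int(match["parent"])))
    return entries


def keys_with_index(entries, tindex):
    """List the keys whose static_tindex equals tindex."""
    return [key for key, static, _ in entries if static == tindex]


entries = parse_dump(SAMPLE)
index_8 = keys_with_index(entries, 8)
index_9 = keys_with_index(entries, 9)
```

On these sample lines, index 8 maps to runtime.disco.DRef and index 9 to runtime.Closure. That only summarizes the dump and does not by itself explain the failure.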
junrushao commented 1 year ago

Does it help to revert this commit? https://github.com/apache/tvm/commit/6554e2e082cf9d3ecf867fcd81b1b6d483df46a8

Johnson9009 commented 1 year ago

@junrushao Sorry for the late reply. I haven't tried it, but I don't think it will help: our internal TVM repo hasn't synced this commit yet, and the problem still happens there, with a different and very strange error message. As I remember, it says it wants an argument of type "tir.StringImm" but got a "runtime.Closure". @ysh329 Can you help track this and work with junrushao to investigate the issue? Thanks.

ysh329 commented 1 year ago

Of course. Let me reproduce it with the latest code first.

python3 test.py
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [16:02:51] /home/stayua01/code/tvm/src/runtime/object.cc:150: InternalError: Check failed: (tindex < type_table_.size() && type_table_[tindex].allocated_slots != 0) is false: Unknown type index 8
Stack trace:

Aborted

My reproduction gives the same result.

I also tried the v0.13.0 release code, with the result below:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    b = tir.Var(a, "int32")
  File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/python/tvm/tir/expr.py", line 362, in __init__
    self.__init_handle_by_constructor__(_ffi_api.Var, name, dtype, span)  # type: ignore
  File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/python/tvm/_ffi/_ctypes/object.py", line 145, in __init_handle_by_constructor__
    handle = __init_by_constructor__(fconstructor, args)
  File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/python/tvm/_ffi/_ctypes/packed_func.py", line 261, in __init_handle_by_constructor__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  3: TVMFuncCall
  2: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  1: tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  0: tvm::runtime::TVMMovableArgValueWithContext_::operator tvm::runtime::String<tvm::runtime::String>() const
  4: TVMFuncCall
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  1: tvm::runtime::TVMMovableArgValueWithContext_::operator tvm::runtime::String<tvm::runtime::String>() const
  0: tvm::runtime::TVMArgValue::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >() const
  File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/include/tvm/runtime/packed_func.h", line 777
TVMError: In function tir.Var(0: runtime.String, 1: TVMArgValue, 2: Span) -> tir.Var: error while converting argument 0: [16:01:59] /home/xxxxxx/download/apache-tvm-src-v0.13.0/include/tvm/runtime/packed_func.h:681: InternalError: Check failed: (IsObjectRef<tvm::runtime::String>()) is false: Could not convert TVM object of type runtime.Closure to a string.

The last log line is: Could not convert TVM object of type runtime.Closure to a string. Note that v0.13.0 is earlier than commit https://github.com/apache/tvm/commit/6554e2e082cf9d3ecf867fcd81b1b6d483df46a8.

junrushao commented 1 year ago

I'm able to confirm on my end that this bug exists in both main and unity branch. @ysh329 if it doesn't bother you too much, would you mind doing a git bisect to find out which commit causes this issue? Thanks a bunch!!

ysh329 commented 1 year ago

Okay, let me learn how to use git bisect and find a good commit first. :)

Known commits, with their dates and status:

  1. latest 09/14/23(e2e1d44c7cb22f869275744b69865dc55b439313), ignored, bad(Unknown type index 8);
  2. v0.13.0 (683dfb0c04d9f2296940e89c60c2277aca095ccd), bad(Could not convert TVM object of type runtime.Closure to a string);
  3. 73740385a96fb6, 01/31/23, bad(Could not convert TVM object of type runtime.Closure to a string);

I skipped to commit c0d2734056d4d4bfc67a125b4e61194a809f22d5 (dated 09/15/22) and tried it; the result was bad (Could not convert TVM object of type runtime.Closure to a string).

I skipped to commit 63bb3b9855a392268819ea76413ee6bbc66d6058 (dated 03/31/22) and tried it; the result was bad (Could not convert TVM object of type runtime.Closure to a string).

I skipped to commit b3ab19ed63bca0481557dab095c08e24e49dda78 (dated 03/31/21) and tried it; the result was bad, with a different message: Check failed: (IsObjectRef()) is false: Could not convert TVM object of type arith.ConstIntBound to a string. More detail:

  5: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at /workspaces/tvm/include/tvm/runtime/packed_func.h:1381
  4: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at /workspaces/tvm/include/tvm/runtime/packed_func.h:1396
  3: tvm::runtime::TVMMovableArgValueWithContext_::operator tvm::runtime::String<tvm::runtime::String>() const
        at /workspaces/tvm/include/tvm/runtime/packed_func.h:711
  2: tvm::runtime::TVMMovableArgValue_::operator tvm::runtime::String<tvm::runtime::String, void>() const
        at /workspaces/tvm/include/tvm/runtime/packed_func.h:1651
  1: tvm::runtime::PackedFuncValueConverter<tvm::runtime::String>::From(tvm::runtime::TVMArgValue const&)
        at /workspaces/tvm/include/tvm/runtime/packed_func.h:1670
  0: tvm::runtime::TVMArgValue::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >() const
        at /workspaces/tvm/include/tvm/runtime/packed_func.h:616
  File "/workspaces/tvm/include/tvm/runtime/packed_func.h", line 713
TVMError: In function tir.Var: error while converting argument 0: [07:59:48] /workspaces/tvm/include/tvm/runtime/packed_func.h:616: 
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
  Check failed: (IsObjectRef<tvm::runtime::String>()) is false: Could not convert TVM object of type arith.ConstIntBound to a string

I skipped to commit b8474c80e3070af346cadbf2eafd4ab50936b2ef (dated 09/22/2020) and tried it; the result was bad (ModuleNotFoundError: No module named 'typed_ast'). More detail:

Traceback (most recent call last):
  File "/workspaces/tvm/test.py", line 1, in <module>
    from tvm import ir, tir
  File "/workspaces/tvm/python/tvm/__init__.py", line 61, in <module>
    from . import hybrid
  File "/workspaces/tvm/python/tvm/hybrid/__init__.py", line 19, in <module>
    from .utils import create_module, ashybrid, script
  File "/workspaces/tvm/python/tvm/hybrid/utils.py", line 23, in <module>
    from .parser import from_source
  File "/workspaces/tvm/python/tvm/hybrid/parser.py", line 23, in <module>
    from typed_ast import ast3 as ast
ModuleNotFoundError: No module named 'typed_ast'

I skipped to commit 03ff0cd06051262bebedab7592729f2cf3ed87e8 (dated 03/31/2020) and tried it; the result was bad (TVMError: Check failed: typecode == kTVMStr (8 vs. 11) : expected str but get Object). More detail:

Traceback (most recent call last):
  File "/workspaces/tvm/test.py", line 5, in <module>
    b = tir.Var(a, "int32")
  File "/workspaces/tvm/python/tvm/tir/expr.py", line 304, in __init__
    self.__init_handle_by_constructor__(
  File "/workspaces/tvm/python/tvm/_ffi/_ctypes/object.py", line 92, in __init_handle_by_constructor__
    handle = __init_by_constructor__(fconstructor, args)
  File "/workspaces/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 228, in __init_handle_by_constructor__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /workspaces/tvm/build/libtvm.so(std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x5e) [0x7fe434c49c3e]
  [bt] (7) /workspaces/tvm/build/libtvm.so(+0x16325a6) [0x7fe434e5b5a6]
  [bt] (6) /workspaces/tvm/build/libtvm.so(+0x1631cce) [0x7fe434e5acce]
  [bt] (5) /workspaces/tvm/build/libtvm.so(+0x1632161) [0x7fe434e5b161]
  [bt] (4) /workspaces/tvm/build/libtvm.so(+0x16324cb) [0x7fe434e5b4cb]
  [bt] (3) /workspaces/tvm/build/libtvm.so(+0x1632cd1) [0x7fe434e5bcd1]
  [bt] (2) /workspaces/tvm/build/libtvm.so(+0x1633208) [0x7fe434e5c208]
  [bt] (1) /workspaces/tvm/build/libtvm.so(tvm::runtime::TVMArgValue::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >() const+0x1e6) [0x7fe434ae120e]
  [bt] (0) /workspaces/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4e) [0x7fe434ae09b4]
  File "/workspaces/tvm/include/tvm/runtime/packed_func.h", line 553
TVMError: Check failed: type_code_ == kTVMStr (8 vs. 11) : expected str but get Object

I skipped to commit a8c369218e87979020f732b9b2ad373fce4895f2 (dated 12/31/2019) and tried it; the result was bad: the line a = tir.StringImm("global") fails with AttributeError: module 'tvm.tir' has no attribute 'StringImm'.

Lunderberg commented 11 months ago

It looks like it's a distinction between a tvm.tir.StringImm and a tvm.runtime.String. The tvm.tir.StringImm is only used in a few rare cases in the IR where a string must be used as a tir.PrimExpr, such as string literals that are used as external function arguments. The tvm.runtime.String is used in the majority of cases, and is the String class accepted by most TVM APIs.

When passing a tvm.runtime.String through the FFI, there is special handling to ensure that a Python str or a Python tvm.runtime.String can be converted to either a C++ tvm::runtime::String or a C++ std::string. (See here (python), here (C++) and here (C++) for where this is implemented.) There is no such handling for tvm.tir.StringImm, so it is handled through the default ObjectRef FFI interface and can only be passed to APIs that explicitly expect a tvm.tir.StringImm.

Since the tir.Var constructor accepts a tvm::runtime::String as its first argument, the Python API can call it with a str or a tvm.runtime.String, but not with a tvm.tir.StringImm. If I change your example to the following, then it can produce the tir.Var instance.

import tvm
from tvm import tir

a = tvm.runtime.String("global")
b = tir.Var(a, "int32")
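
The distinction described above can be illustrated without TVM installed. The following is a toy model under stated assumptions: `String`, `StringImm`, `to_string_arg` and `make_var` are hypothetical stand-ins written for this sketch, not the TVM classes or the real FFI conversion code. It only mimics the rule that a string parameter accepts a Python str or a string object, while an expression node wrapping a string is rejected unless its payload is unwrapped first:

```python
class String(str):
    """Stand-in for a runtime string object (hypothetical, behaves like str)."""


class StringImm:
    """Stand-in for an IR expression node that wraps a string value."""

    def __init__(self, value):
        self.value = value


def to_string_arg(arg):
    """Toy conversion of an argument to a string parameter."""
    if isinstance(arg, str):  # also true for the String stand-in
        return str(arg)
    raise TypeError(
        f"Could not convert object of type {type(arg).__name__} to a string"
    )


def make_var(name, dtype):
    """Toy constructor whose first parameter must convert to a string."""
    return {"name": to_string_arg(name), "dtype": dtype}


from_str = make_var("global", "int32")
from_string = make_var(String("global"), "int32")
from_unwrapped = make_var(StringImm("global").value, "int32")
```

Calling `make_var(StringImm("global"), "int32")` in this model raises a TypeError, which corresponds to the "Could not convert ... to a string" failure reported earlier in the thread.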
ysh329 commented 11 months ago

Hi all, using git bisect I located the commit below, at which the error message changes from "Could not convert TVM object of type runtime.Closure to a string" to "Unknown type index 8".

commit 6554e2e082cf9d3ecf867fcd81b1b6d483df46a8 (HEAD)
Author: Junru Shao <junrushao@apache.org>
Date:   Sun Aug 27 20:43:33 2023 -0700
Result: First (index 8)

    [RPC] Enhance RPC Protocol to support TVM Object (#15631)

    This PR introduces object support in TVM RPC protocol by introducing three
    new interfaces in `rpc_reference.h`:
    - `uint64_t GetObjectBytes(Object* obj)`, which is a required
      implementation that returns the length of the object during serialization;
    - `void WriteObject(Object* obj)` used to serialize an object to a
      writable channel;
    - `void ReadObject(int* type_code, TVMValue* value)`, which deserializes
      }
    }

    uint64_t GetObjectBytes(Object* obj) {
      uint64_t result = 0;
      if (obj is ShapeTuple) {
        result += sizeof(uint32_t); # for `type_index`
        result += sizeof(int32_t);  # for `ndim`
        result += sizeof(int64_t) * obj->ndim; # for content of the shape
      } else {
        throw Unsupported;
      }
      return result;
    }
    To deserialize an object, similar to serialization, the recommended
    paradigm is to read `type_index` and dispatch based on it.

    Caveat on deserialization: RPC Reference itself does not own or allocate
    any memory to store objects, meaning extra logic is usually required in
    `ReadObject` to keep their liveness.


However, this is strange, because the PR behind this commit (https://github.com/apache/tvm/pull/15631) seems to change only RPC code.
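
Since every tested commit fails, the bisect here looks for a change in the error message, not a good/bad boundary. A minimal sketch of that search, assuming the message changes exactly once along the history (the function name and the placeholder commit ids `c0` to `c7` are hypothetical; `observe` stands for building a commit and running the reproduction script):

```python
def first_commit_with_new_error(commits, observe, old_error):
    """Binary search over commits ordered oldest to newest.

    Assumes the oldest commit shows old_error, the newest does not, and the
    message changes exactly once in between.
    """
    lo, hi = 0, len(commits) - 1
    if observe(commits[lo]) != old_error or observe(commits[hi]) == old_error:
        raise ValueError("endpoints do not bracket a change in the error message")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if observe(commits[mid]) == old_error:
            lo = mid  # still the old message: the change is later
        else:
            hi = mid  # already the new message: the change is here or earlier
    return commits[hi]


# Hypothetical history with placeholder ids; the message changes at "c5".
OLD = "Could not convert TVM object of type runtime.Closure to a string"
NEW = "Unknown type index 8"
history = [f"c{i}" for i in range(8)]
observed = {c: (OLD if i < 5 else NEW) for i, c in enumerate(history)}
result = first_commit_with_new_error(history, observed.get, OLD)
```

With `git bisect`, the same idea is expressed by marking commits that still show the old message as "good" and commits that show the new one as "bad".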
junrushao commented 11 months ago

Yeah it doesn't do anything concrete to the object system...