Open Johnson9009 opened 1 year ago
@Hzfengsy @junrushao This bug is so strange, are you interested in it? Thanks.
This is a very interesting bug...
Printing out the GetOrAllocRuntimeTypeIndex calls, the outputs are:
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.ADT, static_tindex = 10, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = Map, static_tindex = 5, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.Closure, static_tindex = 9, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.ShapeTuple, static_tindex = 6, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.ShardLoader, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.DRef, static_tindex = 8, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.Session, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.disco.ThreadedSession, static_tindex = 11, parent_tindex = 774, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = metadata.MetadataArrayNode, static_tindex = 11, parent_tindex = 512, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.Module, static_tindex = 1, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.NDArray, static_tindex = 2, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.PackedFunc, static_tindex = 7, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = TimerNode, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = DefaultTimerNode, static_tindex = 11, parent_tindex = 777, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = CPUTimerNode, static_tindex = 11, parent_tindex = 777, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.profiling.DeviceWrapper, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = runtime.profiling.MetricCollector, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = relax.vm.AttentionKVCache, static_tindex = 11, parent_tindex = 0, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = relax.vm.Closure, static_tindex = 11, parent_tindex = 9, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = vm.Closure, static_tindex = 11, parent_tindex = 9, num_child_slots = 0, child_slots_can_overflow = 1
[05:51:33] /opt/scratch/junrushao/tvm-dev/src/runtime/object.cc:213: key = CUDATimerNode, static_tindex = 11, parent_tindex = 777, num_child_slots = 0, child_slots_can_overflow = 1
Does it help to revert this commit? https://github.com/apache/tvm/commit/6554e2e082cf9d3ecf867fcd81b1b6d483df46a8
@junrushao Sorry for the late reply. I haven't tried it, but I don't think it will help: our internal TVM repo hasn't synced this commit yet, and the problem still occurs there, with a different, very strange error message. As I remember, it says it wants an argument of "tir.StringImm" but got a "runtime.Closure". @ysh329 Can you help track this and work with junrushao to investigate the issue? Thanks.
Of course. Let me reproduce it with the latest code first.
python3 test.py
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [16:02:51] /home/stayua01/code/tvm/src/runtime/object.cc:150: InternalError: Check failed: (tindex < type_table_.size() && type_table_[tindex].allocated_slots != 0) is false: Unknown type index 8
Stack trace:
Aborted
My reproduction gives the same result.
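The failed check in the message above reads like a guarded table lookup: the index must be in range and its entry must have slots allocated. A toy model of that condition, as a sketch only; the `TypeTable` class and its methods are hypothetical stand-ins, not TVM's implementation, and the two registered indices are taken from the log printed earlier in this thread:

```python
class TypeTable:
    """Toy stand-in for a runtime type table; not TVM's real implementation."""

    def __init__(self):
        # Maps type index -> (type key, allocated_slots).
        self._entries = {}

    def register(self, tindex, key, allocated_slots=1):
        self._entries[tindex] = (key, allocated_slots)

    def type_index_to_key(self, tindex):
        entry = self._entries.get(tindex)
        # Mirrors the check in the error message: the index must be known
        # and must have a non-zero number of allocated slots.
        if entry is None or entry[1] == 0:
            raise LookupError(f"Unknown type index {tindex}")
        return entry[0]


table = TypeTable()
table.register(9, "runtime.Closure")     # static_tindex = 9 in the log
table.register(6, "runtime.ShapeTuple")  # static_tindex = 6 in the log

print(table.type_index_to_key(9))
try:
    table.type_index_to_key(8)  # never registered in this toy table
except LookupError as err:
    print(err)
```

In this model, any lookup of an index that was never registered fails in the same way as the message above, regardless of what the caller intended to look up.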
I also tried the v0.13.0 release code, with the result below:
Traceback (most recent call last):
File "test.py", line 4, in <module>
b = tir.Var(a, "int32")
File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/python/tvm/tir/expr.py", line 362, in __init__
self.__init_handle_by_constructor__(_ffi_api.Var, name, dtype, span) # type: ignore
File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/python/tvm/_ffi/_ctypes/object.py", line 145, in __init_handle_by_constructor__
handle = __init_by_constructor__(fconstructor, args)
File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/python/tvm/_ffi/_ctypes/packed_func.py", line 261, in __init_handle_by_constructor__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
3: TVMFuncCall
2: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
1: tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
0: tvm::runtime::TVMMovableArgValueWithContext_::operator tvm::runtime::String<tvm::runtime::String>() const
4: TVMFuncCall
3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
2: tvm::runtime::TypedPackedFunc<tvm::tir::Var (tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)>::AssignTypedLambda<tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}>(tvm::tir::{lambda(tvm::runtime::String, tvm::runtime::TVMArgValue, tvm::Span)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
1: tvm::runtime::TVMMovableArgValueWithContext_::operator tvm::runtime::String<tvm::runtime::String>() const
0: tvm::runtime::TVMArgValue::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >() const
File "/home/xxxxxx/download/apache-tvm-src-v0.13.0/include/tvm/runtime/packed_func.h", line 777
TVMError: In function tir.Var(0: runtime.String, 1: TVMArgValue, 2: Span) -> tir.Var: error while converting argument 0: [16:01:59] /home/xxxxxx/download/apache-tvm-src-v0.13.0/include/tvm/runtime/packed_func.h:681: InternalError: Check failed: (IsObjectRef<tvm::runtime::String>()) is false: Could not convert TVM object of type runtime.Closure to a string.
Last log: Could not convert TVM object of type runtime.Closure to a string.
v0.13.0 is earlier than commit https://github.com/apache/tvm/commit/6554e2e082cf9d3ecf867fcd81b1b6d483df46a8.
I'm able to confirm on my end that this bug exists in both the main and unity branches. @ysh329 if it doesn't bother you too much, would you mind doing a git bisect to find out which commit causes this issue? Thanks a bunch!!
Okay, let me learn how to use git bisect and find a good commit first. :)
Known commits, with their dates and status:
I skipped to commit c0d2734056d4d4bfc67a125b4e61194a809f22d5 (dated 09/15/22) and tried it, but it failed with a bad result (Could not convert TVM object of type runtime.Closure to a string).
I skipped to commit 63bb3b9855a392268819ea76413ee6bbc66d6058 (dated 03/31/22) and tried it, but it failed with a bad result (Could not convert TVM object of type runtime.Closure to a string).
I skipped to commit b3ab19ed63bca0481557dab095c08e24e49dda78 (dated 03/31/21) and tried it, but it failed with a bad result (Could not convert TVM object of type arith.ConstIntBound to a string). More detail:
5: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
at /workspaces/tvm/include/tvm/runtime/packed_func.h:1381
4: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
at /workspaces/tvm/include/tvm/runtime/packed_func.h:1396
3: tvm::runtime::TVMMovableArgValueWithContext_::operator tvm::runtime::String<tvm::runtime::String>() const
at /workspaces/tvm/include/tvm/runtime/packed_func.h:711
2: tvm::runtime::TVMMovableArgValue_::operator tvm::runtime::String<tvm::runtime::String, void>() const
at /workspaces/tvm/include/tvm/runtime/packed_func.h:1651
1: tvm::runtime::PackedFuncValueConverter<tvm::runtime::String>::From(tvm::runtime::TVMArgValue const&)
at /workspaces/tvm/include/tvm/runtime/packed_func.h:1670
0: tvm::runtime::TVMArgValue::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >() const
at /workspaces/tvm/include/tvm/runtime/packed_func.h:616
File "/workspaces/tvm/include/tvm/runtime/packed_func.h", line 713
TVMError: In function tir.Var: error while converting argument 0: [07:59:48] /workspaces/tvm/include/tvm/runtime/packed_func.h:616:
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
Check failed: (IsObjectRef<tvm::runtime::String>()) is false: Could not convert TVM object of type arith.ConstIntBound to a string
I skipped to commit b8474c80e3070af346cadbf2eafd4ab50936b2ef (dated 09/22/2020) and tried it, but it failed with a bad result (ModuleNotFoundError: No module named 'typed_ast'). More detail:
Traceback (most recent call last):
File "/workspaces/tvm/test.py", line 1, in <module>
from tvm import ir, tir
File "/workspaces/tvm/python/tvm/__init__.py", line 61, in <module>
from . import hybrid
File "/workspaces/tvm/python/tvm/hybrid/__init__.py", line 19, in <module>
from .utils import create_module, ashybrid, script
File "/workspaces/tvm/python/tvm/hybrid/utils.py", line 23, in <module>
from .parser import from_source
File "/workspaces/tvm/python/tvm/hybrid/parser.py", line 23, in <module>
from typed_ast import ast3 as ast
ModuleNotFoundError: No module named 'typed_ast'
I skipped to commit 03ff0cd06051262bebedab7592729f2cf3ed87e8 (dated 03/31/2020) and tried it, but it failed with a bad result (TVMError: Check failed: type_code_ == kTVMStr (8 vs. 11) : expected str but get Object). More detail:
Traceback (most recent call last):
File "/workspaces/tvm/test.py", line 5, in <module>
b = tir.Var(a, "int32")
File "/workspaces/tvm/python/tvm/tir/expr.py", line 304, in __init__
self.__init_handle_by_constructor__(
File "/workspaces/tvm/python/tvm/_ffi/_ctypes/object.py", line 92, in __init_handle_by_constructor__
handle = __init_by_constructor__(fconstructor, args)
File "/workspaces/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 228, in __init_handle_by_constructor__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (8) /workspaces/tvm/build/libtvm.so(std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x5e) [0x7fe434c49c3e]
[bt] (7) /workspaces/tvm/build/libtvm.so(+0x16325a6) [0x7fe434e5b5a6]
[bt] (6) /workspaces/tvm/build/libtvm.so(+0x1631cce) [0x7fe434e5acce]
[bt] (5) /workspaces/tvm/build/libtvm.so(+0x1632161) [0x7fe434e5b161]
[bt] (4) /workspaces/tvm/build/libtvm.so(+0x16324cb) [0x7fe434e5b4cb]
[bt] (3) /workspaces/tvm/build/libtvm.so(+0x1632cd1) [0x7fe434e5bcd1]
[bt] (2) /workspaces/tvm/build/libtvm.so(+0x1633208) [0x7fe434e5c208]
[bt] (1) /workspaces/tvm/build/libtvm.so(tvm::runtime::TVMArgValue::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >() const+0x1e6) [0x7fe434ae120e]
[bt] (0) /workspaces/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4e) [0x7fe434ae09b4]
File "/workspaces/tvm/include/tvm/runtime/packed_func.h", line 553
TVMError: Check failed: type_code_ == kTVMStr (8 vs. 11) : expected str but get Object
I skipped to commit a8c369218e87979020f732b9b2ad373fce4895f2 (dated 12/31/2019) and tried it, but it failed with a bad result (a = tir.StringImm("global"), AttributeError: module 'tvm.tir' has no attribute 'StringImm').
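The manual probing above can be handed to git bisect run, which reads a script's exit code as a verdict: 0 means good, 125 means skip, and other codes from 1 to 127 mean bad. The following is a minimal sketch, assuming the goal is to find the commit at which the error message changes. The file name bisect_check.py is hypothetical, test.py is the reproducer named in the tracebacks, the two message fragments are copied from the errors quoted in this thread, and rebuilding TVM at each step is left out:

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`: 0 = good, 1 = bad, 125 = skip.
GOOD, BAD, SKIP = 0, 1, 125

OLD_MESSAGE = "Could not convert TVM object"  # behaviour before the change
NEW_MESSAGE = "Unknown type index"            # behaviour after the change


def classify(output):
    """Map the reproducer's combined stdout/stderr to a bisect verdict."""
    if NEW_MESSAGE in output:
        return BAD
    if OLD_MESSAGE in output:
        return GOOD
    # Anything else (e.g. ModuleNotFoundError on old commits) cannot be judged.
    return SKIP


def run_reproducer(script):
    """Run the reproducer in a fresh interpreter and classify what it printed."""
    proc = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return classify(proc.stdout + proc.stderr)


# Only act as a bisect driver when a reproducer path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_reproducer(sys.argv[1]))
```

After git bisect start, with one commit marked via git bisect good and another via git bisect bad, this could be invoked as git bisect run python3 bisect_check.py test.py. Returning 125 for unrelated failures keeps commits such as the typed_ast one from being misread as either verdict.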
It looks like it's a distinction between a tvm.tir.StringImm and a tvm.runtime.String. The tvm.tir.StringImm is only used in a few rare cases in the IR where a string must be used as a tir.PrimExpr, such as string literals that are used as external function arguments. The tvm.runtime.String is used in the majority of cases, and is the String class accepted by most TVM APIs.
When passing a tvm.runtime.String through the FFI, there is special handling to ensure that a Python str or a Python tvm.runtime.String can be converted to either a C++ tvm::runtime::String or a C++ std::string. (See here (Python), here (C++) and here (C++) for where this is implemented.) There is no such handling for tvm.tir.StringImm, so it is handled through the default ObjectRef FFI interface, and can only be passed to APIs that explicitly expect a tvm.tir.StringImm.
Since the tir.Var constructor accepts a tvm::runtime::String as its first argument, the Python API can call it with a str or a tvm.runtime.String, but not with a tvm.tir.StringImm. If I change your example to the following, then it can produce the tir.Var instance.
import tvm
from tvm import tir

# Pass a runtime String (a plain Python str also works), not a tir.StringImm.
a = tvm.runtime.String("global")
b = tir.Var(a, "int32")
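If a name can arrive either as a plain string or wrapped in a StringImm, one option is to normalise it before calling the constructor. This is a minimal sketch, assuming the wrapper exposes its text through a `value` attribute. The helper `as_plain_str` and the stand-in class `FakeStringImm` are hypothetical, and the stand-in is used so the snippet runs without TVM installed:

```python
def as_plain_str(name):
    """Return a plain Python str for a str or a StringImm-like object."""
    if isinstance(name, str):
        # Covers str and any str subclass.
        return str(name)
    # Assumption: a StringImm-like wrapper keeps its text in `.value`.
    value = getattr(name, "value", None)
    if isinstance(value, str):
        return str(value)
    raise TypeError(f"expected a string-like name, got {type(name).__name__}")


class FakeStringImm:
    """Hypothetical stand-in for a StringImm-like wrapper, for illustration only."""

    def __init__(self, value):
        self.value = value


print(as_plain_str("global"))
print(as_plain_str(FakeStringImm("global")))
# With TVM available, the call would then be: tir.Var(as_plain_str(a), "int32")
```

Normalising on the Python side keeps the call on the str conversion path described above instead of the generic ObjectRef path.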
Hi all, with git bisect I located the commit below, at which the error message changes from "Could not convert TVM object of type runtime.Closure to a string" to "Unknown type index 8".
commit 6554e2e082cf9d3ecf867fcd81b1b6d483df46a8 (HEAD)
Author: Junru Shao <junrushao@apache.org>
Date: Sun Aug 27 20:43:33 2023 -0700
Result: First (index 8)
[RPC] Enhance RPC Protocol to support TVM Object (#15631)
This PR introduces object support in TVM RPC protocol by introducing three
new interfaces in `rpc_reference.h`:
- `uint64_t GetObjectBytes(Object* obj)`, which is a required
implementation that returns the length of the object during serialization;
- `void WriteObject(Object* obj)` used to serialize an object to a
writable channel;
- `void ReadObject(int* type_code, TVMValue* value)`, which deserializes
}
}
uint64_t GetObjectBytes(Object* obj) {
uint64_t result = 0;
if (obj is ShapeTuple) {
result += sizeof(uint32_t); # for `type_index`
result += sizeof(int32_t); # for `ndim`
result += sizeof(int64_t) * obj->ndim; # for content of the shape
} else {
throw Unsupported;
}
return result;
}
To deserialize an object, similar to serialization, the recommended
approach is to read `type_index` and dispatch based on it.
Caveat on deserialization: RPC Reference itself does not own or allocate
any memory to store objects, meaning extra logic is usually required in
`ReadObject` to keep their liveness.
However, this is very strange, because the PR for this commit (https://github.com/apache/tvm/pull/15631) seems to change only RPC-related code.
Yeah it doesn't do anything concrete to the object system...
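As a side note on the quoted commit message, the size computed by its GetObjectBytes pseudocode for a ShapeTuple is 4 bytes for `type_index`, 4 bytes for `ndim`, and 8 bytes per dimension. A small sketch of that arithmetic follows. The helper names are hypothetical, the little-endian packed layout is an assumption made for illustration and not a statement about the actual RPC wire format, and the type index 6 is taken from the log printed earlier in this thread:

```python
import struct


def shape_tuple_bytes(shape):
    """Size in bytes following the quoted pseudocode: uint32 + int32 + int64 * ndim."""
    ndim = len(shape)
    return struct.calcsize("<I") + struct.calcsize("<i") + struct.calcsize("<q") * ndim


def pack_shape_tuple(type_index, shape):
    """Pack type_index, ndim and the shape values (layout is an assumption)."""
    return struct.pack(f"<Ii{len(shape)}q", type_index, len(shape), *shape)


shape = (2, 3, 4)
payload = pack_shape_tuple(6, shape)  # 6 = static_tindex of runtime.ShapeTuple in the log
print(shape_tuple_bytes(shape), len(payload))
```

Both numbers agree because the "<" prefix in the format string disables alignment padding, so the packed length is exactly the sum of the field sizes.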
I found a very strange bug involving StringImm; the simple code below reproduces the error.
Steps to reproduce