cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0
324 stars, 92 forks

TVM Argument Binding Failed for 512-bit UInt Datatype #302

Open zzzDavid opened 4 years ago

zzzDavid commented 4 years ago

Problem Description

FlexCNN uses a 512-bit global input/output bus, but implementing it in HeteroCL causes a TVM error in src/pass/arg_binder.cc.

hcl.UInt(512) appears to produce a uint0 buffer, which triggers the TVM argument-binding failure.

Error Message:

$ python samples/flexcnn/flexcnn.py
Using TensorFlow backend.
[16:41:10] Mark stage update_global_cin on FPGA scope...
[16:41:10] Mark stage cin_load on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update0 on FPGA scope...
[16:41:10] Mark stage cin_load_prev on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update4 on FPGA scope...
[16:41:10] Mark stage weight_load on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update5 on FPGA scope...
[16:41:10] Mark stage weight_load on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage update21 on FPGA scope...
[16:41:10] Mark stage cout_write on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] Mark stage layer_config on FPGA scope...
[16:41:10] Mark stage top_kernel on FPGA scope...
[16:41:10] src/schedule/schedule_reorder.cc:558: top_kernel should be set as an endpoint... rolling back
Traceback (most recent call last):
  File "samples/flexcnn/flexcnn.py", line 401, in <module>
    test_flexcnn()
  File "samples/flexcnn/flexcnn.py", line 394, in test_flexcnn
    code = str(hcl.build(s, p, name="main"))
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/api.py", line 335, in build
    return _build(schedule.sch, new_inputs, target=target, name=name, stmt=stmt, schedule_name=schedule.name)
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/build_module.py", line 572, in build
    return build_fpga_kernel(sch, args, target, name=name, schedule_name=schedule_name)
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/build_module.py", line 428, in build_fpga_kernel
    flist = lower(sch, args, kernel_only=True, name=name)
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/build_module.py", line 350, in lower
    stmt = ir_pass.StorageFlatten(stmt, binds, 64)
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/Users/zhangniansong/.local/lib/python3.7/site-packages/heterocl-0.3-py3.7.egg/heterocl/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [16:41:10] src/pass/arg_binder.cc:84: Check failed: arg->dtype == value->dtype (uint0 vs. uint32) Argument global_cin Buffer bind data type mismatch

Stack trace returned 10 entries:
[bt] (0) 0   libhcl.dylib                        0x00000001136facae dmlc::StackTrace() + 254
[bt] (1) 1   libhcl.dylib                        0x00000001136faa5f dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2   libhcl.dylib                        0x00000001138860af TVM::ir::ArgBinder::BindBuffer(TVM::Buffer const&, TVM::Buffer const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) + 575
[bt] (3) 3   libhcl.dylib                        0x00000001139aa6e7 TVM::ir::StorageFlattener::HandleBufferBindScope(Halide::Internal::AttrStmt const*) + 4423
[bt] (4) 4   libhcl.dylib                        0x000000011399dad5 TVM::ir::StorageFlattener::Mutate_(Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&) + 1941
[bt] (5) 5   libhcl.dylib                        0x00000001138f8e05 std::__1::__function::__func<TVM::ir::$_1, std::__1::allocator<TVM::ir::$_1>, Halide::Internal::Stmt (Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::operator()(Halide::Internal::AttrStmt const*&&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*&&) + 21
[bt] (6) 6   libhcl.dylib                        0x00000001138f83d1 std::__1::__function::__func<TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>& TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::set_dispatch<Halide::Internal::AttrStmt>(std::__1::function<Halide::Internal::Stmt (Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>)::'lambda'(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*), std::__1::allocator<TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>& TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::set_dispatch<Halide::Internal::AttrStmt>(std::__1::function<Halide::Internal::Stmt (Halide::Internal::AttrStmt const*, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>)::'lambda'(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>, Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::operator()(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*&&) + 49
[bt] (7) 7   libhcl.dylib                        0x000000011374daec TVM::IRFunctor<Halide::Internal::Stmt (TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*)>::operator()(TVM::NodeRef const&, Halide::Internal::Stmt const&, TVM::ir::IRMutator*) const + 348
[bt] (8) 8   libhcl.dylib                        0x00000001138675fb TVM::ir::IRMutator::Mutate(Halide::Internal::Stmt) + 59
[bt] (9) 9   libhcl.dylib                        0x00000001139aa812 TVM::ir::StorageFlattener::HandleBufferBindScope(Halide::Internal::AttrStmt const*) + 4722

Reproducing the Error

HeteroCL version: Hecmay/heterocl:fix

Code: zzzDavid:heterocl/samples/flexcnn/flexcnn.py (also requires samples/flexcnn/kernel/*.cpp)

$ python samples/flexcnn/flexcnn.py

Possible Cause

Argument binding supports only widths up to 128 bits.

seanlatias commented 4 years ago

We only support up to 255 bits in HeteroCL right now.

zhangzhiru commented 4 years ago

@seanlatias What prevents us from allowing wider integers? At the very least, we need to emit an error message instead of letting the tool crash.

seanlatias commented 4 years ago

I think I mentioned this before: we use 8 bits to store the total bitwidth, so we can only support up to 255 bits.
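A minimal Python sketch of why hcl.UInt(512) shows up as uint0 in the error, assuming (as stated above) that the total bitwidth is kept in an unsigned 8-bit field that silently wraps: 512 mod 256 = 0. The names `stored_bitwidth` and `type_name` are hypothetical and do not come from HeteroCL or TVM.

```python
# Hypothetical model of the limitation described above: the total bitwidth
# is stored in an unsigned 8-bit field, so any width >= 256 wraps modulo 256.
BITWIDTH_FIELD_BITS = 8  # assumption taken from the comment above


def stored_bitwidth(bits: int) -> int:
    """Return the value an unsigned 8-bit bitwidth field would hold."""
    return bits & ((1 << BITWIDTH_FIELD_BITS) - 1)


def type_name(bits: int) -> str:
    """Name of the unsigned type after the width has been truncated."""
    return f"uint{stored_bitwidth(bits)}"


for requested in (32, 128, 255, 256, 512):
    print(requested, "->", type_name(requested))
# prints:
# 32 -> uint32
# 128 -> uint128
# 255 -> uint255
# 256 -> uint0
# 512 -> uint0
```

Under this assumption, any width of 256 or more wraps around, which is consistent with the "uint0 vs. uint32" mismatch reported by arg_binder.cc.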

seanlatias commented 4 years ago

If you think this one has higher priority, I'll fix it first.

zhangzhiru commented 4 years ago

Yes, we should fix this issue, since we already have a relatively simple solution in mind. I thought this one was different, since Niansong is talking about the global input/output.

seanlatias commented 4 years ago

No. According to his code, he is only generating code without running the CPU simulation. In that case we are not limited by numpy, so we should be able to generate code with larger bitwidths for the global input/output.
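To illustrate the sizes involved, here is a hypothetical Python sketch of one 512-bit bus word carrying 16 lanes of 32-bit data. The lane width is an assumption; the thread does not say how FlexCNN packs its bus. Plain Python integers are arbitrary-precision, whereas numpy's fixed-width integer dtypes are at most 64 bits wide, so a single 512-bit value cannot be held directly in a numpy integer array.

```python
from typing import List

# Hypothetical layout: a 512-bit bus word carrying 16 lanes of 32-bit data.
BUS_BITS = 512
LANE_BITS = 32                    # assumed lane width
LANES = BUS_BITS // LANE_BITS     # 16 lanes per bus word
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes: List[int]) -> int:
    """Pack LANES unsigned values into one bus word, lane 0 in the low bits."""
    if len(lanes) != LANES:
        raise ValueError(f"expected {LANES} lanes, got {len(lanes)}")
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word: int) -> List[int]:
    """Split a bus word back into its LANES unsigned lane values."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


full = pack([LANE_MASK] * LANES)
print(full.bit_length())  # 512
print(unpack(pack(list(range(LANES)))) == list(range(LANES)))  # True
```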

zhangzhiru commented 4 years ago

Let's be sure to emit an error message. We also need to create a test case that checks the message.
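A minimal sketch of the requested behavior, assuming the 255-bit limit stated earlier in this thread: validate the width up front and raise a descriptive error instead of letting it wrap to uint0, plus a unittest case that checks the message. `check_bitwidth`, `BitwidthError`, and `MAX_BITWIDTH` are hypothetical names; the actual fix in #303 may look different.

```python
import unittest

MAX_BITWIDTH = 255  # limit stated in this thread; hypothetical constant name


class BitwidthError(ValueError):
    """Hypothetical error type for unsupported integer widths."""


def check_bitwidth(bits: int) -> int:
    """Reject widths an 8-bit field cannot hold, instead of wrapping to 0."""
    if not 1 <= bits <= MAX_BITWIDTH:
        raise BitwidthError(
            f"Unsupported bitwidth {bits}: only 1 to {MAX_BITWIDTH} bits "
            f"are currently supported")
    return bits


class TestBitwidthCheck(unittest.TestCase):
    def test_accepts_supported_width(self):
        self.assertEqual(check_bitwidth(255), 255)

    def test_rejects_512_with_clear_message(self):
        with self.assertRaises(BitwidthError) as ctx:
            check_bitwidth(512)
        self.assertIn("Unsupported bitwidth 512", str(ctx.exception))


if __name__ == "__main__":
    # exit=False so the runner reports results without calling sys.exit().
    unittest.main(argv=["check_bitwidth"], exit=False)
```

Checking the width when the type is constructed would surface the problem in Python, next to the user's hcl.UInt(...) call, rather than deep inside StorageFlatten.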

seanlatias commented 4 years ago

@zzzDavid This should be fixed by #303. Let me know if it isn't.