Tencent / TPAT

TensorRT Plugin Autogen Tool
Apache License 2.0

out of memory #33

Closed frankxyy closed 1 year ago

frankxyy commented 1 year ago
Traceback (most recent call last):
  File "test_onehot_dynamic_direct.py", line 344, in <module>
    main()
  File "test_onehot_dynamic_direct.py", line 236, in main
    trt_plugin_names = onnx2plugin(
  File "/root/tpat/examples/../python/onnx_to_plugin.py", line 190, in onnx2plugin
    onnx_name_mapping_trt_plugin = generate_plugin_library(
  File "/root/tpat/examples/../python/onnx_to_plugin.py", line 85, in generate_plugin_library
    cuda_kernel.run()
  File "/root/tpat/python/cuda_kernel.py", line 83, in run
    self._module = graph_executor.create(
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/contrib/graph_executor.py", line 66, in create
    return GraphModule(fcreate(graph_json_str, libmod, *device_type_id))
  File "/workspace/TPAT/3rdparty/blazerml-tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  8: TVMFuncCall
  7: _ZNSt17_Function_handlerIFvN3
  6: tvm::runtime::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const [clone .isra.0]
  5: tvm::runtime::GraphExecutorCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  4: tvm::runtime::GraphExecutor::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  3: tvm::runtime::GraphExecutor::SetupStorage()
  2: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
  1: tvm::runtime::DeviceAPI::AllocDataSpace(DLDevice, int, long const*, DLDataType, tvm::runtime::Optional<tvm::runtime::String>)
  0: tvm::runtime::CUDADeviceAPI::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
  File "/workspace/TPAT/3rdparty/blazerml-tvm/src/runtime/cuda/cuda_device_api.cc", line 123
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory

I ran TPAT for a OneHot plugin with node input [xxx, 561, 561] and depth 64, and the error above occurred. The shape of the node input does not seem like it should use so much memory.
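For context, the failure happens inside graph_executor.create: as the GraphExecutor::SetupStorage frame in the traceback shows, TVM allocates device memory for every tensor in the graph up front, before any kernel runs. Below is a minimal sketch of the equivalent TVM flow; the model file, input name, and shape are hypothetical stand-ins, not TPAT's actual internals:

```python
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Hypothetical repro of the failing call chain: import the ONNX node,
# build it for CUDA, then create a graph executor. Creating the
# executor eagerly allocates storage for all graph tensors
# (GraphExecutor::SetupStorage), which is where the OOM fires.
onnx_model = onnx.load("onehot.onnx")       # assumed model file
shape_dict = {"indices": (256, 561, 561)}   # assumed input name and batch size
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)
module = graph_executor.GraphModule(lib["default"](tvm.cuda(0)))  # allocation happens here
```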

frankxyy commented 1 year ago

It seems that this error occurs for a large batch size (256). However, the OneHot operator should not need that much space...
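For scale, a quick back-of-envelope check (a sketch, assuming a float32 output and that depth becomes a trailing axis of size 64):

```python
# OneHot expands [N, 561, 561] indices into a [N, 561, 561, 64] output.
batch, h, w, depth = 256, 561, 561, 64
out_bytes = batch * h * w * depth * 4  # 4 bytes per float32 element
print(f"{out_bytes / 2**30:.1f} GiB")  # ~19.2 GiB for the output tensor alone
```

Under those assumptions, the output tensor alone is roughly 19 GiB at batch 256, before counting the input or any workspace TVM allocates.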

wenqf11 commented 1 year ago

This issue is a duplicate of https://github.com/Tencent/TPAT/issues/34; closing it.