VertexC / dl-infer-perf

deep learning inference perf analysis

pytorch->tvm cuda oom #2

Open VertexC opened 3 years ago

VertexC commented 3 years ago

Got the following error when running multiple tasks together through the executor:

name:torch2tvm model:mobilenet batch_size:2 params:{'backend': 'cuda'}
Traceback (most recent call last):
  [bt] (8) /scratch/tvm/build/libtvm.so(TVMFuncCall+0x5f) [0x7f1aab82354f]
  [bt] (7) /scratch/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x3a0) [0x7f1aab64d600]
  [bt] (6) /scratch/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0x1d2e) [0x7f1aab64c4de]
  [bt] (5) /scratch/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::runtime::String, tvm::IRModule, void, void> const&, tvm::Target const&)+0xdf) [0x7f1aab0b716f]
  [bt] (4) /scratch/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)+0x584) [0x7f1aab0b6824]
  [bt] (3) /scratch/tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target)+0x62f) [0x7f1aab1561bf]
  [bt] (2) /scratch/tvm/build/libtvm.so(tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, tvm::Target)>(tvm::runtime::Module (*)(tvm::IRModule, tvm::Target), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const+0x298) [0x7f1aab15b458]
  [bt] (1) /scratch/tvm/build/libtvm.so(tvm::codegen::BuildCUDA(tvm::IRModule, tvm::Target)+0x2be) [0x7f1aab797f5e]
  [bt] (0) /scratch/tvm/build/libtvm.so(+0x1a40fdb) [0x7f1aab81ffdb]
  File "/scratch/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/scratch/tvm/python/tvm/autotvm/measure/measure_methods.py", line 676, in tvm_callback_cuda_compile
    ptx = nvcc.compile_cuda(code, target=target, arch=AutotvmGlobalScope.current.cuda_target_arch)
  File "/scratch/tvm/python/tvm/contrib/nvcc.py", line 92, in compile_cuda
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

but it works fine when run standalone with `python3 infer_perf/torch2tvm.py mobilenet --batch=2 --size=2`
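
For context, each torch2tvm task in the run above roughly amounts to the following (a minimal sketch assuming torchvision models and the standard Relay PyTorch frontend; names and defaults are illustrative, not the repo's actual code):

```python
# Hypothetical sketch of one torch2tvm task: trace a torchvision model,
# import it into Relay, and build it for the chosen backend. Illustrative
# only; the repo's real infer_perf/torch2tvm.py may differ.
import torch
import torchvision
import tvm
from tvm import relay

def torch2tvm_build(model_name="mobilenet_v2", batch_size=2, backend="cuda"):
    model = getattr(torchvision.models, model_name)(pretrained=True).eval()
    input_shape = (batch_size, 3, 224, 224)
    scripted = torch.jit.trace(model, torch.randn(input_shape)).eval()

    # Import the TorchScript graph into a Relay module.
    mod, params = relay.frontend.from_pytorch(scripted, [("input0", input_shape)])

    # Each relay.build call is where memory accumulates when many tasks
    # run back-to-back in the same process.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=backend, params=params)
    return lib
```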

VertexC commented 3 years ago

Memory usage keeps growing from one task to the next, as shown by the monitoring output below (a sketch of the monitoring follows the log):

running torch2tvm round 0 duration: 0.24
name:torch2tvm model:resnet50 batch_size:1 params:{'backend': 'llvm'}: (0.23501062393188477,)
Used Memory: 1189.6640625 MB
running torch2tvm round 0 duration: 0.00
name:torch2tvm model:resnet50 batch_size:2 params:{'backend': 'cuda'}: (9.5367431640625e-07,)
Used Memory: 1376.0703125 MB
running torch2tvm round 0 duration: 0.00
name:torch2tvm model:resnet50 batch_size:2 params:{'backend': 'llvm'}: (9.5367431640625e-07,)
Used Memory: 1462.44140625 MB
running torch2tvm round 0 duration: 0.01
name:torch2tvm model:vgg16 batch_size:1 params:{'backend': 'cuda'}: (0.014052629470825195,)
Used Memory: 1731.8515625 MB
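
The "Used Memory" numbers above are the resident set size of the benchmarking process; a minimal sketch of that kind of readout, assuming psutil (the repo's actual helper may differ):

```python
# Minimal sketch of the per-task memory readout, assuming psutil;
# illustrative only, the repo's monitoring code may differ.
import os
import psutil

def used_memory_mb():
    """Resident set size of the current process in MB."""
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)

# After each task:
print("Used Memory: {} MB".format(used_memory_mb()))
```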

This looks like a memory leak: each build keeps memory alive, so the process presumably grows until forking the nvcc subprocess fails with ENOMEM. Related threads: https://discuss.tvm.apache.org/t/memory-leak-how-to-free-memory-manually-after-building-a-operator-for-long-running-building-case/8037 https://github.com/apache/tvm/issues/6590
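
One mitigation suggested in those threads is to isolate each build in a short-lived child process so its memory is reclaimed on exit. A rough, untested sketch (task_fn stands for whatever callable performs one conversion; it is not part of this repo):

```python
# Rough sketch of the workaround from the linked threads: run each build
# in a fresh child process so TVM's memory is released when it exits.
# task_fn is a hypothetical callable, not part of this repo.
import multiprocessing as mp

def run_isolated(task_fn, *args):
    # "spawn" starts a small fresh interpreter instead of forking the
    # already-bloated parent (which is what fails with ENOMEM above).
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=task_fn, args=args)
    proc.start()
    proc.join()
    return proc.exitcode
```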

VertexC commented 3 years ago

https://github.com/apache/tvm/issues/7399