Closed junrushao closed 4 years ago
CC: @hcho3 @leezu @tqchen if you are interested :-)
@junrushao1994 Thanks for the ping. It's going to be useful for reducing overhead in error checking. cc @trivialfis
Just to follow up: we are now able to get rid of std::unique_ptr (it is problematic in this case because it assumes a noexcept destructor), and in the meantime we further reduced the stack usage from 560 to 544. I will send a PR shortly.
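To illustrate the noexcept point: the destructor of std::unique_ptr is noexcept, so an exception escaping the pointee's destructor during cleanup ends in std::terminate. One way to keep the message buffer on the heap without std::unique_ptr is to hold a raw pointer and mark the owning destructor noexcept(false). The following is a minimal sketch under that assumption. The names HeapFatalLog, SKETCH_CHECK and RunCheck are hypothetical and are not the actual dmlc-core code or the patch mentioned above.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a fatal log message; not the actual dmlc-core code.
class HeapFatalLog {
 public:
  HeapFatalLog(const char* file, int line) : stream_(new std::ostringstream()) {
    *stream_ << file << ":" << line << ": ";
  }
  HeapFatalLog(const HeapFatalLog&) = delete;
  HeapFatalLog& operator=(const HeapFatalLog&) = delete;

  std::ostringstream& stream() { return *stream_; }

  // Destructors are noexcept by default, so throwing has to be opted into.
  ~HeapFatalLog() noexcept(false) {
    std::string message = stream_->str();
    delete stream_;  // release the heap buffer before throwing
    throw std::runtime_error(message);
  }

 private:
  // Raw pointer: the only state this object places on the caller's stack.
  std::ostringstream* stream_;
};

// Hypothetical CHECK-like macro built on the class above.
#define SKETCH_CHECK(cond) \
  if (cond) {              \
  } else                   \
    HeapFatalLog(__FILE__, __LINE__).stream() << "Check failed: " #cond " "

// Returns the message of a failing check, or an empty string if it passes.
std::string RunCheck(int index, int size) {
  try {
    SKETCH_CHECK(index < size) << "index " << index << " out of range " << size;
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "";
}
```

The temporary HeapFatalLog is destroyed at the end of the full expression, which is where the exception is raised. The sketch ignores the case where the destructor runs while another exception is already unwinding the stack.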
When working on tvm::runtime::Array, we found that CHECK causes a significant regression in stack utilization. This is not acceptable, especially in compilers written in the recursive visitor pattern, which crash on comparably deep neural networks.

Diving deeper into this case, we figured out that dmlc::LogMessageFatal is the cause: it is quite sophisticated and throws inside its destructor.

So we would love to patch dmlc-core and move dmlc::LogMessageFatal from the stack to the heap. A simple trick is to wrap it with std::unique_ptr, which is doable because dmlc-core has been updated to C++11.

Besides patching dmlc-core, we can also change the implementation of tvm::runtime::Array in one of the following ways:
0) no change (using CHECK)
1) throw dmlc::Error instead of using CHECK
2) throw std::runtime_error instead of using CHECK
3) remove and disable the CHECK

Below are the benchmarks we did. We use g++ -fstack-usage to dump the stack usage information for the target function.

Settings: virtual tvm::relay::Expr tvm::relay::CommonSubexprEliminator::VisitExpr_(const tvm::relay::CallNode*), which originally crashed this PR on a very deep network

Results:
Side notes: It is unclear right now why throwing dmlc::Error takes more stack space than throwing std::runtime_error, although they are intended to be identical.

See also: https://github.com/apache/incubator-tvm/pull/5585#issuecomment-631147088
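For readers who want to try a comparison of this kind on a small scale, here is a rough sketch of two bounds-check variants whose frame sizes can be inspected with g++ -fstack-usage. The function names and the file name sketch.cc are hypothetical, and the code is not taken from tvm::runtime::Array. No numbers are claimed here, since they depend on the compiler version and flags.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// To inspect per-function stack usage (assuming the file is saved as sketch.cc):
//   g++ -std=c++11 -fstack-usage -c sketch.cc
// GCC then writes the per-function report to a .su file next to the object file.

// Variant A: formats the message with a stream that lives in this function's
// frame, loosely similar to a CHECK-style macro with a stack-allocated logger.
int GetWithStreamMessage(const std::vector<int>& data, std::size_t i) {
  if (i >= data.size()) {
    std::ostringstream os;
    os << "index " << i << " out of range " << data.size();
    throw std::runtime_error(os.str());
  }
  return data[i];
}

// Variant B: throws std::runtime_error directly with a fixed message,
// in the spirit of option 2) in the list above.
int GetWithDirectThrow(const std::vector<int>& data, std::size_t i) {
  if (i >= data.size()) {
    throw std::runtime_error("index out of range");
  }
  return data[i];
}
```

Both variants behave the same for valid indices. They differ only in how much state the failure path needs, which is the part a stack-usage report makes visible.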