[Logging] Reducing stack usage by moving `dmlc::LogMessageFatal` to the heap

junrushao commented 4 years ago

When working on tvm::runtime::Array, we found that CHECK causes significant regression in stack utilization. This is not acceptable especially in compilers written in recursive visitor pattern, which crashes on comparably deep neural networks.

As diving deeper into this case, we figured out that dmlc::LogMessageFatal is the cause - it is super sophisticated and throws inside the destructor.

So we would love to patch dmlc-core, move dmlc::LogMessageFatal from stack to heap. A simple trick is to wrap it with std::unique_ptr - it is doable because dmlc-core has been updated to C++ 11.

Besides patching dmlc-core, we can change the implementation of tvm::runtime::Array as well as follows: 0) no change (using CHECK) 1) throw dmlc::Error instead of using CHECK 2) throw std::runtime_error instead of using CHECK 3) remove and disable the CHECK

Below are the benchmarks we did. We use g++ -fstack-usage to dump the stack usage information for the target function.

Settings:

Compiler: g++ 5.4
OS: Ubuntu 18.04
Stack size: ulimit -s 8192
Target function: virtual tvm::relay::Expr tvm::relay::CommonSubexprEliminator::VisitExpr_(const tvm::relay::CallNode*), which originally crashed this PR on a very deep network

Results:

	Patch dmlc-core	No patch dmlc-core
CHECK	560	4656
throw dmlc::Error	736	1264
throw std::runtime_error	480	1008
No checks (no-op)	480	1008

Side notes: It is unclear right now why throwing dmlc::Error takes more stack space than std::runtime_error, although they intent to be identical.

junrushao commented 4 years ago

CC: @hcho3 @leezu @tqchen if you are interested :-)

hcho3 commented 4 years ago

@junrushao1994 Thanks for the ping. It's going to be useful for reducing overhead in error checking. cc @trivialfis

junrushao commented 4 years ago

Just to follow up, now we are able to get rid of std::unique_ptr (it is problematic in this case because it assumes no-except destructor), and in the mean time, we further reduced the stack usage from 560 to 544. I will send a PR shortly.

dmlc / dmlc-core

[Logging] Reducing stack usage by moving `dmlc::LogMessageFatal` to the heap #613