dmlc / dmlc-core

A common bricks library for building scalable and portable distributed machine learning.
Apache License 2.0
864 stars 520 forks source link

[Logging] Reducing stack usage by moving `dmlc::LogMessageFatal` to the heap #613

Closed junrushao closed 4 years ago

junrushao commented 4 years ago

When working on tvm::runtime::Array, we found that CHECK causes significant regression in stack utilization. This is not acceptable especially in compilers written in recursive visitor pattern, which crashes on comparably deep neural networks.

As diving deeper into this case, we figured out that dmlc::LogMessageFatal is the cause - it is super sophisticated and throws inside the destructor.

So we would love to patch dmlc-core, move dmlc::LogMessageFatal from stack to heap. A simple trick is to wrap it with std::unique_ptr - it is doable because dmlc-core has been updated to C++ 11.

Besides patching dmlc-core, we can change the implementation of tvm::runtime::Array as well as follows: 0) no change (using CHECK) 1) throw dmlc::Error instead of using CHECK 2) throw std::runtime_error instead of using CHECK 3) remove and disable the CHECK

Below are the benchmarks we did. We use g++ -fstack-usage to dump the stack usage information for the target function.

Settings:

Results:

Patch dmlc-core No patch dmlc-core
CHECK 560 4656
throw dmlc::Error 736 1264
throw std::runtime_error 480 1008
No checks (no-op) 480 1008

Side notes: It is unclear right now why throwing dmlc::Error takes more stack space than std::runtime_error, although they intent to be identical.

See also: https://github.com/apache/incubator-tvm/pull/5585#issuecomment-631147088

junrushao commented 4 years ago

CC: @hcho3 @leezu @tqchen if you are interested :-)

hcho3 commented 4 years ago

@junrushao1994 Thanks for the ping. It's going to be useful for reducing overhead in error checking. cc @trivialfis

junrushao commented 4 years ago

Just to follow up, now we are able to get rid of std::unique_ptr (it is problematic in this case because it assumes no-except destructor), and in the mean time, we further reduced the stack usage from 560 to 544. I will send a PR shortly.