dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Memory Leak issue with Xgboost c_api #4997

Closed pallabkarmakar closed 4 years ago

pallabkarmakar commented 4 years ago

Hi Team,

I am seeing a huge memory leak while using the xgboost c_api (for example XGDMatrixCreateFromMat, XGBoosterPredict, XGBoosterLoadModel). I used valgrind to detect the leak. The summary follows.

Xgboost Version: 0.90

valgrind Log:

==179722== LEAK SUMMARY:
==179722==    definitely lost: 1,872 bytes in 78 blocks
==179722==    indirectly lost: 12,652,023 bytes in 4,474 blocks
==179722==    possibly lost: 54,624 bytes in 65 blocks
==179722==    still reachable: 86,664 bytes in 5 blocks
==179722==    suppressed: 0 bytes in 0 blocks

One example of valgrind message:

==72562==    at 0x4C29343: operator new(unsigned long) (vg_replace_malloc.c:334)
==72562==    by 0x4F9C209: xgboost::gbm::GBTreeModel::Load(dmlc::Stream) (in /xgboost/lib/libxgboost.so)
==72562==    by 0x4F9F23A: xgboost::gbm::GBTree::Load(dmlc::Stream) (in /xgboost/lib/libxgboost.so)
==72562==    by 0x4FB80B9: xgboost::LearnerImpl::Load(dmlc::Stream*) (in /xgboost/lib/libxgboost.so)
==72562==    by 0x4EE1AE6: XGBoosterLoadModel (in /xgboost/lib/libxgboost.so)

Is there any workaround for the above? Could anyone please help me?

trivialfis commented 4 years ago

Could you provide a reproducible example? We run sanitizers in our tests, so I would like to confirm that the error is in XGBoost.

pallabkarmakar commented 4 years ago

I ran valgrind on demo/c-api from the compiled xgboost directory; the log is below.

Note: I changed the Makefile to use "cc=g++ -std=c++11" and used "auto" instead of "hist" at line 37 of c-api-demo.c.

Valgrind log:

-rwx------ 1 xxxxx rqd_ml_appdev 11437 Oct 30 08:41 c-api-demo
-rw------- 1 xxxxx rqd_ml_appdev 3073 Oct 30 08:41 c-api-demo.c
-rw------- 1 xxxxx rqd_ml_appdev 154 Oct 11 06:28 CMakeLists.txt
-rw------- 1 xxxxx rqd_ml_appdev 412 Oct 30 08:36 Makefile
-rw------- 1 xxxxx rqd_ml_appdev 966 Oct 11 06:28 README.md
-bash-4.1$ valgrind --leak-check=yes --track-origins=yes --leak-check=full --show-leak-kinds=all ./c-api-demo
==132552== Memcheck, a memory error detector
==132552== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==132552== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==132552== Command: ./c-api-demo
==132552==
[08:42:23] 6513x127 matrix with 143286 entries loaded from ../data/agaricus.txt.train
[08:42:25] 1611x127 matrix with 35442 entries loaded from ../data/agaricus.txt.test
[08:42:31] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[0] train-error:0.014433 test-error:0.016139
[08:42:36] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 0 pruned nodes, max_depth=3
[1] train-error:0.014433 test-error:0.016139
[08:42:41] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[2] train-error:0.014433 test-error:0.016139
[08:42:47] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[3] train-error:0.008598 test-error:0.009932
[08:42:53] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[4] train-error:0.001228 test-error:0.000000
[08:43:00] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[5] train-error:0.001228 test-error:0.000000
[08:43:06] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[6] train-error:0.001228 test-error:0.000000
[08:43:13] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[7] train-error:0.001228 test-error:0.000000
[08:43:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[8] train-error:0.001228 test-error:0.000000
[08:43:27] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[9] train-error:0.001228 test-error:0.000000
y_pred: 0.0239 0.9544 0.0239 0.0239 0.0490 0.1056 0.9544 0.0288 0.9544 0.0242
y_test: 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
==132552==
==132552== HEAP SUMMARY:
==132552==    in use at exit: 124,968 bytes in 68 blocks
==132552==    total heap usage: 3,103 allocs, 3,035 frees, 59,957,373 bytes allocated
==132552==
==132552== 128 bytes in 1 blocks are still reachable in loss record 1 of 6
==132552==    at 0x4C28D23: malloc (vg_replace_malloc.c:299)
==132552==    by 0x6010578: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x6018FEC: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x600EC11: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x400E9CE: _dl_init (in /lib64/ld-2.12.so)
==132552==    by 0x4000B69: ??? (in /lib64/ld-2.12.so)
==132552==
==132552== 192 bytes in 1 blocks are still reachable in loss record 2 of 6
==132552==    at 0x4C28D23: malloc (vg_replace_malloc.c:299)
==132552==    by 0x6010578: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x601792A: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x4FA24E7: dmlc::Parser dmlc::data::CreateLibSVMParser(std::string const&, std::map<std::string, std::string, std::less, std::allocator<std::pair<std::string const, std::string> > > const&, unsigned int, unsigned int) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4F9E0F6: dmlc::Parser dmlc::data::CreateParser(char const, unsigned int, unsigned int, char const) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4EB92DB: xgboost::DMatrix::Load(std::string const&, bool, bool, std::string const&) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4E9C07F: XGDMatrixCreateFromFile (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x400BF3: main (in /data/rqd/home/r622244/xgboost/demo/c-api/c-api-demo)
==132552==
==132552== 520 bytes in 1 blocks are still reachable in loss record 3 of 6
==132552==    at 0x4C28C6D: malloc (vg_replace_malloc.c:298)
==132552==    by 0x4C2AC39: realloc (vg_replace_malloc.c:785)
==132552==    by 0x60105C8: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x6017E71: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x4FA24E7: dmlc::Parser dmlc::data::CreateLibSVMParser(std::string const&, std::map<std::string, std::string, std::less, std::allocator<std::pair<std::string const, std::string> > > const&, unsigned int, unsigned int) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4F9E0F6: dmlc::Parser dmlc::data::CreateParser(char const, unsigned int, unsigned int, char const) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4EB92DB: xgboost::DMatrix::Load(std::string const&, bool, bool, std::string const&) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4E9C07F: XGDMatrixCreateFromFile (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x400BF3: main (in /data/rqd/home/r622244/xgboost/demo/c-api/c-api-demo)
==132552==
==132552== 13,120 bytes in 1 blocks are still reachable in loss record 4 of 6
==132552==    at 0x4C28D23: malloc (vg_replace_malloc.c:299)
==132552==    by 0x6010578: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x6016F09: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x601372B: GOMP_parallel_start (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x4F3AAD8: xgboost::obj::RegLossObj::PredTransform(std::vector<float, std::allocator >) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4EA45AB: XGBoosterPredict (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x400DA9: main (in /data/rqd/home/r622244/xgboost/demo/c-api/c-api-demo)
==132552==
==132552== 38,304 bytes in 63 blocks are possibly lost in loss record 5 of 6
==132552==    at 0x4C2AAB5: calloc (vg_replace_malloc.c:711)
==132552==    by 0x4011D02: _dl_allocate_tls (in /lib64/ld-2.12.so)
==132552==    by 0x62322CC: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==132552==    by 0x601740F: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552==    by 0x4FA24E7: dmlc::Parser dmlc::data::CreateLibSVMParser(std::string const&, std::map<std::string, std::string, std::less, std::allocator<std::pair<std::string const, std::string> > > const&, unsigned int, unsigned int) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4F9E0F6: dmlc::Parser dmlc::data::CreateParser_(char const, unsigned int, unsigned int, char const*) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4EB92DB: xgboost::DMatrix::Load(std::string const&, bool, bool, std::string const&) (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x4E9C07F: XGDMatrixCreateFromFile (in /apps/anaconda/4.3.1/3/lib/libxgboost.so)
==132552==    by 0x400BF3: main (in /data/rqd/home/r622244/xgboost/demo/c-api/c-api-demo)
==132552==
==132552== 72,704 bytes in 1 blocks are still reachable in loss record 6 of 6
==132552==    at 0x4C28D23: malloc (vg_replace_malloc.c:299)
==132552==    by 0x529ED2F: ??? (in /apps/anaconda/4.3.1/3/lib/libstdc++.so.6.0.21)
==132552==    by 0x400E9CE: _dl_init (in /lib64/ld-2.12.so)
==132552==    by 0x4000B69: ??? (in /lib64/ld-2.12.so)
==132552==
==132552== LEAK SUMMARY:
==132552==    definitely lost: 0 bytes in 0 blocks
==132552==    indirectly lost: 0 bytes in 0 blocks
==132552==    possibly lost: 38,304 bytes in 63 blocks
==132552==    still reachable: 86,664 bytes in 5 blocks
==132552==    suppressed: 0 bytes in 0 blocks
==132552==
==132552== For counts of detected and suppressed errors, rerun with: -v
==132552== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
-bash-4.1$

trivialfis commented 4 years ago

Just ran the sanitizer on c-api-demo.c with the modified tree method and didn't find any leak. I will try valgrind later; I have very limited network access at the moment.

trivialfis commented 4 years ago

@pallabkarmakar Just ran valgrind. There are two issues here. First, Valgrind doesn't understand the static storage used by dmlc::Parameter, so it believes the memory is uninitialised; the same thing happens in our clang-tidy test. It's quite tricky because the initialization happens during dynamic loading of the library (before the program runs).

The second problem is that OpenMP seems to interfere with valgrind. The "possibly lost" reports all come from OpenMP regions, even though there is no allocation in there. If I compile XGBoost without OpenMP, there is no leak warning. So far this is confirmed by the sanitizer.
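The OpenMP observation can be tried in isolation, outside XGBoost. The following is a minimal sketch under stated assumptions: `parallel_sum` is a hypothetical helper, not part of XGBoost, that performs no heap allocation of its own and only runs a parallel reduction. When it is built with `-fopenmp` against libgomp and the calling program is run under valgrind, the report may still contain "possibly lost" blocks attributed to `pthread_create`, because the runtime's worker threads are typically still alive when the process exits. The exact output depends on the valgrind and libgomp versions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of XGBoost: sums an array with an OpenMP
 * reduction. It allocates nothing on the heap itself. Without -fopenmp the
 * pragma is ignored and the loop runs serially with the same result. */
double parallel_sum(const double *values, size_t n)
{
    assert(values != NULL || n == 0);

    double total = 0.0;
    long i; /* signed loop index, accepted by older OpenMP versions too */

#pragma omp parallel for reduction(+ : total)
    for (i = 0; i < (long)n; ++i) {
        total += values[i];
    }
    return total;
}
```

Calling `parallel_sum` from a small `main` on a stack array, then comparing the valgrind output of a build with `-fopenmp` against a build without it, mirrors the with/without OpenMP comparison described above.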

pallabkarmakar commented 4 years ago

After adding calls to XGBoosterFree and XGDMatrixFree, the memory leak was reduced. Hence closing the ticket.
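For readers landing here with the same symptom, the fix comes down to releasing every handle the C API hands out. The sketch below shows only the cleanup idiom, and it uses stand-in functions so that it compiles without XGBoost: `DemoHandle`, `demo_create`, `demo_free`, `live_handles`, and `run_once` are all hypothetical names, not part of the library. In real code, the two release calls would be `XGBoosterFree` and `XGDMatrixFree`.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *DemoHandle;    /* stand-in for an opaque library handle */
static int live_handles = 0; /* number of handles not yet released */

/* Stand-in constructor: returns 0 on success, -1 on failure. */
static int demo_create(DemoHandle *out)
{
    assert(out != NULL);
    *out = malloc(16);
    if (*out == NULL) {
        return -1;
    }
    ++live_handles;
    return 0;
}

/* Stand-in destructor: safe to call with NULL. */
static int demo_free(DemoHandle h)
{
    if (h != NULL) {
        free(h);
        --live_handles;
    }
    return 0;
}

/* One prediction-style call: acquire two handles, use them, and release
 * both on every exit path. Returns 0 on success, -1 on failure. */
int run_once(void)
{
    DemoHandle matrix = NULL;
    DemoHandle booster = NULL;
    int rc = -1;

    if (demo_create(&matrix) != 0) goto cleanup;
    if (demo_create(&booster) != 0) goto cleanup;
    /* ... loading a model and predicting would go here ... */
    rc = 0;

cleanup:
    demo_free(booster); /* real code: XGBoosterFree */
    demo_free(matrix);  /* real code: XGDMatrixFree */
    return rc;
}
```

Calling `run_once` in a loop leaves `live_handles` at zero. Removing either `demo_free` call makes the counter grow by one per iteration, which is the kind of leak that scales with the number of calls.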