Closed pallabkarmakar closed 4 years ago
Could you provide a reproducible example? We run sanitizers in our tests, so I would like to confirm the error is actually in XGBoost.
I ran valgrind on demo/c-api from the compiled XGBoost directory; the log is below.
Note: I changed the compiler to "cc=g++ -std=c++11" in the Makefile and used "auto" instead of "hist" as the tree method at line 37 of c-api-demo.c.
-rwx------ 1 xxxxx rqd_ml_appdev 11437 Oct 30 08:41 c-api-demo
-rw------- 1 xxxxx rqd_ml_appdev 3073 Oct 30 08:41 c-api-demo.c
-rw------- 1 xxxxx rqd_ml_appdev 154 Oct 11 06:28 CMakeLists.txt
-rw------- 1 xxxxx rqd_ml_appdev 412 Oct 30 08:36 Makefile
-rw------- 1 xxxxx rqd_ml_appdev 966 Oct 11 06:28 README.md
-bash-4.1$ valgrind --leak-check=yes --track-origins=yes --leak-check=full --show-leak-kinds=all ./c-api-demo
==132552== Memcheck, a memory error detector
==132552== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==132552== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==132552== Command: ./c-api-demo
==132552==
[08:42:23] 6513x127 matrix with 143286 entries loaded from ../data/agaricus.txt.train
[08:42:25] 1611x127 matrix with 35442 entries loaded from ../data/agaricus.txt.test
[08:42:31] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[0] train-error:0.014433 test-error:0.016139
[08:42:36] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 0 pruned nodes, max_depth=3
[1] train-error:0.014433 test-error:0.016139
[08:42:41] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[2] train-error:0.014433 test-error:0.016139
[08:42:47] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[3] train-error:0.008598 test-error:0.009932
[08:42:53] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[4] train-error:0.001228 test-error:0.000000
[08:43:00] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[5] train-error:0.001228 test-error:0.000000
[08:43:06] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[6] train-error:0.001228 test-error:0.000000
[08:43:13] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[7] train-error:0.001228 test-error:0.000000
[08:43:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[8] train-error:0.001228 test-error:0.000000
[08:43:27] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[9] train-error:0.001228 test-error:0.000000
y_pred: 0.0239 0.9544 0.0239 0.0239 0.0490 0.1056 0.9544 0.0288 0.9544 0.0242
y_test: 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
==132552==
==132552== HEAP SUMMARY:
==132552== in use at exit: 124,968 bytes in 68 blocks
==132552== total heap usage: 3,103 allocs, 3,035 frees, 59,957,373 bytes allocated
==132552==
==132552== 128 bytes in 1 blocks are still reachable in loss record 1 of 6
==132552== at 0x4C28D23: malloc (vg_replace_malloc.c:299)
==132552== by 0x6010578: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552== by 0x6018FEC: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552== by 0x600EC11: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552== by 0x400E9CE: _dl_init (in /lib64/ld-2.12.so)
==132552== by 0x4000B69: ??? (in /lib64/ld-2.12.so)
==132552==
==132552== 192 bytes in 1 blocks are still reachable in loss record 2 of 6
==132552== at 0x4C28D23: malloc (vg_replace_malloc.c:299)
==132552== by 0x6010578: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552== by 0x601792A: ??? (in /apps/anaconda/4.3.1/3/lib/libgomp.so.1.0.0)
==132552== by 0x4FA24E7: dmlc::Parser
Just ran the sanitizer with c-api-demo.c and the modified tree method; it didn't find any leak. I will try valgrind later, as I have very limited network access at the moment.
@pallabkarmakar Just ran valgrind. There are two issues here. First, Valgrind doesn't understand the static storage used by dmlc::Parameter, so it believes the memory is uninitialised; the same thing happens in our clang-tidy test. It's quite tricky, as the initialization happens during dynamic loading of the library (before the program runs).
The other problem is that OpenMP seems to confuse valgrind. All of the possible leaks come from OpenMP, even though there's no allocation there. If I compile XGBoost without OpenMP, there are no leak warnings. So far this is confirmed by the sanitizer.
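If the libgomp warnings are just noise, one possible workaround is a valgrind suppression file that silences still-reachable blocks originating in libgomp. This rule is a sketch (the suppression name is arbitrary, and the `obj:` pattern must match the libgomp path on your system, e.g. the Anaconda one shown in the log above):

```
{
   libgomp-still-reachable
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   obj:*/libgomp.so*
}
```

Saved as, say, libgomp.supp, it can be passed to valgrind with `valgrind --suppressions=libgomp.supp ./c-api-demo`.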
After adding calls to XGBoosterFree and XGDMatrixFree, the reported memory leaks were reduced; hence closing the ticket.
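For reference, a minimal sketch of the cleanup pattern that resolved this. The file path and training steps follow the c-api-demo.c pattern but are abbreviated here; the `SAFE` macro is illustrative, not part of the XGBoost API:

```c
#include <stdio.h>
#include <xgboost/c_api.h>

/* Illustrative helper: abort on any non-zero return code from the C API. */
#define SAFE(call) do { if ((call) != 0) { \
    fprintf(stderr, "%s\n", XGBGetLastError()); return 1; } } while (0)

int main(void) {
  DMatrixHandle dtrain;
  BoosterHandle booster;

  SAFE(XGDMatrixCreateFromFile("../data/agaricus.txt.train", 0, &dtrain));
  SAFE(XGBoosterCreate(&dtrain, 1, &booster));
  /* ... set parameters, call XGBoosterUpdateOneIter in a loop, predict ... */

  /* The crucial part: every handle obtained from the C API must be
     released explicitly, otherwise valgrind reports the backing
     allocations as lost at exit. */
  SAFE(XGBoosterFree(booster));
  SAFE(XGDMatrixFree(dtrain));
  return 0;
}
```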
Hi Team,
I am seeing a huge memory leak while using the XGBoost C API (XGDMatrixCreateFromMat, XGBoosterPredict, XGBoosterLoadModel). I used valgrind to detect the leak; the summary follows.
XGBoost version: 0.90
valgrind log:
==179722== LEAK SUMMARY:
==179722== definitely lost: 1,872 bytes in 78 blocks
==179722== indirectly lost: 12,652,023 bytes in 4,474 blocks
==179722== possibly lost: 54,624 bytes in 65 blocks
==179722== still reachable: 86,664 bytes in 5 blocks
==179722== suppressed: 0 bytes in 0 blocks
One example of valgrind message:
==72562== at 0x4C29343: operator new(unsigned long) (vg_replace_malloc.c:334)
==72562== by 0x4F9C209: xgboost::gbm::GBTreeModel::Load(dmlc::Stream*) (in /xgboost/lib/libxgboost.so)
==72562== by 0x4F9F23A: xgboost::gbm::GBTree::Load(dmlc::Stream*) (in /xgboost/lib/libxgboost.so)
==72562== by 0x4FB80B9: xgboost::LearnerImpl::Load(dmlc::Stream*) (in /xgboost/lib/libxgboost.so)
==72562== by 0x4EE1AE6: XGBoosterLoadModel (in /xgboost/lib/libxgboost.so)
Is there any workaround for the above? Could anyone please help me?
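A common cause of leaks with this usage pattern is creating a DMatrix per prediction without freeing it. Below is a hedged sketch of the load/predict cycle with explicit frees; the model path, feature vector, and matrix shape are made up for illustration:

```c
#include <math.h>   /* for NAN, used as the missing-value marker */
#include <stdio.h>
#include <xgboost/c_api.h>

int main(void) {
  BoosterHandle booster;
  /* A booster must be created before a model can be loaded into it. */
  XGBoosterCreate(NULL, 0, &booster);
  if (XGBoosterLoadModel(booster, "model.bin") != 0) {  /* path is illustrative */
    fprintf(stderr, "%s\n", XGBGetLastError());
    return 1;
  }

  float row[4] = {0.1f, 0.2f, 0.3f, 0.4f};  /* made-up feature vector */
  DMatrixHandle dmat;
  XGDMatrixCreateFromMat(row, 1, 4, NAN, &dmat);

  bst_ulong out_len;
  const float *out_result;
  XGBoosterPredict(booster, dmat, 0, 0, &out_len, &out_result);
  /* out_result is owned by the booster; do NOT free it yourself. */

  /* Free the DMatrix after every prediction, and the booster at the
     end, or valgrind will report these allocations as lost. */
  XGDMatrixFree(dmat);
  XGBoosterFree(booster);
  return 0;
}
```

If predictions run in a loop, the XGDMatrixFree call belongs inside the loop, one per XGDMatrixCreateFromMat.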