Open srogatch opened 4 years ago
Thanks for the bug report.
Are you using the C API? Can you reproduce the error using the Python package?
Also, it would be great if we could reproduce this problem on Linux, since most of us contributors use Linux for daily development.
> Are you using the C API? Can you reproduce the error using the Python package?
Yes, I'm using the C API without any Python. For the reasons explained above, a reproducer is problematic to provide; however, I can provide you with a minidump if you give me a location to upload the ~8 GB archive.
> Also, it would be great if we could reproduce this problem on Linux, since most of us contributors use Linux for daily development.
For now, I only have a minidump that you can open in MSVS2019 or (probably) WinDbg for an initial investigation.
Please note that the problem happens rarely: once in an hour or two of XGBoost work on slightly different datasets (the labels change, the features stay the same). So minidump analysis seems a more viable path than trying to come up with a reproducer and running it for hours until it hopefully triggers the problem.
> For now, I only have a minidump that you can open in MSVS2019 or (probably) WinDbg for an initial investigation.
Windows programming is not my area of expertise, so I'm afraid I won't be useful here. I'll keep this issue open for now and see if anyone else can help.
(If the issue were on Linux, I'd be able to use Valgrind and a memory sanitizer to try to locate a possible memory issue.)
Also, one piece of advice: the `exact` algorithm has received relatively little attention recently. Most of the active development happens in the `hist` and `gpu_hist` algorithms. You may have better success using `hist` or `gpu_hist`.
> Also, one piece of advice: the `exact` algorithm has received relatively little attention recently. Most of the active development happens in the `hist` and `gpu_hist` algorithms. You may have better success using `hist` or `gpu_hist`.
Thank you for the advice. I'm using the `exact` algorithm because on my dataset the performance of XGBoost is only slightly better than a random guess for classification, or the mean constant for regression. So every bit of accuracy matters, and I would prefer to wait longer for the `exact` algorithm to complete than to use the `hist`/`gpu_hist` approximations, which bring the performance closer to a random guess/mean.
I'll try to come up with a Python reproducer, though this may take quite some time: I first need to implement it in a sensible way, then wait for hours to see whether the problem reproduces, and if it doesn't, I'll need to bring the reproducer closer to the real use case implemented in C++.
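A reproducer along these lines would retrain repeatedly while only the labels change, as in the reported workload. Here is a minimal sketch of that loop; the `features` array, the parameter values, and the trial count are all assumptions, and the actual `xgboost` calls are left commented so the sketch stays self-contained:

```python
import random

def make_labels(n, seed):
    """Generate a fresh binary label vector for each trial while the
    features stay fixed (mirrors the reported usage pattern)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]

# Hypothetical driver loop (uncomment once `features` is defined):
# import xgboost as xgb
# for trial in range(10_000):
#     labels = make_labels(len(features), seed=trial)
#     dtrain = xgb.DMatrix(features, label=labels)
#     xgb.train({"tree_method": "exact"}, dtrain, num_boost_round=50)
```

Since the crash is intermittent, the loop would need to run until it either triggers the problem or exhausts the trial budget.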
> I would prefer to wait more for the exact algorithm to complete than to use hist/gpu_hist approximations
I understand. If you do decide to use the `hist` algorithm, you should set `max_bin` to a large number (1024 or more) to reduce the impact of the approximation. This parameter controls how many candidate thresholds are considered for each split.
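For reference, a parameter set along those lines might look like the following sketch. The `tree_method` and `max_bin` names are standard XGBoost parameters; the remaining entries (depth, objective) are illustrative assumptions, not values from this thread:

```python
# Hypothetical XGBoost parameter set: histogram-based split finding
# with a large max_bin to sharpen the approximation.
params = {
    "tree_method": "hist",       # histogram-based algorithm
    "max_bin": 1024,             # more bins -> finer thresholds (default is 256)
    "max_depth": 6,              # illustrative
    "objective": "binary:logistic",  # illustrative
}

# booster = xgboost.train(params, dtrain, num_boost_round=100)
```

Raising `max_bin` increases training time and memory for the histogram build, so it trades some of `hist`'s speed advantage back for accuracy.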
I am getting the same error in XGBoost (tried versions 1.0 and 1.2). I never got such an error on a less parallel processor. The processor on which I started to get the error is a Ryzen Threadripper 3990X (64 physical, 128 logical cores). The call stack is:
The lines that throw are:
The error is:
If the debugger shows it to me correctly, the value of `p->index` is 1074375450. The full call stack of the main (i.e., non-OMP) thread that led here is:
The code was compiled with a few versions of MSVS2019, the latest being 16.7.2, which was used to compile XGBoost 1.2.0. CUDA was enabled during compilation, but the CPU `exact` method is used in this experiment.
Sorry that I can't provide the program used for training, for several reasons: e.g., it's large and it requires external data to train on. Below is what I can provide instead:
Commit 0cd0dad0b5723112f9971659778c9f8b922c16f2 from branch `release_1.2.0`
Below you can find the parameters I used for XGBoost:
Please let me know how else I can help troubleshoot the issue, and thank you in advance for looking into it.