Open smorovic opened 3 months ago
cms-bot internal usage
A new Issue was created by @smorovic.
@makortel, @Dr15Jones, @smuzaffar, @antoniovilela, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign reconstruction, ml
New categories assigned: reconstruction,ml
@jfernan2,@mandrenguyen,@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks
I fetched xgboost v1.7.5 (with the dmlc-core subpackage at its master branch) from GitHub and compiled it with default options (cmake . ; make).
On x86_64 RHEL8 (gcc8) as well as lxplus9-arm (gcc11) it gives the same result for the test, which is consistent with the first check in the unit test.
#include <stdio.h>
#include <stdint.h>  // uint64_t
#include <string>
#include <sstream>
#include <iostream>

extern "C" {
#include "xgboost/c_api.h"
}

int main() {
  int best_ntree_limit_ = 158;

  // Create the booster and load the trained model
  BoosterHandle booster;
  XGBoosterCreate(NULL, 0, &booster);
  XGBoosterLoadModel(booster, "/afs/cern.ch/user/s/smorovic/public/Photon_NTL_158_Endcap_v1.bin");

  // Prediction configuration: limit inference to the first best_ntree_limit_ trees
  std::stringstream config;
  config << "{\"training\": false, \"type\": 0, \"iteration_begin\": 0, \"iteration_end\": " << best_ntree_limit_
         << ", \"strict_shape\": false}";
  std::string config_ = config.str();

  // One input row with 9 features
  float var[9];
  var[0] = 134.303;
  var[1] = 0.945981;
  var[2] = 0.0264346;
  var[3] = 0.012448;
  var[4] = 0.0208734;
  var[5] = 113.405;
  var[6] = 1.7446;
  var[7] = 0.00437808;
  var[8] = 0.303464;

  DMatrixHandle dmat;
  XGDMatrixCreateFromMat(var, 1, 9, -999.9f, &dmat);

  uint64_t const* out_shape;
  uint64_t out_dim;
  const float* out_result = NULL;
  XGBoosterPredictFromDMatrix(booster, dmat, config_.c_str(), &out_shape, &out_dim, &out_result);

  float ret = out_result[0];
  XGDMatrixFree(dmat);
  XGBoosterFree(booster);

  float exp = 0.98634;
  printf(" ===TEST=== VAL: %f\n", ret);
  printf(" ===EXPECTED=== VAL: %f\n", exp);
  return 0;
}
===TEST=== VAL: 0.986344
===EXPECTED=== VAL: 0.986340
Applying this cmsdist patch also makes no difference: https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/xgboost-arm-and-ppc.patch
Finally, compiling with gcc12 (14_0_2 cmsenv) also gives the same result.
Turning the above into a unit test and running it on ARM actually passes:
Fail 2s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest
Pass 1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest2
so I'm starting to suspect that something is wrong with the unit test itself (regarding portability).
Found it.
if (abs(etaSC) < 1.5)
--> if (std::abs(etaSC) < 1.5)
fixes the unit test on ARM:
Pass 1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest
Pass 1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest2
I noticed earlier that on x86_64 abs is a floating-point version, but apparently not on other architectures?
I will push this fix to the unit test (reusing the existing PR and backport).
I noticed earlier that on x86_64 abs is a floating point version, but apparently not on other architectures?
The C abs() function is actually for int, whereas fabs() is for double, i.e. the behavior on ARM was the correct one.
The only (or "easiest") way I could imagine for the compiler's behavior on x86-64 would be that somehow it picked the C++ std::abs() instead of the C abs(). But even that would sound strange.
It sounds really strange!
Maybe some architecture-specific header redefines it, for example #define abs fabs?
The only way I can make it happen is to have
using namespace std;
or
using std::abs;
somehow conditional on compiling on x86_64.
grep abs /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/stdlib.h
using std::abs;
using std::labs;
If one wrongly does #include <stdlib.h>, it happens (also on ARM, btw); with #include <cstdlib> it's ok (abs is the integer version, std::abs is templated). Ditto for #include <math.h> vs #include <cmath>.
play with https://godbolt.org/z/7fs559aWx
1) find where math.h or stdlib.h gets included; 2) see if it is included only on x86_64 (clue: the various intrinsics headers)
grep malloc /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/xmmintrin.h
/* Get _mm_malloc () and _mm_free ().  */
[innocent@gputest-genoa-01 (gpu-c2e35-08-01) CMSSW_14_0_0]$ cat /tmp/test_PhotonMvaXgb.i | less
[innocent@gputest-genoa-01 (gpu-c2e35-08-01) CMSSW_14_0_0]$ grep stdlib /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/mm_malloc.h
with explicit include on xmmintrin.h
Yeah, xmmintrin.h has #include <mm_malloc.h>, which has #include <stdlib.h>.
(ah, this information was already in https://github.com/cms-sw/cmssw/issues/44542#issuecomment-2020613291)
let's see what they say https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484
I'm a bit lost in this thread discussion.
Isn't explicit use of std::abs in our coding rule? Shouldn't the related unit test change from abs to std::abs?
Or is there evidence that std::abs can get overwritten in some circumstances by a C abs?
On Mar 26, 2024, at 4:14 PM, Slava Krutelyov @.***> wrote:
I'm a bit lost in this thread discussion. Isn't explicit use of std::abs in our coding rule?
YES.
Shouldn't the related unit test change from abs to std::abs?
YES.
Or is there an evidence that std::abs can get overwritten in some circumstances by a C abs?
NO. The opposite: C abs is overwritten by C++ std::abs in some circumstances.
The unit test is going to be fixed to use std::abs; this is already submitted in the two PRs mentioned. In fact, std::abs was already correctly used in the related non-unit-test code.
Isn't explicit use of std::abs in our coding rule?
well, I would have also thought so, but there is still discussion: https://github.com/cms-sw/cms-sw.github.io/pull/99#discussion_r1526394978
Why didn't you just stick with the very nice GBRForest you have in CMSSW for the MVA? Then you don't need the XGBoost C API dependency, and the GBRForest has even better performance! And it's also more platform independent. Furthermore, these ML tools evolve quickly, so maintaining the dependency can be work.
Translating from XGBoost to GBRForest is easy. I do this in my library, where I renamed the GBRForest from CMS as "FastForest", but the code is almost the same.
Actually I'm about to bring the GBRForest into ROOT itself, rebranded as "RBDT" this time :laughing:
So if at some point CMS uses a newer ROOT version (6.32 I guess) and wants to avoid the XGBoost dependency, it will be very easy. Meaning if the issue is not urgent, it can also be waited out.
Hi @guitargeek, we looked at it, but we didn't go with it initially due to difficulties in translating models. We tried with XGBoost2TMVA and didn't get the results we expected (it changed the classifier output from the [0,1] range to the [-1,1] range used by TMVA). And then we noticed that XGBoost is integrated as a tool, probably because of Python users...
We are aware that performance should be better with GBRForest. The C API is optimized to run on multiple rows and is relatively heavy on allocations when preparing to run inference. Besides, it is also much slower on the full menu than when running only selected paths (30 times!); I suspect either caching or heap allocations cause that. Still, for now, the impact on HLT menus is low (around 0.3%), which was accepted by TSG.
Therefore, initially we are using the C API directly. I agree that we should try to migrate to GBRForest in subsequent releases.
Thank you for suggesting the tool, I will have a look at it (in a week when I'm back from vacation). A possible caveat is that the "NTree limit" parameter of the model needs to be passed explicitly to the XGBoost C API (or even the Python API when loaded from the "bin" files that we're using), or inference returns a different result. Maybe it will be fine with txt files, but we need to try.
Thanks a lot for your answer, even from your vacation! Indeed the performance hit is big and you have to do memory allocations. But if you studied the performance impact and it turned out to be minimal, that's good.
I forgot about XGBoost2TMVA, actually; indeed, that would have been the easiest solution, because the GBRForest can directly read it, if I remember correctly. I very well remember this problem with different outputs! I forgot the details, but applying an (inverse) logistic transformation and/or a simple re-scaling did the trick for me. Just plot the GBRForest and XGBoost outputs against each other in a scatter plot, and the functional relation will become apparent.
@smorovic any further progress on this? Or has the issue been resolved with your PR? Thanks
Hello, the problem with excessive creation of OpenMP threads was resolved by the PR, so there is no urgent need to replace XGBoost.
Concerning migration away from the XGBoost library: about two weeks ago I was looking at how to convert the current "bin" files to TMVA models. So far I have failed to get something useful. It doesn't help that I couldn't find the code from my older attempt at this with XGBoost2TMVA. Once I manage it, I will start numerically comparing inference with GBRForest to XGBoost and, if needed, see how to limit iterations to get accurate results wrt. XGBoost.
As discussed in PR https://github.com/cms-sw/cmssw/pull/44473, we noticed a discrepancy in the XGBoost inference result with the new unit test RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc. The unit test passes on x86_64, but fails in identical fashion and with identical discrepancies on both PPC64 LE and ARM64, in 4 out of 10 tests. A PR was submitted to disable the check on non-x86_64 for now: https://github.com/cms-sw/cmssw/pull/44531