drei34 opened this issue 2 years ago
This will depend on which XGBoost runtime you are using. We have two XGBoost runtimes; see https://github.com/combust/mleap/tree/master/mleap-xgboost-runtime for details on how to swap between them.
P.S. I'm guessing your chart is showing the stats per row and not the aggregate for the batch size? I.e., the mean time for batch_size=20 is 0.625*20 in aggregate. It would be pretty surprising to me if predict(50_rows) completed faster than predict(1_row).
Thanks! I ran 1000 iterations for each fixed batch size, so for example 1000 iterations of batch size 1 took 1.05 * 1000 ms. For batch size 20 it was 0.625 * 1000, and for size 50 it was 0.468 * 1000. So yes, I'm showing predict(50_rows) < predict(1_row), which is what is curious. Is this not expected? Do you have a Slack channel, btw?
To be clear, I am building a Transformer in Java from the MLeap bundle and then just measuring the prediction time for data frames I generate at different fixed sizes. And this is giving me this counterintuitive result ...
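Something along these lines (a minimal Scala sketch of the same idea rather than my actual Java code; the bundle path, the single "features" tensor column, and the 10-dimensional feature vectors are placeholders for my real model and schema):

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.core.types.{StructField, StructType, TensorType}
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import ml.combust.mleap.tensor.Tensor

import scala.util.Random

object PredictTiming extends App {
  // Load the serialized pipeline once, outside the timed loop.
  val bundle = BundleFile("jar:file:/tmp/xgboost-model.zip") // placeholder path
  val transformer = bundle.loadMleapBundle().get.root
  bundle.close()

  // Placeholder schema: one "features" tensor column of 10 doubles.
  val schema = StructType(StructField("features", TensorType.Double(10))).get

  // Build a leap frame with `batchSize` random rows.
  def makeFrame(batchSize: Int): DefaultLeapFrame = {
    val rows = Seq.fill(batchSize)(Row(Tensor.denseVector(Array.fill(10)(Random.nextDouble()))))
    DefaultLeapFrame(schema, rows)
  }

  // Average wall-clock time per transform() call over `iterations` calls.
  def avgMsPerCall(frame: DefaultLeapFrame, iterations: Int): Double = {
    val start = System.nanoTime()
    (1 to iterations).foreach(_ => transformer.transform(frame).get)
    (System.nanoTime() - start) / 1e6 / iterations
  }

  Seq(1, 20, 50).foreach { n =>
    println(f"batch=$n%2d avg ${avgMsPerCall(makeFrame(n), 1000)}%.3f ms per transform call")
  }
}
```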
I definitely would not expect predict(50_rows) < predict(1_row). predict(50_rows) / 50 < predict(1_row) would obviously make sense.
The only ideas I have are some weirdness in the benchmarking setup, like cache warming, startup, etc. If you're not doing so already, using JMH for benchmarking is usually helpful for eliminating that kind of noise.
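For example, a harness roughly like this (an untested sketch using the sbt-jmh plugin; the bundle path and the single "features" tensor column are placeholders you would swap for your real model and schema):

```scala
// build.sbt: enablePlugins(JmhPlugin) via the sbt-jmh plugin
import java.util.concurrent.TimeUnit

import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.core.types.{StructField, StructType, TensorType}
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row, Transformer}
import ml.combust.mleap.tensor.Tensor
import org.openjdk.jmh.annotations._

import scala.util.Random

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(1)
class MleapPredictBenchmark {

  // JMH runs a separate warmed-up trial per batch size.
  @Param(Array("1", "20", "50"))
  var batchSize: Int = _

  var transformer: Transformer = _
  var frame: DefaultLeapFrame = _

  @Setup
  def setup(): Unit = {
    val bundle = BundleFile("jar:file:/tmp/xgboost-model.zip") // placeholder path
    transformer = bundle.loadMleapBundle().get.root
    bundle.close()

    // Placeholder schema: one "features" tensor column of 10 doubles.
    val schema = StructType(StructField("features", TensorType.Double(10))).get
    val rows = Seq.fill(batchSize)(Row(Tensor.denseVector(Array.fill(10)(Random.nextDouble()))))
    frame = DefaultLeapFrame(schema, rows)
  }

  // Returning the frame lets JMH consume the result and avoid dead-code elimination.
  @Benchmark
  def predict(): DefaultLeapFrame = transformer.transform(frame).get
}
```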
Right, I'm also a bit weirded out by this, but in production the worst latencies I saw also seem to have come from requests with a single feature row; requests with more feature rows seem to do better (a large batch has better latency than one row, i.e. predict(50_rows) < predict(1_row)). So this benchmark confirms what I see in production, but it does not make sense, and I'm trying to understand it. Is it possible this happens because, when batches are small, a large number of threads spin up and then "wait" to come down, and that carries some inefficiency? I have not used JMH yet, but I'm also loading another model, and for that model the latency grows as the number of rows grows, which makes sense (predict(50_rows) > predict(1_row)). The only thing I can come up with currently is that the threading inside the bundle has some optimization specific to larger batches that is detrimental to smaller batches. I can try JMH and come back, or maybe a quick Zoom?
Hi, I have the question below from another repo that I think is no longer active, so I pasted it here. Basically, I don't quite understand why the MLeap XGBoost bundle seems to run faster with bigger batch sizes. I assume it is threading, but I'm not sure. Please let me know whether that's the case and whether I can turn off such optimizations; I am currently trying to compare against something unoptimized.