[ML] Multiple errors Failed to compute BIC gain

sophiec20 commented 6 years ago

Found in 6.3.0 "build" : { "hash" : "a47564f", "date" : "2018-04-09T21:45:51.347364Z" },

The following job configuration produces multiple "Failed to compute BIC gain" errors.

index=cloudwatch-*
varp(NetworkIn) partition=instance
bucket=1h
no influencers

The error was logged 28 times (excluding repeats) for the job, where:

processed_record_count=1,793,481
bucket_count=349

[2018-04-11T23:13:18,367][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 15.6249, m = 4.32498e+12, v = 1.69601e+24, wl = 0.936, ml = 4.05914e+12, vl = 6.39695e+23, wr = 0.0640004, mr = 8.21289e+12, vr = 1
[2018-04-11T23:13:29,287][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 34.7124, m = 4.70506e+12, v = 1.25675e+24, wl = 0.971192, ml = 4.60736e+12, vl = 9.53939e+23, wr = 0.0288081, mr = 7.99874e+12, vr = 1
[2018-04-11T23:13:41,721][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 64.044, m = 7.45135e+11, v = 1.19181e+24, wl = 0.984386, ml = 6.2762e+11, vl = 3.12548e+23, wr = 0.0156143, mr = 8.15375e+12, vr = 1
[2018-04-11T23:13:51,858][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 79.8143, m = 6.83061e+11, v = 9.42267e+23, wl = 0.987861, ml = 5.91262e+11, vl = 2.50963e+23, wr = 0.0121386, mr = 8.15375e+12, vr = 1 | repeated [6]
[2018-04-11T23:13:51,858][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 82.4017, m = 6.88987e+11, v = 9.12067e+23, wl = 0.988303, ml = 6.00641e+11, vl = 2.47508e+23, wr = 0.0116967, mr = 8.15375e+12, vr = 1
[2018-04-11T23:14:03,204][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 102.714, m = 7.29093e+11, v = 8.78016e+23, wl = 0.990989, ml = 6.61581e+11, vl = 3.7529e+23, wr = 0.00901112, mr = 8.15375e+12, vr = 1 | repeated [9]
[2018-04-11T23:14:03,204][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 106.051, m = 6.5185e+11, v = 9.04512e+23, wl = 0.990617, ml = 5.77328e+11, vl = 3.15652e+23, wr = 0.00938298, mr = 8.5196e+12, vr = 1
[2018-04-11T23:14:14,216][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 115.117, m = 5.94361e+11, v = 6.95126e+23, wl = 0.991523, ml = 5.26604e+11, vl = 1.54791e+23, wr = 0.0084771, mr = 8.5196e+12, vr = 1 | repeated [4]
[2018-04-11T23:14:14,216][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 117.564, m = 5.91316e+11, v = 6.79206e+23, wl = 0.991739, ml = 5.25277e+11, vl = 1.52444e+23, wr = 0.00826078, mr = 8.5196e+12, vr = 1
[2018-04-11T23:14:26,108][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 129.678, m = 5.96904e+11, v = 6.7069e+23, wl = 0.992686, ml = 5.38529e+11, vl = 2.06157e+23, wr = 0.00731417, mr = 8.5196e+12, vr = 1 | repeated [5]
[2018-04-11T23:14:26,108][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 132.077, m = 5.89993e+11, v = 6.57443e+23, wl = 0.992852, ml = 5.32903e+11, vl = 2.02782e+23, wr = 0.00714803, mr = 8.5196e+12, vr = 1
[2018-04-11T23:14:36,628][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 141.597, m = 5.78892e+11, v = 6.07828e+23, wl = 0.993454, ml = 5.26566e+11, vl = 1.9065e+23, wr = 0.00654637, mr = 8.5196e+12, vr = 1 | repeated [5]
[2018-04-11T23:14:36,628][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 87.1394, m = -7.90563e+09, v = 9.85519e+21, wl = 0.990096, ml = -1.34236e+10, vl = 6.84487e+21, wr = 0.00990386, mr = 5.43733e+11, vr = 1
[2018-04-11T23:14:47,001][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 153.343, m = 5.9883e+11, v = 5.84851e+23, wl = 0.994088, ml = 5.51727e+11, vl = 2.10577e+23, wr = 0.00591165, mr = 8.5196e+12, vr = 1 | repeated [8]
[2018-04-11T23:14:47,001][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 85.2254, m = -7.90562e+09, v = 9.8577e+21, wl = 0.990096, ml = -1.34236e+10, vl = 6.84663e+21, wr = 0.00990386, mr = 5.43733e+11, vr = 1
[2018-04-11T23:14:58,401][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 164.933, m = 6.30959e+11, v = 6.99466e+23, wl = 0.994621, ml = 5.883e+11, vl = 3.62865e+23, wr = 0.0053786, mr = 8.5196e+12, vr = 1 | repeated [5]
[2018-04-11T23:14:58,401][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 167.235, m = 6.27647e+11, v = 6.88039e+23, wl = 0.994718, ml = 5.85739e+11, vl = 3.57227e+23, wr = 0.00528211, mr = 8.5196e+12, vr = 1
[2018-04-11T23:15:09,306][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 176.388, m = 6.11862e+11, v = 6.46645e+23, wl = 0.995075, ml = 5.72724e+11, vl = 3.37065e+23, wr = 0.00492499, mr = 8.5196e+12, vr = 1 | repeated [4]
[2018-04-11T23:15:09,306][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 178.664, m = 6.07918e+11, v = 6.36951e+23, wl = 0.995158, ml = 5.69421e+11, vl = 3.32288e+23, wr = 0.00484229, mr = 8.5196e+12, vr = 1
[2018-04-11T23:15:19,774][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 192.231, m = 6.13207e+11, v = 6.24342e+23, wl = 0.995607, ml = 5.78318e+11, vl = 3.48601e+23, wr = 0.00439336, mr = 8.5196e+12, vr = 1 | repeated [8]
[2018-04-11T23:15:19,775][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 156.846, m = 4.33368e+12, v = 8.52157e+22, wl = 0.993717, ml = 4.31303e+12, vl = 1.74852e+22, wr = 0.00628338, mr = 7.59881e+12, vr = 1
[2018-04-11T23:15:30,574][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 163.136, m = 4.33173e+12, v = 7.83373e+22, wl = 0.994067, ml = 4.31223e+12, vl = 1.43253e+22, wr = 0.0059331, mr = 7.59881e+12, vr = 1 | repeated [8]
[2018-04-11T23:15:30,574][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 205.662, m = 6.33707e+11, v = 6.11444e+23, wl = 0.995988, ml = 6.01938e+11, vl = 3.61151e+23, wr = 0.0040125, mr = 8.5196e+12, vr = 1
[2018-04-11T23:15:41,681][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 203.467, m = 4.2552e+12, v = 1.95803e+23, wl = 0.995085, ml = 4.23378e+12, vl = 1.02889e+23, wr = 0.00491481, mr = 8.59368e+12, vr = 1 | repeated [12]
[2018-04-11T23:15:41,681][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 205.739, m = 4.25132e+12, v = 1.98176e+23, wl = 0.995157, ml = 4.23019e+12, vl = 1.06481e+23, wr = 0.00484314, mr = 8.59368e+12, vr = 1
[2018-04-11T23:16:04,751][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 219.084, m = 4.25202e+12, v = 1.85405e+23, wl = 0.995551, ml = 4.23262e+12, vl = 1.01238e+23, wr = 0.00444879, mr = 8.59368e+12, vr = 1 | repeated [6]
[2018-04-11T23:16:04,751][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 33.891, m = 4.05763e+12, v = 9.28898e+23, wl = 0.970494, ml = 3.92819e+12, vl = 3.72835e+23, wr = 0.0295063, mr = 8.31519e+12, vr = 1
[2018-04-11T23:16:11,172][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [cw-varp-networkin-p-instance-1h] [autodetect/9127] [CXMeansOnline1d.cc@367] Failed to compute BIC gain: Error in function boost::math::lgamma<double>(double): numeric overflow, n = 42.0431, m = 4.38949e+12, v = 3.55918e+23, wl = 0.97687, ml = 4.32675e+12, vl = 1.90185e+23, wr = 0.0231301, mr = 7.03951e+12, vr = 1 | repeated [6]

hendrikmuhs commented 6 years ago

Tracked this down:

boost::math::lgamma throws an exception due to an internal numeric overflow. In detail, I found the following problems:

It fails to compute lgamma for values that are not near the limits. scipy and std::lgamma compute them successfully (they seem to have a better implementation, but I haven't looked into the details). Note that larger values work properly.
Using the ignore_error policy avoids the exception and should set a proper value, but it sets the value to -inf, whereas in this case it should be +inf at best. -inf is problematic because it breaks the code further down when choosing the minimum from a set of variables.
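
To illustrate the second point, here is a minimal sketch of why a -inf fallback is worse than +inf once the result feeds a minimum selection. It is not the ml-cpp code: `smallestCost`, `failedValue` and the sample values are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for the later "choose the min" step; not ml-cpp code.
double smallestCost(const std::vector<double>& costs) {
    assert(!costs.empty());
    return *std::min_element(costs.begin(), costs.end());
}

// Fallback for a failed computation: +inf can never win a min selection,
// whereas -inf always wins it and hides every valid candidate.
double failedValue() {
    return std::numeric_limits<double>::infinity();
}
```

With candidates `{3.5, failedValue(), 1.2}` the selection still returns 1.2, while replacing the fallback with `-failedValue()` makes the failed entry the "best" one.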

Since C++11 provides std::lgamma, I think switching to it is the best and easiest fix. (There seem to be more boost math functions that can be replaced by C++11 functions; I will create a follow-up issue for them.)

Notes:

I checked our recent boost update; I do not think it caused this, as the changes are unrelated.
I did some micro-benchmarking: std::lgamma is very slightly faster.

droberts195 commented 6 years ago

"std::lgamma successfully compute them"

Are you sure this exists in Visual Studio 2013? I did a quick search and could only find it in Visual Studio 2015, but maybe I'm wrong - Microsoft has made it extra hard to search the docs for old Visual Studio versions.

If Visual Studio 2013 doesn't have std::lgamma then the fix would be for 7.0 only.

tveasey commented 6 years ago

Regarding error-handling policies: I generally prefer throwing on error. We can usually take better action than trying to handle infinities in later code, where you tend to quickly run into a NaN explosion.

I'd be completely happy to migrate to std::lgamma if it is available on all platforms and behaves the same on all of them; we'd need to test this. I also think we can probably fix this particular case using Stirling's approximation (which, given the overflow, is likely to be very accurate). This is something we could perhaps wrap up as a safe implementation ourselves.
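
As a rough sketch of that idea (not the ml-cpp implementation), the following uses the first terms of Stirling's series for ln Gamma(x) and falls back to it only for large arguments. The names `stirlingLogGamma` and `safeLogGamma`, and the switch-over threshold of 20, are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: Stirling's series for ln Gamma(x), x > 0,
// truncated after the 1/(360 x^3) term. The truncation error is on the
// order of 1/(1260 x^5), so it is only suitable for large x.
double stirlingLogGamma(double x) {
    assert(x > 0.0);
    const double halfLogTwoPi = 0.91893853320467274178; // 0.5 * ln(2 * pi)
    return (x - 0.5) * std::log(x) - x + halfLogTwoPi
           + 1.0 / (12.0 * x) - 1.0 / (360.0 * x * x * x);
}

// Hypothetical "safe" wrapper: exact library routine for small arguments,
// Stirling's series above an assumed threshold.
double safeLogGamma(double x) {
    const double threshold = 20.0; // assumption, not tuned
    return x < threshold ? std::lgamma(x) : stirlingLogGamma(x);
}
```

A cross-platform check along the lines suggested above could compare `safeLogGamma` against `std::lgamma` over a range of arguments on each supported compiler.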

hendrikmuhs commented 6 years ago

I haven't checked platform support yet; it may well be that the incomplete C++11 support in VS fooled me once again. If it's there, I would be very surprised if it behaves differently.

Anyway, this was a quick heads-up on the issue; I am still working on it. What seems clear to me: the current try/catch covers too many lines. The first step, if std::lgamma is not available, is to wrap lgamma in a separate method to improve error handling (for 7.0 this implementation would of course use std::lgamma). At least for this case I am almost certain that we do not need any custom implementation, as we do not need the lgamma value.
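
One possible shape for such a wrapper is sketched below, assuming the std::lgamma variant. The name `tryLogGamma` and its bool-plus-out-parameter signature are hypothetical; a Boost-based variant would instead put a narrow try/catch around the single lgamma call.

```cpp
#include <cmath>

// Hypothetical wrapper: isolates the lgamma call so that callers do not need
// a wide try/catch. Returns false and leaves `result` untouched if the value
// is not finite (pole, overflow or NaN input).
bool tryLogGamma(double x, double& result) {
    const double value = std::lgamma(x);
    if (!std::isfinite(value)) {
        return false;
    }
    result = value;
    return true;
}
```

The caller can then decide locally what a failure means, for example by skipping the candidate, rather than letting an infinity propagate into later comparisons.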

elastic / ml-cpp

[ML] Multiple errors Failed to compute BIC gain #51