elastic / ml-cpp

Machine learning C++ code

[ML] Probable SIGSEGV due to out-of-bounds access in updateRecycledModels #76

Closed tveasey closed 3 years ago

tveasey commented 6 years ago

We've had a crash reported against 5.6.8 (possibly due to an incorrect state upgrade from 5.5 to 5.6.8), with the following stack trace:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./autodetect --jobid=webscan_rare_responses_v2 --licenseValidation=213'.
Program terminated with signal 11, Segmentation fault.
...
(gdb) bt
#0  0x00007f4b52d48eaf in ml::model::CModel::updateRecycledModels() () from /mnt/Downloads/libs/libMlModel.so
#1  0x00007f4b52d73466 in ml::model::CPopulationModel::updateRecycledModels() () from /mnt/Downloads/libs/libMlModel.so
#2  0x00007f4b52d6d6e9 in ml::model::CPopulationModel::createUpdateNewModels(long, ml::model::CResourceMonitor&) () from /mnt/Downloads/libs/libMlModel.so
#3  0x00007f4b52bb98ce in ml::model::CEventRatePopulationModel::sample(long, long, ml::model::CResourceMonitor&) () from /mnt/Downloads/libs/libMlModel.so
#4  0x00007f4b52aee6c9 in ml::model::CAnomalyDetector::sample(long, long, ml::model::CResourceMonitor&) () from /mnt/Downloads/libs/libMlModel.so
#5  0x00007f4b52af39b2 in void ml::model::CAnomalyDetector::buildResultsHelper<boost::_bi::bind_t<void, boost::_mfi::mf3<void, ml::model::CAnomalyDetector, long, long, ml::model::CResourceMonitor&>, boost::_bi::list4<boost::_bi::value<ml::model::CAnomalyDetector*>, boost::arg<1>, boost::arg<2>, boost::reference_wrapper<ml::model::CResourceMonitor> > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, ml::model::CAnomalyDetector, long>, boost::_bi::list2<boost::_bi::value<ml::model::CAnomalyDetector*>, boost::arg<1> > > >(long, long, boost::_bi::bind_t<void, boost::_mfi::mf3<void, ml::model::CAnomalyDetector, long, long, ml::model::CResourceMonitor&>, boost::_bi::list4<boost::_bi::value<ml::model::CAnomalyDetector*>, boost::arg<1>, boost::arg<2>, boost::reference_wrapper<ml::model::CResourceMonitor> > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, ml::model::CAnomalyDetector, long>, boost::_bi::list2<boost::_bi::value<ml::model::CAnomalyDetector*>, boost::arg<1> > >, ml::model::CHierarchicalResults&) ()
   from /mnt/Downloads/libs/libMlModel.so
#6  0x00007f4b52af230c in ml::model::CAnomalyDetector::buildResults(long, long, ml::model::CHierarchicalResults&) () from /mnt/libs/libMlModel.so
#7  0x00007f4b52748ea0 in ml::api::CAnomalyDetector::outputResults(long) () from /mnt/Downloads/libs/libMlApi.so
#8  0x00007f4b5274922e in ml::api::CAnomalyDetector::outputBucketResultsUntil(long) () from /mnt/Downloads/libs/libMlApi.so
#9  0x00007f4b5274a729 in ml::api::CAnomalyDetector::handleRecord(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, boost::unordered::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) ()
   from /mnt/Downloads/libs/libMlApi.so
#10 0x00007f4b527c7a71 in ml::api::CLengthEncodedInputParser::readStream(boost::function2<bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, boost::unordered::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&> const&) () from /mnt/Downloads/libs/libMlApi.so
#11 0x00007f4b527766d3 in ml::api::CCmdSkeleton::ioLoop() () from /mnt/Downloads/libs/libMlApi.so
#12 0x00007f4b54c8849d in main ()

The likely cause is an out-of-range access to m_PersonBucketCounts. There ought to be an invariant preventing this. As a short-term fix, we should make the code defensive and log an error. (We should also check other uses of recycledPersonIds, which may have the same problem.) As a separate task, we need to understand how we got into this state.
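A minimal sketch of what the defensive check could look like, assuming the recycled ids are used to index directly into the counts container and that recycling resets the count to zero. This is not the actual ml-cpp code: the free function `resetRecycledCounts`, the type aliases, and the use of `std::cerr` in place of the project's logging are all hypothetical, and only the names `recycledPersonIds` and `m_PersonBucketCounts` (here a plain parameter) come from the report above.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical aliases standing in for the model's real container types.
using TDoubleVec = std::vector<double>;
using TSizeVec = std::vector<std::size_t>;

// Reset the count for each recycled person id (assumed behaviour), but
// skip and report any id that falls outside the container instead of
// indexing past its end. Returns the number of ids that were skipped.
std::size_t resetRecycledCounts(const TSizeVec& recycledPersonIds,
                                TDoubleVec& personBucketCounts) {
    std::size_t skipped = 0;
    for (std::size_t pid : recycledPersonIds) {
        if (pid >= personBucketCounts.size()) {
            // Stand-in for an error log; the real code would use its own logger.
            std::cerr << "ERROR: recycled person id " << pid
                      << " out of range [0, " << personBucketCounts.size() << ")\n";
            ++skipped;
            continue;
        }
        personBucketCounts[pid] = 0.0;
    }
    return skipped;
}
```

The check only stops the crash. A non-zero return value still means the ids and the container have drifted apart, which is the invariant violation that needs to be understood separately.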

cc @hoigau

tveasey commented 3 years ago

This was fixed long ago (#79) and the underlying mechanism was debugged.