elastic / ml-cpp

Machine learning C++ code

[ML] Fix stack-use-after-scope error #2673

Closed edsavage closed 2 months ago

edsavage commented 3 months ago

When the "autodetect" binary, compiled with AddressSanitizer enabled, was run, it emitted the following error:

==7993==ERROR: AddressSanitizer: stack-use-after-scope on address 0x00016bbf0d37 at pc 0x000106e71b70 bp 0x00016bbf0c40 sp 0x00016bbf0c38
READ of size 1 at 0x00016bbf0d37 thread T0
==7993==WARNING: Failed to use and restart external symbolizer!
    #0 0x106e71b6c in std::__1::__function::__func<ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)::$_0, std::__1::allocator<ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)::$_0>, bool (ml::model::hierarchical_results_detail::SNode const&)>::operator()(ml::model::hierarchical_results_detail::SNode const&)+0x3c0 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x555b6c)

This PR fixes the error by keeping the strings that the lambda refers to alive in the same scope as the lambda.
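
For readers unfamiliar with this class of bug, here is a minimal sketch of the general pattern. It is not the actual ml-cpp code: `Node`, `lookupFieldName`, `Predicate` and `isMember` are hypothetical names chosen for illustration. A lambda that captures a string by reference is only safe to call while that string is still in scope, so the sketch keeps the string as a named local next to the lambda.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a results node; not the real ml-cpp SNode.
struct Node {
    std::string personFieldName;
};

// Hypothetical helper that returns a string by value, so each call
// produces a temporary at the call site.
std::string lookupFieldName() {
    return "nginx.access.remote_ip";
}

using Predicate = std::function<bool(const Node&)>;

// Problematic pattern (shown only as a comment, because running it is
// undefined behaviour):
//
//   Predicate matches;
//   {
//       const std::string name = lookupFieldName();
//       matches = [&name](const Node& n) { return n.personFieldName == name; };
//   }               // 'name' is destroyed here
//   matches(node);  // reads 'name' after its scope has ended
//
// Safer pattern: the string is a named local in the same scope as the
// lambda and every call to it, so the captured reference stays valid.
bool isMember(const Node& node) {
    const std::string name = lookupFieldName();
    Predicate matches = [&name](const Node& n) {
        return n.personFieldName == name;
    };
    return matches(node);
}
```

Capturing the string by value would also avoid the dangling reference, at the cost of a copy. Which option suits the real code depends on details not shown in this report.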

edsavage commented 3 months ago

Full stack trace:

(base) eds@Eds-MacBook-Pro ml-cpp % build/distribution/platform/darwin-aarch64/controller.app/Contents/MacOS/autodetect --config nginx_test_config.json --input nginx_test_input.csv --output test_output.json --validElasticLicenseKeyConfirmed=true --persist test_persist.json --lengthEncodedInput --maxAnomalyRecords=500
2024-05-27 01:22:50,951462 UTC [7993] DEBUG /Users/eds/src/elasticsearch/ml-cpp/bin/autodetect/Main.cc@156 autodetect (64 bit): Version based on 8.15.0-SNAPSHOT (Build DEVELOPMENT BUILD by eds) Copyright (c) 2024 Elasticsearch BV
2024-05-27 01:22:50,957833 UTC [7993] DEBUG /Users/eds/src/elasticsearch/ml-cpp/lib/seccomp/CSystemCallFilter_MacOSX.cc@107 macOS sandbox initialized
2024-05-27 01:22:50,966230 UTC [7993] INFO  /Users/eds/src/elasticsearch/ml-cpp/lib/model/CResourceMonitor.cc@83 Setting model memory limit to 1024 MB
2024-05-27 01:22:50,966715 UTC [7993] DEBUG /Users/eds/src/elasticsearch/ml-cpp/lib/api/CLengthEncodedInputParser.cc@57 Length encoded input parser input is not connected to stdin
2024-05-27 01:22:50,967074 UTC [7993] DEBUG /Users/eds/src/elasticsearch/ml-cpp/lib/api/CCmdSkeleton.cc@38 No restoration source specified - will not attempt to restore state
2024-05-27 01:22:50,967769 UTC [7993] TRACE /Users/eds/src/elasticsearch/ml-cpp/lib/model/CAnomalyDetector.cc@119 CAnomalyDetector(): count by count for '', first time = 1485907200, bucketLength = 3600, m_LastBucketEndTime = 1485907200
2024-05-27 01:22:50,969552 UTC [7993] TRACE /Users/eds/src/elasticsearch/ml-cpp/lib/model/CAnomalyDetector.cc@119 CAnomalyDetector(): high_count over nginx.access.remote_ip [nginx.access.remote_ip] for '', first time = 1485907200, bucketLength = 3600, m_LastBucketEndTime = 1485907200
=================================================================
==7993==ERROR: AddressSanitizer: stack-use-after-scope on address 0x00016bbf0d37 at pc 0x000106e71b70 bp 0x00016bbf0c40 sp 0x00016bbf0c38
READ of size 1 at 0x00016bbf0d37 thread T0
==7993==WARNING: failed to spawn external symbolizer (errno: 0)
==7993==WARNING: failed to spawn external symbolizer (errno: 0)
==7993==WARNING: failed to spawn external symbolizer (errno: 0)
==7993==WARNING: failed to spawn external symbolizer (errno: 0)
==7993==WARNING: failed to spawn external symbolizer (errno: 0)
==7993==WARNING: Failed to use and restart external symbolizer!
    #0 0x106e71b6c in std::__1::__function::__func<ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)::$_0, std::__1::allocator<ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)::$_0>, bool (ml::model::hierarchical_results_detail::SNode const&)>::operator()(ml::model::hierarchical_results_detail::SNode const&)+0x3c0 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x555b6c)
    #1 0x106e1cd34 in ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)+0x390 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x500d34)
    #2 0x106e17e90 in ml::model::CHierarchicalResultsNormalizer::visit(ml::model::CHierarchicalResults const&, ml::model::hierarchical_results_detail::SNode const&, bool)+0x384 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x4fbe90)
    #3 0x106dc8c68 in ml::model::CHierarchicalResults::pivotsBottomUpBreadthFirst(ml::model::CHierarchicalResultsVisitor&) const+0x8c (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x4acc68)
    #4 0x104e6aed8 in ml::api::CAnomalyJob::updateNormalizerAndNormalizeResults(bool, ml::model::CHierarchicalResults&)+0x3c (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlApi.dylib:arm64+0x42ed8)
    #5 0x104e616d8 in ml::api::CAnomalyJob::outputResults(long)+0xbe4 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlApi.dylib:arm64+0x396d8)
    #6 0x104e3b4f0 in ml::api::CAnomalyJob::outputBucketResultsUntil(long)+0x724 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlApi.dylib:arm64+0x134f0)
    #7 0x104e33c60 in ml::api::CAnomalyJob::handleRecord(boost::unordered::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, boost::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&, std::__1::optional<long>)+0x93c (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlApi.dylib:arm64+0xbc60)
    #8 0x10511d3c4 in ml::api::CLengthEncodedInputParser::readStreamIntoMaps(std::__1::function<bool (boost::unordered::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, boost::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&)> const&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&)> const&)+0x468 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlApi.dylib:arm64+0x2f53c4)
    #9 0x104f3084c in ml::api::CCmdSkeleton::ioLoop()+0xa78 (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlApi.dylib:arm64+0x10884c)
    #10 0x104211388 in main+0x498c (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/MacOS/autodetect:arm64+0x100009388)
    #11 0x18b57e0dc  (<unknown module>)

Address 0x00016bbf0d37 is located in stack of thread T0 at offset 119 in frame
    #0 0x106e1c9b0 in ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)+0xc (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x5009b0)

  This frame has 4 object(s):
    [32, 64) 'ref.tmp.i'
    [96, 120) 'ref.tmp' (line 385) <== Memory access at offset 119 is inside this variable
    [160, 184) 'ref.tmp3' (line 386)
    [224, 256) 'agg.tmp'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/Users/eds/src/elasticsearch/ml-cpp/build/distribution/platform/darwin-aarch64/controller.app/Contents/lib/libMlModel.dylib:arm64+0x555b6c) in std::__1::__function::__func<ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)::$_0, std::__1::allocator<ml::model::CHierarchicalResultsNormalizer::isMemberOfPopulation(ml::model::hierarchical_results_detail::SNode const&, std::__1::function<bool (ml::model::hierarchical_results_detail::SNode const&)>)::$_0>, bool (ml::model::hierarchical_results_detail::SNode const&)>::operator()(ml::model::hierarchical_results_detail::SNode const&)+0x3c0
Shadow bytes around the buggy address:
  0x00016bbf0a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0c80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8 f8 f8
=>0x00016bbf0d00: f2 f2 f2 f2 f8 f8[f8]f2 f2 f2 f2 f2 f8 f8 f8 f2
  0x00016bbf0d80: f2 f2 f2 f2 00 00 00 00 f3 f3 f3 f3 00 00 00 00
  0x00016bbf0e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x00016bbf0f80: f1 f1 f1 f1 f8 f2 f2 f2 f8 f8 f8 f2 f2 f2 f2 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==7993==ABORTING
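
As a reading aid for the shadow-byte dump above, the following sketch shows the arithmetic that links the faulting address to the bracketed `[f8]` entry. It relies only on what the report states, namely that one shadow byte represents 8 application bytes, and on the dump printing 16 shadow bytes per row. The function and constant names are hypothetical.

```cpp
#include <cstdint>

// From the legend in the report: one shadow byte covers 8 application bytes.
constexpr std::uint64_t kBytesPerShadowByte = 8;
// Each row of the dump above lists 16 shadow bytes, so one row covers
// 16 * 8 = 128 (0x80) application bytes.
constexpr std::uint64_t kShadowBytesPerRow = 16;
constexpr std::uint64_t kBytesPerRow = kBytesPerShadowByte * kShadowBytesPerRow;

// Application address printed at the start of the dump row containing 'addr'.
constexpr std::uint64_t rowStart(std::uint64_t addr) {
    return addr - (addr % kBytesPerRow);
}

// Zero-based position of the shadow byte for 'addr' within its row.
constexpr std::uint64_t shadowIndexInRow(std::uint64_t addr) {
    return (addr % kBytesPerRow) / kBytesPerShadowByte;
}

// True if a frame offset lies in the half-open range [begin, end), which is
// the notation the report uses for frame objects.
constexpr bool inFrameObject(std::uint64_t offset, std::uint64_t begin, std::uint64_t end) {
    return offset >= begin && offset < end;
}
```

For the faulting address 0x00016bbf0d37, `rowStart` gives 0x00016bbf0d00 and `shadowIndexInRow` gives 6, which is the seventh entry on the `=>` row and carries the "Stack use after scope" marker f8. Likewise, `inFrameObject(119, 96, 120)` is true, matching the report's statement that offset 119 falls inside 'ref.tmp'.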