elastic / ml-cpp

Machine learning C++ code

[ML] Use custom Boost::JSON allocator #2674

Closed. edsavage closed this PR 2 months ago.

edsavage commented 3 months ago

The current code uses the monotonic resource allocator to allocate memory for boost::json objects. It allocates memory in ever-increasing chunks, which can lead to over-allocation. The image below shows a typical series of memory allocations when using the monotonic resource allocator:

[Screenshot 2024-05-29 at 15:07:42: memory allocations with the monotonic resource allocator]

The other disadvantage of the monotonic resource allocator is that it performs no deallocations until it is destroyed. Hence the name "monotonic": its memory usage can only increase during its lifetime.

These factors make the monotonic resource allocator unsuitable for its current use.

This PR introduces a very simple custom allocator that allocates and deallocates individual objects on request using the global ::operator new and ::operator delete. As a result, only as much memory as is actually needed is allocated at any point in time, and the memory profile is much more predictable:

[Screenshot 2024-05-29 at 15:06:02: memory allocations with the custom allocator]

On small data sets this change appears to perform well, but it would be wise to run the QA tests against this PR before merging.

edsavage commented 3 months ago

buildkite run_qa_tests

wwang500 commented 3 months ago

buildkite run_qa_tests

edsavage commented 2 months ago

Quoting @valeriy42: "report the allocator memory usage as part of the memory_stats"

Just to clarify, @valeriy42: by memory_stats, do you mean the model size stats (as reported in the Counts tab of the AD job results in Kibana)? That is, should the JSON allocator's memory usage be included in model::CResourceMonitor::SModelSizeStats?

valeriy42 commented 2 months ago

Exactly. Sorry for mixing up memory_stats and model_size_stats.

edsavage commented 2 months ago

buildkite build this

DaveCTurner commented 2 months ago

Sorry to say that https://github.com/elastic/elasticsearch/pull/109833 thoroughly breaks the ES wire protocol, so I'm going to have to revert it to fix the ES build. I guess that means something needs to be reverted here too, but I'm not qualified to address that.