elastic / ml-cpp

Machine learning C++ code

script to continuously evaluate elser #2670

Open davidkyle opened 3 months ago


First, download the ELSER model locally. Either

The script runs pytorch_inference, loads the model, then continuously runs inference against it. Logging goes to stdout, and the model output is written to a JSON file. Every 100 requests the script asks pytorch_inference how much memory it is using and writes that to the same JSON file; `grep mem out.json` will show that data.
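The interleaved logging pattern described above can be sketched as follows. This is a hypothetical simplification, not the actual `signal9.py`: the inference result and memory query are placeholders standing in for real calls to pytorch_inference, but the file layout (one JSON object per line, with memory records mixed in every 100 requests) matches what makes `grep mem out.json` work.

```python
import json

def log_results(path, num_requests, mem_query_interval=100):
    """Append inference results and periodic memory stats to one JSON-lines file."""
    with open(path, "w") as out:
        for i in range(num_requests):
            # Placeholder for a real inference result from pytorch_inference.
            out.write(json.dumps({"request": i, "inference": "..."}) + "\n")
            if (i + 1) % mem_query_interval == 0:
                # Placeholder for asking pytorch_inference for its memory usage;
                # the record contains "mem" so grep can pick it out.
                out.write(json.dumps({"mem": {"after_request": i + 1}}) + "\n")

log_results("out.json", 250)
# Equivalent of `grep mem out.json`: filter lines containing "mem".
with open("out.json") as f:
    mem_lines = [line for line in f if "mem" in line]
print(len(mem_lines))
```

With 250 requests and an interval of 100, two memory records are written (after requests 100 and 200).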

Run with

```
python3 signal9.py '/PATH/TO/elser_2/elser_model_2.pt' --num_allocations=4
```

`--num_threads_per_allocation` and `--num_allocations` are the parameters to tweak. Increasing either will make inference faster, so changes in memory usage should show up sooner.
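For reference, a minimal sketch of how those flags might be declared in the script. This is an assumption, not the actual `signal9.py` argument parser; the flag names come from the issue, while the defaults and help text are invented for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="Continuously run inference on a model.")
parser.add_argument("model",
                    help="path to the TorchScript model, e.g. elser_model_2.pt")
parser.add_argument("--num_allocations", type=int, default=1,
                    help="number of model allocations (assumed default)")
parser.add_argument("--num_threads_per_allocation", type=int, default=1,
                    help="inference threads per allocation (assumed default)")

# Parse a sample command line matching the invocation shown above.
args = parser.parse_args(["model.pt", "--num_allocations=4"])
print(args.num_allocations)
```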