elastic / ml-cpp

Machine learning C++ code

script to continuously evaluate elser #2670

Open davidkyle opened 3 months ago


First, download the ELSER model locally. Either

The script runs pytorch_inference, loads the model, then continuously runs inference against it. Logging goes to stdout, and the model output is written to a JSON file. Every 100 requests the script asks pytorch_inference how much memory it is using and writes that to the same JSON file; `grep mem out.json` will show that data.
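The interleaved logging pattern described above can be sketched as follows. This is a hypothetical simplification, not the actual `signal9.py`: the inference result and memory query are placeholders standing in for real calls to pytorch_inference, but the file layout (one JSON object per line, with memory records mixed in every 100 requests) matches what makes `grep mem out.json` work.

```python
import json

def log_results(path, num_requests, mem_query_interval=100):
    """Append inference results and periodic memory stats to one JSON-lines file."""
    with open(path, "w") as out:
        for i in range(num_requests):
            # Placeholder for a real inference result from pytorch_inference.
            out.write(json.dumps({"request": i, "inference": "..."}) + "\n")
            if (i + 1) % mem_query_interval == 0:
                # Placeholder for asking pytorch_inference for its memory usage;
                # the record contains "mem" so grep can pick it out.
                out.write(json.dumps({"mem": {"after_request": i + 1}}) + "\n")

log_results("out.json", 250)
# Equivalent of `grep mem out.json`: filter lines containing "mem".
with open("out.json") as f:
    mem_lines = [line for line in f if "mem" in line]
print(len(mem_lines))
```

With 250 requests and an interval of 100, two memory records are written (after requests 100 and 200).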

Run with

```
python3 signal9.py '/PATH/TO/elser_2/elser_model_2.pt' --num_allocations=4
```

`--num_threads_per_allocation` and `--num_allocations` are the parameters to tweak. Increasing either will make inference faster, so changes in memory usage should show up sooner.
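For reference, a minimal sketch of how those flags might be declared in the script. This is an assumption, not the actual `signal9.py` argument parser; the flag names come from the issue, while the defaults and help text are invented for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="Continuously run inference on a model.")
parser.add_argument("model",
                    help="path to the TorchScript model, e.g. elser_model_2.pt")
parser.add_argument("--num_allocations", type=int, default=1,
                    help="number of model allocations (assumed default)")
parser.add_argument("--num_threads_per_allocation", type=int, default=1,
                    help="inference threads per allocation (assumed default)")

# Parse a sample command line matching the invocation shown above.
args = parser.parse_args(["model.pt", "--num_allocations=4"])
print(args.num_allocations)
```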