elastic / ml-cpp

Machine learning C++ code

[NLP] Add memory benchmark mode to evaluate.py #2545

Closed. davidkyle closed this 1 year ago

davidkyle commented 1 year ago

The script uses the functionality added in #2487 to get the max RSS memory usage from the pytorch_inference process. The benchmark sends batches of inference requests to be evaluated, followed by a memory-usage request, and prints a summary of memory usage versus batch size.
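A minimal sketch of that benchmark loop follows. It is not the actual evaluate.py code: the `send_batch` and `get_max_rss` callables are hypothetical stand-ins for the script's real pipe-based request/response handling with the pytorch_inference process.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

def chunk(items: Sequence, size: int) -> Iterable[Sequence]:
    """Split items into consecutive batches of at most `size` documents."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def benchmark_memory(
    send_batch: Callable[[Sequence[str]], None],
    get_max_rss: Callable[[], int],
    docs: Sequence[str],
    batch_sizes: Iterable[int],
) -> List[Tuple[int, int]]:
    """For each batch size, send all inference requests, then issue a
    memory-usage request (the control message added in #2487) and record
    the reported max RSS."""
    results = []
    for batch_size in batch_sizes:
        for batch in chunk(docs, batch_size):
            send_batch(batch)                        # inference requests
        results.append((batch_size, get_max_rss()))  # memory-usage request
    return results

if __name__ == "__main__":
    # Stub callables so the sketch runs stand-alone; the real script talks
    # to a pytorch_inference process instead.
    peak = 0
    def fake_send(batch):
        global peak
        peak = max(peak, 100_000 + 1_000 * len(batch))
    for batch_size, max_rss in benchmark_memory(fake_send, lambda: peak,
                                                ["doc"] * 32, [1, 4, 16]):
        print(f"batch_size={batch_size:>3}  max_rss={max_rss} bytes")
```

The summary at the end is the point of the benchmark: one max-RSS figure per batch size, so the memory cost of larger batches can be compared directly.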

The only complication is that pytorch_inference handles control messages on the main thread, while model evaluation is off-loaded to a thread pool. To ensure that the inference requests and the memory-usage request are processed sequentially and in order, I added a --useImmediateExecutor flag to pytorch_inference. When set, the immediate executor is used to process inference requests, so each request completes on the calling thread before the next message is read. This option should only be used for benchmarking.
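To illustrate why running inference inline makes the ordering deterministic, here is a toy Python model of the message loop. The class and message names are illustrative only; the real pytorch_inference implementation is C++.

```python
import concurrent.futures
import time

def run_inference(payload):
    """Stand-in for model evaluation."""
    time.sleep(0.01)
    return f"result for {payload}"

class ImmediateExecutor:
    """submit() runs the task inline on the calling thread instead of
    dispatching it to a pool, so each inference finishes before the
    message loop reads the next request."""
    def submit(self, fn, *args, **kwargs):
        future = concurrent.futures.Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:  # propagate failures like a real executor
            future.set_exception(exc)
        return future

def handle_messages(messages, executor):
    """Main-thread loop: inference is handed to the executor; the
    memory-usage request is answered inline. With a thread pool the
    memory reply could be sent while inferences are still queued,
    under-reporting peak memory; with the immediate executor it cannot."""
    for kind, payload in messages:
        if kind == "inference":
            executor.submit(run_inference, payload)
        elif kind == "memory_usage":
            print("max RSS measured after all prior inferences completed")

messages = [("inference", i) for i in range(4)] + [("memory_usage", None)]
handle_messages(messages, ImmediateExecutor())  # deterministic ordering
```

This is also why the flag is benchmark-only: serialising inference onto the control thread gives exact ordering for measurement, but gives up the concurrency that production inference relies on.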