State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Related to Model/Framework(s): Kaldi/SpeechRecognition
Describe the bug
There seems to be a memory leak in the Kaldi Triton backend, since memory usage grows progressively with each inference performed.
To Reproduce
Steps to reproduce the behavior:
1. Launch the server with `./scripts/docker/launch_server.sh`.
2. Note the memory usage, idling at ~5GB.
3. Launch the demo client with `./scripts/docker/launch_client.sh -p`.
4. Watch the memory usage rise to ~18GB and never return, even after the client has exited.
5. Run the client another 4-5 times and notice the RAM usage rise to ~19.5GB.

With a custom client based on the demo client, repeatedly running inference on a ~200MB wav file, the memory usage rises as follows (for the Nth run). Each inference takes around 3 minutes:
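The growth pattern above can be checked mechanically by sampling the server process's resident set size (e.g. via `docker stats` or `/proc/<pid>/status`) after each client run. The small helper below is hypothetical and not part of the repo; it simply computes the per-run growth from such readings:

```python
def rss_growth(samples_gb):
    """Given RSS readings in GB taken after successive client runs,
    return the growth (delta) contributed by each run."""
    return [round(after - before, 2)
            for before, after in zip(samples_gb, samples_gb[1:])]

# Readings like those observed above: ~5 GB idle, ~18 GB after the
# first bulk run, then creeping upward on repeated runs.
print(rss_growth([5.0, 18.0, 18.5, 19.0, 19.5]))  # → [13.0, 0.5, 0.5, 0.5]
```

If the backend freed decoder state when a correlation ID completes, the deltas after the first run would be expected to be ~0 rather than consistently positive.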
Expected behavior
Memory usage should return to normal after completing a bulk inference. This behaviour seems to be influenced by the `max_active` argument: setting it to a lower value makes memory usage cap out at a lower level than with the default.
Reading the description of `max-active` ("max-active: at the end of each frame computation, we keep only its best max-active tokens (arc instantiations)"), it seems as if Kaldi keeps `max-active` tokens from every frame computation (the processing of each chunk), but this memory doesn't appear to be freed even after all the chunks corresponding to a given correlation ID have been processed. Is this intentional? I might be misunderstanding, but this is what I could make out from the docs.
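For reference, the Kaldi Triton backend takes its decoder options as backend parameters in the model's `config.pbtxt`. A fragment like the following shows where a lower cap can be set; the value shown is only an illustrative assumption, not the repo's default:

```
parameters: {
  key: "max_active"
  value: { string_value: "10000" }
}
```

Lowering this value bounds how many tokens the decoder retains per frame, which would be consistent with the lower memory plateau observed above.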
Environment
Please provide at least: