NVIDIA / DeepLearningExamples


[Kaldi/Speechrecognition] Potential memory "leak"? #1240

Open · git-bruh opened 1 year ago

git-bruh commented 1 year ago

**Related to Model/Framework(s)**

Kaldi/Speechrecognition

**Describe the bug**

There seems to be a memory leak in the Kaldi Triton backend: memory usage grows progressively with each inference performed.

**To Reproduce**

Steps to reproduce the behavior:

  1. Launch the server with ./scripts/docker/launch_server.sh
  2. Note the memory usage, which idles at ~5 GiB.
  3. Launch the demo client with ./scripts/docker/launch_client.sh -p
  4. Watch the memory usage rise to ~18 GiB and never drop back, even after the client has exited.
  5. Run the client another 4-5 times and watch the RAM usage creep up to ~19.5 GiB (see the sketch after this list for one way to check where that memory sits).
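To confirm the growth is in the server process itself (rather than, say, the page cache), something like the following can be run between steps. This is a minimal sketch: it assumes the script runs on the host (container processes show up in the host's /proc) and that the server process is named `tritonserver`.

```python
#!/usr/bin/env python3
# Report the resident set size (VmRSS) of the tritonserver process.
# Assumption: the process is named "tritonserver"; /proc/<pid>/status
# is world-readable, so this works from the host for a containerized
# server too.
import re
from pathlib import Path

def tritonserver_rss_gib():
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # process exited while we were scanning
        if text.startswith("Name:\ttritonserver"):
            kib = int(re.search(r"VmRSS:\s+(\d+) kB", text).group(1))
            return kib / 2**20  # kB -> GiB
    return None

rss = tritonserver_rss_gib()
print(f"tritonserver RSS: {rss:.1f} GiB" if rss else "tritonserver not found")
```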

With a custom client based on the demo client, repeatedly running inference on a ~200 MB wav file, memory usage grows as follows (measured after the Nth run; each inference takes around 3 minutes):

1.mem: 6.9Gi
2.mem: 7.6Gi
3.mem: 8.3Gi
4.mem: 9.1Gi
5.mem: 9.8Gi
6.mem: 10Gi
7.mem: 11Gi
8.mem: 11Gi
9.mem: 12Gi
10.mem: 13Gi
11.mem: 14Gi
12.mem: 14Gi
13.mem: 15Gi
14.mem: 16Gi
15.mem: 16Gi
16.mem: 17Gi
17.mem: 18Gi
18.mem: 18Gi
19.mem: 19Gi
20.mem: 20Gi
21.mem: 20Gi
22.mem: 20Gi
23.mem: 20Gi
24.mem: 20Gi
25.mem: 20Gi
26.mem: 21Gi
27.mem: 21Gi
28.mem: 22Gi
29.mem: 22Gi
30.mem: 22Gi
31.mem: 22Gi
32.mem: 23Gi
33.mem: 24Gi
34.mem: 25Gi
35.mem: 25Gi
36.mem: 25Gi
37.mem: 26Gi
38.mem: 26Gi
39.mem: 27Gi
40.mem: 27Gi
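For reference, numbers like the above can be collected with a loop along these lines. This is a sketch, not the actual client used (which is not shown here): it re-runs the demo client each iteration and prints system used memory (total minus available, roughly what `free` reports) after each run.

```python
#!/usr/bin/env python3
# Sketch of a measurement loop: run the client N times and log how much
# system memory is in use after each run. Run from Kaldi/SpeechRecognition
# with the server already up; launch_client.sh stands in for the custom
# client used to produce the numbers above.
import subprocess

def used_gib():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])  # values are in kB
    return (fields["MemTotal"] - fields["MemAvailable"]) / 2**20  # GiB

for n in range(1, 41):
    subprocess.run(["./scripts/docker/launch_client.sh", "-p"], check=True)
    print(f"{n}.mem: {used_gib():.1f}Gi")
```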

**Expected behavior**

Memory usage returns to normal after a bulk inference completes. This behavior appears to be influenced by the max_active argument: setting it to a lower value makes memory usage plateau at a lower level than the default does.
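For reference, a sketch of where max_active would be tuned, assuming it is exposed as a model parameter in the Kaldi model's config.pbtxt (Triton's text-format model config); the value shown is illustrative, not necessarily the repo's default:

```
# Sketch: Triton model config parameter (config.pbtxt text format).
# The value "10000" is illustrative only.
parameters {
  key: "max_active"
  value { string_value: "10000" }
}
```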

Reading the description of max-active ("max-active: at the end of each frame computation, we keep only its best max-active tokens (arc instantiations)"), it seems as if Kaldi keeps up to max-active tokens from every frame computation (the processing of each chunk), but this memory never seems to get freed, even after all the chunks corresponding to a given correlation ID have been processed. Is this intentional? I might be misunderstanding; this is just what I could make out from the docs.
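A back-of-envelope check of that theory, where every input is an assumption (the issue does not state the wav format, and the ~0.7 GiB-per-run growth is read off the list above):

```python
# Rough sanity check: how many bytes would be retained per frame if
# decoder state were kept for every frame and never freed? All inputs
# below are assumptions, not values reported in the issue.
wav_bytes = 200e6                        # ~200 MB wav file
bytes_per_sec = 16000 * 2                # assume 16 kHz, 16-bit, mono
duration_s = wav_bytes / bytes_per_sec   # ~6250 s of audio
frames = duration_s * 100                # assume Kaldi's 10 ms frame shift

growth = 0.7 * 2**30                     # ~0.7 GiB growth per run
print(f"~{growth / frames:.0f} bytes retained per frame")
# -> ~1200 bytes/frame, i.e. on the order of 1 KiB of decoder state per
# frame, which would be consistent with a small per-frame structure
# (not all max_active tokens) being kept for the whole correlation ID.
```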

**Environment**

Please provide at least:

git-bruh commented 1 year ago

See also: https://github.com/NVIDIA/DeepLearningExamples/issues/795