NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

scripts/docker/launch_client.sh hangs in TritonASRClient constructor after outputting "Opening GRPC contextes..." #1023

Closed rick-pettit closed 2 years ago

rick-pettit commented 2 years ago

Related to Integrating NVIDIA Triton Inference Server with Kaldi ASR

Describe the bug
Following the instructions outlined in https://developer.nvidia.com/blog/integrating-nvidia-triton-inference-server-with-kaldi-asr/, we are able to successfully launch the Triton server with DeepLearningExamples/Kaldi/SpeechRecognition/scripts/docker/launch_server.sh. However, when launching the client via DeepLearningExamples/Kaldi/SpeechRecognition/scripts/docker/launch_client.sh, the client outputs "Opening GRPC contextes..." and then hangs, without ever outputting "done" or "Streaming utterances...".

To Reproduce
Steps to reproduce the behavior: follow the steps outlined in https://developer.nvidia.com/blog/integrating-nvidia-triton-inference-server-with-kaldi-asr/, specifically:

  1. git clone https://github.com/NVIDIA/DeepLearningExamples.git
  2. cd DeepLearningExamples/Kaldi/SpeechRecognition
  3. scripts/docker/build.sh
  4. scripts/docker/launch_download.sh
  5. scripts/docker/launch_server.sh
  6. scripts/docker/launch_client.sh -p -c 1000

The server starts up without a problem, loads the kaldi online model, and outputs "Starting Metrics Service at 0.0.0.0:8002".

However, the client hangs after printing the line "Opening GRPC contextes...". We never see the "done" for that step, nor the next expected output, "Streaming utterances...".

It appears that the client is hanging in kaldi-asr-client/kaldi_asr_parallel_client.cc at line 273, in the call to the TritonASRClient constructor.
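When a blocking call like this constructor never returns, it can be hard to tell a hang from a slow start. A simple deadline wrapper makes the hang visible as a timeout during debugging. This is an illustrative Python sketch, not part of the Kaldi client; the function names here are hypothetical:

```python
import threading
import time


def run_with_deadline(fn, seconds):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish in time.

    Useful for wrapping a call that is suspected of blocking forever
    (like a client constructor opening gRPC contexts) so a hang surfaces
    as a timeout instead of an indefinite stall.
    """
    result = {}

    def target():
        result["value"] = fn()

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(seconds)
    if t.is_alive():
        raise TimeoutError(f"call did not finish within {seconds}s")
    return result["value"]
```

Wrapping the suspected call this way turns "the program is stuck" into a concrete, reportable timeout, which narrows the search to whatever the wrapped call is waiting on (here, the gRPC connection).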

Expected behavior
The client sends 1000 parallel streams to the server, printing the inferred text returned by the server.

Environment

rick-pettit commented 2 years ago

The problem turned out to be on our end: a Kubernetes networking issue in our on-prem k8s cluster.

We were able to run the Triton Kaldi ASR client successfully against the server after sorting out those issues.
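Since the root cause was network reachability, a quick TCP check before launching the client can confirm the server is actually reachable from where the client runs. A minimal sketch, assuming Triton's default gRPC port 8001 (host and port would need adjusting for a given deployment):

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Triton's default gRPC port; adjust for your deployment.
    if not port_reachable("localhost", 8001):
        print("Triton gRPC endpoint unreachable -- check cluster networking")
```

In a k8s setting like ours, running this from inside the client's pod (rather than from the node) is what distinguishes a cluster-networking problem from a server problem.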