[Kaldi/Triton] cudaError_t 701 : "too many resources requested for launch" returned from 'cudaGetLastError()'

basicasicmatrix commented 3 years ago

Related to the Kaldi example: the LibriSpeech model core-dumps with the default configuration of the 20.03 Kaldi and 20.03 Triton containers, as outlined here

I have tried two separate systems, one with a 1080 and one with a P100. I have also tried many variations of config.pbtxt, with no change in behavior.

**ERROR ([5.5]:splice_features_batched():feature-online-batched-ivector-cuda-kernels.cu:223) cudaError_t 701 : "too many resources requested for launch" returned from 'cudaGetLastError()'**

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xb42) [0x7fe9008ea652]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2e) [0x7fe90c644952]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::splice_features_batched(int, int, int, int, float const*, int, int, float const*, int, int, float*, int, int, kaldi::LaneDesc const*, int)+0x1fb) [0x7fe8ea51f53c]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::BatchedIvectorExtractorCuda::SpliceFeats(kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float> const&, kaldi::CuMatrix<float>*, kaldi::LaneDesc const*, int)+0x62) [0x7fe8ea51bea0]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::BatchedIvectorExtractorCuda::GetIvectors(kaldi::CuMatrixBase<float> const&, kaldi::CuVectorBase<float>*, kaldi::LaneDesc const*, int)+0x72) [0x7fe8ea51c996]
/opt/kaldi/src/lib/libkaldi-cudafeat.so(kaldi::OnlineBatchedFeaturePipelineCuda::ComputeFeaturesBatched(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, float, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*, std::vector<int, std::allocator<int> >*)+0x3c1) [0x7fe8ea521167]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::ComputeGPUFeatureExtraction(std::vector<int, std::allocator<int> > const&, std::vector<kaldi::SubVector<float>, std::allocator<kaldi::SubVector<float> > > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0x1ba) [0x7fe90c0ab31c]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::DecodeBatch(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<kaldi::SubVector<float>, std::allocator<kaldi::SubVector<float> > > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0xca) [0x7fe90c0ac8e0]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(nvidia::inferenceserver::custom::kaldi_cbe::Context::FlushBatch()+0x74) [0x7fe90c642c00]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(nvidia::inferenceserver::custom::kaldi_cbe::Context::Execute(unsigned int, custom_payload_struct*, bool (*)(void*, char const*, void const**, unsigned long*), bool (*)(void*, char const*, unsigned long, long*, unsigned long, void**))+0x3f0) [0x7fe90c642b22]
/workspace/model-repo/kaldi_online/1/libkaldi-trtisbackend.so(CustomExecute+0x4f) [0x7fe90c643db2]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0x2ada7c) [0x7fea04c85a7c]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0x94617) [0x7fea04a6c617]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0x2a99f2) [0x7fea04c819f2]
/opt/tensorrtserver/bin/../lib/libtrtserver.so(+0xae071) [0x7fea04a86071]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f) [0x7fea03ec466f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fea047c06db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fea0358188f]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
/opt/trtis-kaldi/nvidia_kaldi_trtis_entrypoint.sh: line 22:    18 Aborted                 (core dumped) /opt/tensorrtserver/nvidia_entrypoint.sh $@
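
For context, in recent CUDA releases code 701 is `cudaErrorLaunchOutOfResources`, which the CUDA documentation describes as usually meaning that the launch specifies too many threads for the kernel's register count (or passes too many kernel arguments). The sketch below only illustrates that arithmetic. The limits (65536 registers and 1024 threads per block) are assumed values for compute capability 6.x GPUs such as the P100 and 1080, and the register count of 80 is hypothetical, not a measurement of `splice_features_batched`. Real register allocation is rounded to hardware granularity, so the authoritative figure is the `maxThreadsPerBlock` reported by `cudaFuncGetAttributes` for the compiled kernel.

```python
# Assumed per-block limits for compute capability 6.x (P100, GTX 10-series).
REGS_PER_BLOCK = 65536
HW_MAX_THREADS_PER_BLOCK = 1024


def max_threads_for_kernel(regs_per_thread: int) -> int:
    """Largest block size whose total register demand fits in one block."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    by_registers = REGS_PER_BLOCK // regs_per_thread
    return min(HW_MAX_THREADS_PER_BLOCK, by_registers)


def launch_fits(threads_per_block: int, regs_per_thread: int) -> bool:
    """True if the block's register demand stays within the assumed limit."""
    return threads_per_block <= max_threads_for_kernel(regs_per_thread)


# Hypothetical kernel compiled to 80 registers per thread:
# 1024 threads would need 81920 registers, more than the 65536 available.
print(max_threads_for_kernel(80))
print(launch_fits(1024, 80))
print(launch_fits(512, 80))
```

If this is the cause, a kernel built with a higher register count than the launch configuration assumes (for example, for a different GPU architecture or with different compiler flags) would fail on every launch regardless of free GPU memory, which would match the behavior described here. That is a guess, not a confirmed diagnosis.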

To Reproduce

Steps to reproduce the behavior:

1. Install the latest CUDA drivers (450.80.02).
2. Follow the instructions line by line: https://developer.nvidia.com/blog/integrating-nvidia-triton-inference-server-with-kaldi-asr/
3. Attempt inference with the included client test. The server core-dumps even with a single iteration and many GBs of free GPU memory.

Expected behavior

Inference results. No core dump.

Environment

Please provide at least:

Container version: 20.03
GPUs in the system: Tesla P100 16GB
CUDA driver version: 450.80.02
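
For anyone reproducing this, the GPU and driver details above can be gathered in one step. This is a minimal sketch, assuming `nvidia-smi` is on the PATH. The helper names `parse_gpu_rows` and `collect_gpu_info` are made up for illustration.

```python
import subprocess

# nvidia-smi query for GPU name, driver version and total memory, as CSV rows.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(text: str) -> list:
    """Turn 'name, driver, memory' CSV lines into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        rows.append({"name": name, "driver": driver, "memory": memory})
    return rows


def collect_gpu_info() -> list:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)
```

Calling `collect_gpu_info()` on the affected machine returns one entry per GPU, which can be pasted into the report as-is.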

basicasicmatrix commented 3 years ago

@nv-kkudrynski

basicasicmatrix commented 3 years ago

https://github.com/NVIDIA/DeepLearningExamples/commit/a2281e37d5148c9b1db4f28ab2e9d16a8a79cd12

This commit works as intended (nvcr.io/nvidia/kaldi:19.12-online-beta)

gavinljj commented 3 years ago

I have the same issue.

NVIDIA/DeepLearningExamples issue #779