Csinclair0 opened this issue 1 year ago
Did anything change between the two requests? I recently tested the Triton server built from the master branch, and it seems to be working normally.
I have other high-priority work at the moment, so I plan to do a complete test of the Triton server image in a day or two.
No difference in the requests; I am just running the same client script twice in a row (a sketch of it is below). I used the Dockerfile in the repo to build the image, and the only change I made was to enable debug mode.
Hope you get a chance to do the complete test, it would be greatly appreciated. I'm also not sure whether you have any example models for multilingual (or MoE, not this model but others I am working on), as that might also help in triaging some issues.
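For reference, the client logic is essentially just two identical infer calls in a row. A minimal sketch, assuming the standard Triton HTTP client; the model name ("transformer") and tensor names ("source_ids", "target_ids") here are placeholders, not the exact names from my script:

```python
# Minimal sketch of the repro: send the identical request twice in a row.
# Model and tensor names below are illustrative placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy token IDs; the real script sends the identical payload both times.
src = np.array([[4, 11, 25, 2]], dtype=np.int32)

for attempt in range(2):  # the first call succeeds, the second crashes the server
    inp = httpclient.InferInput("source_ids", list(src.shape), "INT32")
    inp.set_data_from_numpy(src)
    out = httpclient.InferRequestedOutput("target_ids")
    res = client.infer(model_name="transformer", inputs=[inp], outputs=[out])
    print(attempt, res.as_numpy("target_ids"))
```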
Sorry, I haven't been able to figure out why this kind of anomaly occurs; everything runs normally here. But I previously found a similar issue in the open-source community. That problem happened because all model instances were loaded onto the same GPU device, when the expectation was that each GPU device would load one model instance. Their solution was to make only one GPU device visible inside each Docker container, and then do load scheduling through nginx at the layer above.
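A rough sketch of that scheme (device IDs, ports, mount paths, and the image name below are illustrative, not taken from that issue):

```
# One Triton container per GPU, each seeing only a single device:
docker run -d --gpus device=0 -p 8000:8000 -v $PWD/model_repo:/models \
    <lightseq-triton-image> tritonserver --model-repository=/models
docker run -d --gpus device=1 -p 8001:8000 -v $PWD/model_repo:/models \
    <lightseq-triton-image> tritonserver --model-repository=/models
```

with nginx round-robining requests across the per-GPU servers:

```
upstream triton_pool {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}
server {
    listen 80;
    location / { proxy_pass http://triton_pool; }
}
```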
I think this is more relevant to #427, as this issue comes from using a single GPU with a multilingual model.
Does that mean the first call is normal, but an error is reported when the second call is made?
From: "Colin @.> Date: Thu, Dec 15, 2022, 01:12 Subject: [External] Re: [bytedance/lightseq] [Triton] Multilingual Transformer Fails on 2nd request (Issue #433) To: @.> Cc: @.>, "Comment"< @.>
Think this is more relevant to 427 https://github.com/bytedance/lightseq/issues/427, as this issue is from using a single GPU but a multilingual model.
— Reply to this email directly, view it on GitHub https://github.com/bytedance/lightseq/issues/433#issuecomment-1351791164, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGAOAODF5P2UDHNIN2DRMMTWNH5XDANCNFSM6AAAAAASQEQLZQ . You are receiving this because you commented.Message ID: @.***>
How are you running the Triton backend? Are you using the image published on Docker Hub directly, or compiling it yourself? If you compiled it, what is the build environment? Sorry for asking so many questions; I deploy and use it in an internal environment where everything behaves normally, and I have not been able to reproduce the problem. I may need to tie it to a specific environment and then debug from there. Thanks!
I'm sorry, I didn't see your previous message. I will probably need to request some machine resources from the company to fully deploy the external image environment for testing. The deployment we use internally does not have the problems mentioned above.
Yes, the first request is successful but the second fails. I am running this on an AWS machine (ml.g4dn.12xlarge) and building the image using the Dockerfile in the repo; the only modification is enabling debug mode. I also just tried the image from Docker Hub and ran into the same issue, although at a different location. This is the second request, after the first was successful:
terminate called after throwing an instance of 'std::runtime_error'
what(): [CUDA][ERROR] /opt/lightseq/lightseq/inference/model/decoder.cc.cu(349): CUBLAS_STATUS_EXECUTION_FAILED
Signal (6) received.
0# 0x00005647D2E01EB9 in tritonserver
1# 0x00007F1332FFA210 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
4# 0x00007F13333B0911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007F13333BC38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007F13333BC3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# 0x00007F13333BC6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# void lightseq::cuda::check_gpu_error<cublasStatus_t>(cublasStatus_t, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
9# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::project_encoder_output() in /opt/tritonserver/lib/libliblightseq.so
10# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::run_one_infer(int, int) in /opt/tritonserver/lib/libliblightseq.so
11# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
12# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
13# 0x00007F1333B8210A in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1333B829B7 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007F1333A2E3C1 in /opt/tritonserver/lib/libtritonserver.so
16# 0x00007F1333B7BF87 in /opt/tritonserver/lib/libtritonserver.so
17# 0x00007F13333E8DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
18# 0x00007F1333866609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
19# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
Signal (11) received.
0# 0x00005647D2E01EB9 in tritonserver
1# 0x00007F1332FFA210 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
3# 0x00007F13333B0911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
4# 0x00007F13333BC38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007F13333BC3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007F13333BC6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# void lightseq::cuda::check_gpu_error<cublasStatus_t>(cublasStatus_t, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
8# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::project_encoder_output() in /opt/tritonserver/lib/libliblightseq.so
9# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::run_one_infer(int, int) in /opt/tritonserver/lib/libliblightseq.so
10# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
11# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
12# 0x00007F1333B8210A in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007F1333B829B7 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1333A2E3C1 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007F1333B7BF87 in /opt/tritonserver/lib/libtritonserver.so
16# 0x00007F13333E8DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
17# 0x00007F1333866609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
18# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
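For triage, it may be worth noting that CUDA/cuBLAS errors are reported asynchronously, so the call named in the trace (project_encoder_output) is not necessarily the one that actually corrupted state. Two generic ways to localize the real fault, using standard CUDA tooling rather than anything lightseq-specific (the model repository path is a placeholder):

```
# Make kernel launches synchronous so the error surfaces at the real call site:
CUDA_LAUNCH_BLOCKING=1 tritonserver --model-repository=/models

# Or run the server under compute-sanitizer to catch invalid memory accesses:
compute-sanitizer tritonserver --model-repository=/models
```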
I am trying to serve a multilingual transformer on Triton. The server is able to process the first request, but the second fails. The failure seems to start with an issue receiving the request. In my client script, I am sending the same request of
As you can see in the logs, it was properly received the first time.
However, the second time it seems to skip the first two inputs, as it logs
and it then runs into an error.
multilingual_logs.txt