bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

[Triton] Multilingual Transformer Fails on 2nd request #433

Open Csinclair0 opened 1 year ago

Csinclair0 commented 1 year ago

I am trying to serve a multilingual transformer on Triton. The server is able to process the first request, but the second one fails. It first seems to have an issue receiving the request. In my client script, I am sending the same request each time:

[0, 1, 160001, 7286, 3026, 1710, 374, 23, 3026, 2, 0]
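
Roughly, the client sends the request like this (a simplified sketch of the script, not the exact code; the model name and the tensor names source_ids / target_ids are placeholders for whatever the model's config.pbtxt actually declares):

import numpy as np
import tritonclient.http as httpclient

MODEL_NAME = "multilingual_transformer"  # placeholder, not the real model name

client = httpclient.InferenceServerClient(url="localhost:8000")

# The exact token IDs shown above, as a batch of one sequence.
token_ids = np.array(
    [[0, 1, 160001, 7286, 3026, 1710, 374, 23, 3026, 2, 0]], dtype=np.int32
)

src = httpclient.InferInput("source_ids", list(token_ids.shape), "INT32")
src.set_data_from_numpy(token_ids)

result = client.infer(model_name=MODEL_NAME, inputs=[src])
print(result.as_numpy("target_ids"))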

As you can see in the logs, it is properly received the first time.

batch_size-1 batch_seq_len-10
batch_token_ids: 160001, 7286, 3026, 1710, 374, 23, 3026, 2, 0, 0, 

However, the second time it seems to skip the first two tokens, logging

batch_size-1 batch_seq_len-10
batch_token_ids: 3026, 1710, 374, 23, 3026, 2, 0, 0, 0, 0, 

and it then runs into an error:

emb out: token-0
emb out: terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA][ERROR] /opt/lightseq/lightseq/inference/tools/util.cc.cu(66): cudaErrorIllegalAddressan illegal memory access was encountered

Signal (6) received.
 0# 0x000055DD62122EB9 in tritonserver
 1# 0x00007FB7FBCAF210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007FB7FC065911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007FB7FC07138C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007FB7FC0713F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007FB7FC0716A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# void lightseq::cuda::check_gpu_error<cudaError>(cudaError, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 9# void lightseq::cuda::print_vec<__half>(__half const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) in /opt/tritonserver/lib/libliblightseq.so
10# lightseq::cuda::Encoder<(lightseq::cuda::OperationType)1>::run_one_infer(int, int) in /opt/tritonserver/lib/libliblightseq.so
11# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
12# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
13# 0x00007FB7FC83710A in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007FB7FC8379B7 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007FB7FC6E33C1 in /opt/tritonserver/lib/libtritonserver.so
16# 0x00007FB7FC830F87 in /opt/tritonserver/lib/libtritonserver.so
17# 0x00007FB7FC09DDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
18# 0x00007FB7FC51B609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
19# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Signal (11) received.
 0# 0x000055DD62122EB9 in tritonserver
 1# 0x00007FB7FBCAF210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007FB7FC065911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 4# 0x00007FB7FC07138C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007FB7FC0713F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007FB7FC0716A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# void lightseq::cuda::check_gpu_error<cudaError>(cudaError, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 8# void lightseq::cuda::print_vec<__half>(__half const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) in /opt/tritonserver/lib/libliblightseq.so
 9# lightseq::cuda::Encoder<(lightseq::cuda::OperationType)1>::run_one_infer(int, int) in /opt/tritonserver/lib/libliblightseq.so
10# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
11# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
12# 0x00007FB7FC83710A in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007FB7FC8379B7 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007FB7FC6E33C1 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007FB7FC830F87 in /opt/tritonserver/lib/libtritonserver.so
16# 0x00007FB7FC09DDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
17# 0x00007FB7FC51B609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
18# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

multilingual_logs.txt

hexisyztem commented 1 year ago

Did anything change between requests? I recently tested the Triton server from the master branch, and it seems to be working normally.

hexisyztem commented 1 year ago

I have other high-priority work at the moment; I will try to run a complete test of the Triton server image in a day or two.

Csinclair0 commented 1 year ago

No difference between requests; I am just running the same client script twice in a row. I used the Dockerfile in the repo to build the image, and the only change made was to enable debug mode.
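
Concretely, the repro is nothing more than sending that identical request twice in a row; a minimal sketch (same placeholder model and tensor names as the snippet above) looks like this:

import numpy as np
import tritonclient.http as httpclient

MODEL_NAME = "multilingual_transformer"  # placeholder name

client = httpclient.InferenceServerClient(url="localhost:8000")
token_ids = np.array(
    [[0, 1, 160001, 7286, 3026, 1710, 374, 23, 3026, 2, 0]], dtype=np.int32
)

for i in range(2):
    src = httpclient.InferInput("source_ids", list(token_ids.shape), "INT32")
    src.set_data_from_numpy(token_ids)
    try:
        result = client.infer(model_name=MODEL_NAME, inputs=[src])
        print(f"request {i}: ok ->", result.as_numpy("target_ids"))
    except Exception as err:
        # On the second request the server aborts mid-inference, so the
        # client sees a dropped connection / inference error.
        print(f"request {i}: failed ->", err)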

I hope you get a chance to do a complete test; it would be greatly appreciated. I'm also not sure whether you have any example models for multilingual translation (or MoE, which is not this model but others I am working on), as that might also help in triaging some issues.

hexisyztem commented 1 year ago

Sorry, I haven't been able to figure out why this kind of anomaly occurs; everything is normal on my side. But I found a similar issue in the open-source community before. That problem occurred because all model instances were loaded on the same GPU device, while the original expectation was that each GPU device would load one model instance. Their solution was to expose only one GPU device inside the Docker container and then do load scheduling through nginx on the upper layer.

Csinclair0 commented 1 year ago

I think this is more relevant to #427, as this issue occurs when using a single GPU but a multilingual model.

hexisyztem commented 1 year ago

Does that mean the first call is normal, but an error is reported on the second call?

From: "Colin @.> Date: Thu, Dec 15, 2022, 01:12 Subject: [External] Re: [bytedance/lightseq] [Triton] Multilingual Transformer Fails on 2nd request (Issue #433) To: @.> Cc: @.>, "Comment"< @.>

Think this is more relevant to 427 https://github.com/bytedance/lightseq/issues/427, as this issue is from using a single GPU but a multilingual model.

— Reply to this email directly, view it on GitHub https://github.com/bytedance/lightseq/issues/433#issuecomment-1351791164, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGAOAODF5P2UDHNIN2DRMMTWNH5XDANCNFSM6AAAAAASQEQLZQ . You are receiving this because you commented.Message ID: @.***>

hexisyztem commented 1 year ago

How are you using the Triton backend? Are you using the image published on Docker Hub directly, or did you compile it yourself? What is your compilation environment? Sorry for asking so many questions; I am deploying and using it in an internal environment where everything behaves normally, and I have not been able to reproduce the problem. I may need to attribute it to a specific environment and then do a debug check. Thanks!

hexisyztem commented 1 year ago

I'm sorry, I didn't see your previous message. I probably need to request some machine resources from the company to fully deploy the external image environment for testing. The version used internally does not have the problems mentioned above.

Csinclair0 commented 1 year ago

Yes, the first request is successful but the second fails. I am running this on an AWS machine (ml.g4dn.12xlarge) and building the image using the Dockerfile in the repo; the only modification is enabling debug mode. I just tried using the image from Docker Hub and ran into the same issue, although at a different location. This is from the second request, after the first was successful:

terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA][ERROR] /opt/lightseq/lightseq/inference/model/decoder.cc.cu(349): CUBLAS_STATUS_EXECUTION_FAILED

Signal (6) received.
 0# 0x00005647D2E01EB9 in tritonserver
 1# 0x00007F1332FFA210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007F13333B0911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F13333BC38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F13333BC3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F13333BC6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 8# void lightseq::cuda::check_gpu_error<cublasStatus_t>(cublasStatus_t, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 9# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::project_encoder_output() in /opt/tritonserver/lib/libliblightseq.so
10# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::run_one_infer(int, int) in /opt/tritonserver/lib/libliblightseq.so
11# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
12# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
13# 0x00007F1333B8210A in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1333B829B7 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007F1333A2E3C1 in /opt/tritonserver/lib/libtritonserver.so
16# 0x00007F1333B7BF87 in /opt/tritonserver/lib/libtritonserver.so
17# 0x00007F13333E8DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
18# 0x00007F1333866609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
19# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Signal (11) received.
 0# 0x00005647D2E01EB9 in tritonserver
 1# 0x00007F1332FFA210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007F13333B0911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 4# 0x00007F13333BC38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F13333BC3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F13333BC6A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# void lightseq::cuda::check_gpu_error<cublasStatus_t>(cublasStatus_t, char const*, char const*, int) in /opt/tritonserver/lib/libliblightseq.so
 8# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::project_encoder_output() in /opt/tritonserver/lib/libliblightseq.so
 9# lightseq::cuda::Decoder<(lightseq::cuda::OperationType)1>::run_one_infer(int, int) in /opt/tritonserver/lib/libliblightseq.so
10# lightseq::cuda::Transformer::Infer() in /opt/tritonserver/lib/libliblightseq.so
11# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/lightseq/libtriton_lightseq.so
12# 0x00007F1333B8210A in /opt/tritonserver/lib/libtritonserver.so
13# 0x00007F1333B829B7 in /opt/tritonserver/lib/libtritonserver.so
14# 0x00007F1333A2E3C1 in /opt/tritonserver/lib/libtritonserver.so
15# 0x00007F1333B7BF87 in /opt/tritonserver/lib/libtritonserver.so
16# 0x00007F13333E8DE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
17# 0x00007F1333866609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
18# clone in /usr/lib/x86_64-linux-gnu/libc.so.6