The detailed error message is as follows:
```
Traceback (most recent call last):
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/site-packages/maga_transformer/async_decoder_engine/embedding/embedding_engine.py", line 22, in decode_sync
    results = self.cpp_engine.decode(inputs.token_ids, inputs.token_type_ids, inputs.input_lengths, 0)
RuntimeError: run stream failed: long prompt error, not scheduled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/site-packages/maga_transformer/server/inference_server.py", line 174, in embedding
    result, logable_result = await self._embedding_endpoint.handle(request)
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/site-packages/maga_transformer/embedding/embedding_endpoint.py", line 20, in handle
    batch_output = await self.decoderengine.decode(batch_input)
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/site-packages/maga_transformer/async_decoder_engine/embedding/embedding_engine.py", line 30, in decode
    await asyncio.to_thread(self.decode_sync, input, output)
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/conda_envs/rtp-llm-backup/lib/python3.10/site-packages/maga_transformer/async_decoder_engine/embedding/embedding_engine.py", line 26, in decode_sync
    raise Exception("failed to run query, error: ", e)
Exception: ('failed to run query, error: ', RuntimeError('run stream failed: long prompt error, not scheduled'))
```
Currently, an error is raised whenever the combined token length of the query plus all docs (i.e., the total_tokens value returned by the API) exceeds max_length.
Expected behavior: an error should be raised only when the token length of the query plus any single doc exceeds max_length.
As shown in the screenshot, with bge-reranker-m3 and max_length set to 8192, requests succeed when the total token count is below 8192 but fail once it exceeds 8192. For a reranker model, max_length should bound each individual query + doc pair, not the sum over all pairs: for example, a 100-token query with three 3,000-token docs totals 9,100 tokens and is rejected, even though each pair is only 3,100 tokens.
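For illustration, here is a minimal sketch of the expected check in plain Python. The function name and signature are hypothetical (not taken from the rtp-llm codebase); it only demonstrates the per-pair validation this issue is asking for, as opposed to the observed whole-batch check:

```python
from typing import List


def check_rerank_lengths(query_tokens: List[int],
                         doc_token_lists: List[List[int]],
                         max_length: int) -> None:
    """Hypothetical per-pair length check (illustrative, not rtp-llm's code).

    A reranker scores each (query, doc) pair independently, so only the
    length of an individual pair should be compared against max_length.
    """
    # Observed behavior: the whole batch is rejected when the summed length
    # of query + all docs exceeds max_length, which triggers
    # "long prompt error, not scheduled".
    total = len(query_tokens) + sum(len(d) for d in doc_token_lists)

    # Expected behavior: reject only the pairs where query + one doc is too long.
    for i, doc_tokens in enumerate(doc_token_lists):
        pair_len = len(query_tokens) + len(doc_tokens)
        if pair_len > max_length:
            raise ValueError(
                f"pair {i}: query+doc length {pair_len} exceeds "
                f"max_length {max_length}"
            )
```

Under this check, the example above (total 9,100 tokens, each pair 3,100 tokens, max_length 8192) would be scheduled normally.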