Open DreamGenX opened 3 months ago
Hi @MartinMarciniszyn would u plz take a look this ticket?
Hi @DreamGenX, have you tried using the new Executor API instead? Requests of this API will serialize the name of the associated logits post-processor correctly.
Hi @MartinMarciniszyn, thanks for your time. Comment on that below, but I would say that adding documentation and runtime warning for batch manager's InferenceRequest::serialize
still makes sense, regardless of the viability of Executor API.
Regarding Executor API:
Executor API is unfortunately limited as far as I can tell -- I don't see a way to have "dynamic" post-processors where the post-processors differ per request and all possible valuas aren't know apriori (so can't be registred in the map).
With the executor API, since all post-processors need to be pre-registered, there's no away to achieve this as far as I can tell. We can't even store the state in some external storage keyed by requestId
, since the id isn't know before calling enqueueRequests
.
Possible solution: Let LogitPostProcessor
access arbitrary std::vector<uint8_t> state
that we provide when building the Request
/ SamplingParams
, there we can encode additional per-request state for the logit post processor. A more flexible version would be to actually have LogitPostProcessorFactory
that returns LogitPostProcessor
based on the state
, as it could avoid expensive pre-processing on each token in some cases.
This is the main reason I have essentially re-implemented the Executor C++ API myself (the code is not open-source, so I could not just modify it either). So if the Executor API can be made more flexible, I would love it.
See also here: https://github.com/NVIDIA/TensorRT-LLM/issues/1680#issuecomment-2137826264
This is the main reason I have essentially re-implemented the Executor C++ API myself (the code is not open-source, so I could not just modify it either). So if the Executor API can be made more flexible, I would love it.
@DreamGenX
Hi, do you have any plans to open source the your own "Executor"?
Hi @akhoroshev -- I would hope it soon becomes obsolete once the Executor API becomes more mature, but if not I don't mind releasing it.
Hi @akhoroshev -- I would hope it soon becomes obsolete once the Executor API becomes more mature, but if not I don't mind releasing it.
@DreamGenX
The trtllm team's core idea of not opening the source code (Executor/BatchManager) makes it very difficult to customize functionality.
There is a list of features that cannot be implemented due to closed sources.
Therefore, I would be very grateful if you could share the sources of your executor.
@MartinMarciniszyn can you let us know if you plan to expose more flexible API for logit post processors in the Executor?
Thank you for your feedback, @DreamGenX.
I agree that there should an error message for InferenceRequest::serialize
. We will look into it.
Serializing arbitrary function callbacks between the different ranks poses technical difficulties. The team will investigate how we can add more flexibility to the API given these restrictions.
@MartinMarciniszyn Yeah, of course, serialization of arbitrary callbacks is not feasible, that's why I propose that we can attach arbitrary binary data as extension to the Request / SamplingParams, and then get access to this data in the LogitPostProcessor. It would be even better if we can have LogitPostProcessorFactory, which gets called at the start of the request and can do all the pre-processing work (e.g. deserialization of the data, etc.) only once, instead of on every token.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days."
System Info
When using C++ GptManager, I was using InferenceRequest::serialize to synchronize requests between ranks, not realizing that it does not handle logits post processors.
It would be great to log an error in InferenceRequest::serialize when logits post processor is not nullopt. Or, at the very least, document it in code and the readme
Here are some issues this resulted in:
Who can help?
@ju
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
N/A
Expected behavior
N/A
actual behavior
N/A
additional notes
N/A