Open ydm-amazon opened 3 months ago
The max_draft_len parameter was added to the main branch during the code freeze period of the v0.8.0 release, so it is not a feature of v0.8.0. If you need it, you can use the main branch or wait for the later v0.9.0 release.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
System Info
TensorRT-LLM v0.8.0 branch https://github.com/NVIDIA/TensorRT-LLM/blob/v0.8.0/tensorrt_llm/commands/build.py versus main branch https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/commands/build.py
Who can help?
@ncomly
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
The max_draft_len parameter is required when building a model for speculative decoding. In the main branch, build.py accepts this argument at lines 122 to 128.
However, support for max_draft_len is absent from the version tagged v0.8.0. I am not sure whether it was accidentally left out before the v0.8.0 release. Could it be added to v0.8.0?
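For readers unfamiliar with what is being asked for, here is a minimal sketch of how a build command might accept such an option with Python's standard argparse module. It is not copied from build.py: the function names build_parser and parse_build_args, the default of 0, and the help text are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a build command's parser; NOT the real build.py.
    parser = argparse.ArgumentParser(description="Sketch of a build command")
    parser.add_argument(
        "--max_draft_len",
        type=int,
        default=0,  # assumed default: 0 draft tokens, i.e. no speculative decoding
        help="Maximum number of draft tokens for speculative decoding (assumed wording).",
    )
    return parser


def parse_build_args(argv):
    # Parse an explicit argument list so the sketch is easy to exercise.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_build_args(["--max_draft_len", "5"])
    print(args.max_draft_len)
```

A parser that does not register the option rejects it with an "unrecognized arguments" error, which is presumably what happens when --max_draft_len is passed to the v0.8.0 build command.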
Expected behavior
See above
actual behavior
See above
additional notes
N/A