NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

v0.8.0 tag trtllm-build does not accept max_draft_len arg #1290

ydm-amazon commented 3 months ago

System Info

TensorRT-LLM v0.8.0 branch https://github.com/NVIDIA/TensorRT-LLM/blob/v0.8.0/tensorrt_llm/commands/build.py versus main branch https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/commands/build.py

Who can help?

@ncomly

Reproduction

The max_draft_len parameter is required when building a model for speculative decoding. On the main branch, the argument is registered at lines 122 to 128 of build.py:

parser.add_argument(
    '--max_draft_len',
    type=int,
    default=0,
    help=
    'Maximum lengths of draft tokens for speculative decoding target model.'
)

However, support for max_draft_len is absent from the version tagged v0.8.0; I am not sure whether it was accidentally left out of the v0.8.0 release. Could it be added to the v0.8.0 tag?
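
For anyone hitting this, the failure mode is simply argparse rejecting an unregistered flag. The following is a minimal standalone sketch (a toy stand-in parser for illustration, not the real build.py) that reproduces what passing --max_draft_len to the v0.8.0 trtllm-build looks like:

import argparse

# Toy stand-in for the v0.8.0 trtllm-build parser, which never
# registers --max_draft_len.
parser = argparse.ArgumentParser(prog='trtllm-build')
parser.add_argument('--max_batch_size', type=int, default=1)

try:
    # argparse exits with "error: unrecognized arguments: --max_draft_len 5"
    parser.parse_args(['--max_draft_len', '5'])
except SystemExit:
    print('unrecognized argument, as on the v0.8.0 tag')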

Expected behavior

See above

Actual behavior

See above

Additional notes

N/A

dongxuy04 commented 3 months ago

The max_draft_len parameter was added to the main branch during the code freeze period for the v0.8.0 release, so it is not part of v0.8.0. If you need it, you can use the main branch or wait for the upcoming v0.9.0 release.
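
If a script has to work across both versions, one option is to gate on the installed package version before passing the flag. A rough sketch, assuming tensorrt_llm.__version__ reports the installed release and the packaging module is available:

import tensorrt_llm
from packaging.version import Version

# max_draft_len landed on main after the v0.8.0 tag, so refuse the
# flag on v0.8.0 and older installs (dev builds of main carry a
# higher version string and pass this check).
if Version(tensorrt_llm.__version__) <= Version('0.8.0'):
    raise RuntimeError(
        'this trtllm-build lacks --max_draft_len; '
        'build from the main branch or wait for v0.9.0')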

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or it will be closed in 15 days.