Bhuvanesh09 opened this issue 3 weeks ago
Could you share how you set/add the `beam_search_diversity_rate`? I gave it a try on the LLaMA 7B model and it works well on the latest main branch.
```shell
python examples/llama/convert_checkpoint.py --model_dir /llama-models/llama-7b-hf/ \
    --output_dir /tmp/tllm_checkpoint_1gpu_fp16 \
    --dtype float16

python3 -m tensorrt_llm.commands.build --checkpoint_dir /tmp/tllm_checkpoint_1gpu_fp16 \
    --output_dir /tmp/tmp/llama/7B/trt_engines/fp16/1-gpu \
    --gemm_plugin auto \
    --max_beam_width 4

python examples/run.py --engine_dir /tmp/tmp/llama/7B/trt_engines/fp16/1-gpu \
    --max_output_len 10 \
    --use_py_session \
    --tokenizer_dir /llama-models/llama-7b-hf/ \
    --num_beams 4
```
Default (`beam_search_diversity_rate = 0`):

```
Input [Text 0]: "<s> Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: "pastry chef before moving to London in 1"
Output [Text 0 Beam 1]: "pastry chef before moving to London in 2"
Output [Text 0 Beam 2]: "pastry chef in Paris before moving to London in"
Output [Text 0 Beam 3]: "pastry chef before moving to the UK in "
```
With `beam_search_diversity_rate = 2.0`:

```
Input [Text 0]: "<s> Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: "cook before working in restaurants in London and Paris"
Output [Text 0 Beam 1]: "cook before working in restaurants in London, Paris"
Output [Text 0 Beam 2]: "cook before working in restaurants in London, New"
Output [Text 0 Beam 3]: "cook before working in restaurants in London, including"
```
@byshiue: Thanks for the prompt reply. In the example you provided, there is still very little diversity among the different beams of a single prediction. Grouped beam search ensures that the same beams are not picked across groups, which guarantees significant diversity: https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.group_beam_search
There is no option to set the group width for beam search in TRT-LLM.
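For reference, the grouped scheme we mean can be sketched in a few lines. This is a toy, self-contained illustration of one decoding step (one beam per group, in the style of the HF `group_beam_search` docs linked above), not TensorRT-LLM code; all names are made up for the sketch.

```python
import math

def group_beam_step(log_probs, num_groups, diversity_rate):
    """One decoding step of diverse (grouped) beam search.

    log_probs: per-group log-probabilities over the vocabulary
               (one beam per group to keep the sketch small).
    Later groups are penalized for tokens already chosen by
    earlier groups at this step:
        score = log_prob - diversity_rate * times_already_chosen
    """
    vocab = len(log_probs[0])
    chosen_counts = [0] * vocab   # how often each token was picked at this step
    picks = []
    for g in range(num_groups):
        scores = [log_probs[g][t] - diversity_rate * chosen_counts[t]
                  for t in range(vocab)]
        best = max(range(vocab), key=lambda t: scores[t])
        picks.append(best)
        chosen_counts[best] += 1
    return picks

# Two groups with identical distributions, strongly peaked on token 0.
lp = [[math.log(0.90), math.log(0.06), math.log(0.04)]] * 2
print(group_beam_step(lp, num_groups=2, diversity_rate=0.0))  # [0, 0]
print(group_beam_step(lp, num_groups=2, diversity_rate=5.0))  # [0, 1]
```

With a rate of 0 both groups collapse onto the same token; with a large enough rate the second group is pushed off it, which is the behavior we are trying to obtain.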
From the paper, I recall that diverse beam search only encourages choosing different beams via the penalty term. The diversity is controlled by the penalty, and it cannot guarantee that different beams are chosen.
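A quick numeric sketch of that point: because the diversity term is only a penalty subtracted from the log-probability, a small rate may leave the argmax unchanged. This is purely illustrative Python, not TensorRT-LLM code.

```python
import math

def penalized_pick(log_probs, prior_counts, diversity_rate):
    """Argmax of log_prob - diversity_rate * (times token already chosen)."""
    scores = [lp - diversity_rate * c for lp, c in zip(log_probs, prior_counts)]
    return max(range(len(scores)), key=lambda t: scores[t])

# Token 0 dominates, and an earlier beam already chose it once.
lp = [math.log(0.97), math.log(0.02), math.log(0.01)]
counts = [1, 0, 0]
print(penalized_pick(lp, counts, 0.5))  # 0: penalty too small, same beam again
print(penalized_pick(lp, counts, 4.0))  # 1: penalty now dominates
```

So whether the beams actually diverge depends on how the penalty compares to the gap between token log-probabilities; nothing forces distinct choices.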
We found in one of the issues that TRT-LLM supports grouped diverse beam search: https://github.com/NVIDIA/TensorRT-LLM/issues/79#issuecomment-1825401751
Yet we are unable to change the groups and their sizes. We looked through the code for such options but could not find any. Setting `beam_search_diversity_rate` to 2.0 does not lead to any perceivable increase in the variance of the output.
Example:
leads to the output:
and with the sampling config set as:
leads to the output:
Kindly guide us on how we can achieve greater diversity in our output. Thanks!