-
The docs specify that `length_penalty` applies only to beam search, which means that with multinomial sampling, `length_penalty` does not change the generation. https://github.com/openvinotoolkit/openv…
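For context, this is how the distinction plays out in the transformers `generate` API; a minimal sketch, where the gpt2 checkpoint and prompt are just placeholders:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Beam search demo:", return_tensors="pt")

# length_penalty enters the scoring of candidate beams, so it only
# takes effect when num_beams > 1.
beam_out = model.generate(**inputs, num_beams=4, length_penalty=2.0, max_new_tokens=20)

# Multinomial sampling has no beam-scoring step, so the same
# length_penalty value is accepted but leaves the output unchanged.
sample_out = model.generate(**inputs, do_sample=True, length_penalty=2.0, max_new_tokens=20)
```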
-
If I run this code:
```python
import torch
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
…
```
-
### 🚀 The feature, motivation and pitch
vLLM currently does not support diverse beam search, which transformers already supports (https://huggingface.co/docs/transformers/generation_strategies#diverse-beam-sea…
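For reference, the diverse (group) beam search the request points to is exposed in transformers through `num_beam_groups` and `diversity_penalty`; a minimal sketch, assuming a placeholder gpt2 checkpoint:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Diverse beam search demo:", return_tensors="pt")

# Beams are split into groups; diversity_penalty > 0 penalizes a group
# for picking tokens that earlier groups already chose at the same step.
out = model.generate(
    **inputs,
    num_beams=4,
    num_beam_groups=2,       # must evenly divide num_beams
    diversity_penalty=1.0,
    num_return_sequences=4,
    max_new_tokens=20,
)
print(tok.batch_decode(out, skip_special_tokens=True))
```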
-
### System Info
```
- `transformers` version: 4.45.0.dev0
- Platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Huggingface_hub version: 0.24.7
- Safetensors versi…
```
-
Thanks for your great work; however, I have a question about the use of beam search.
During inference, the generative model M sequentially generates textual outputs y, which consist of multiple …
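If the question concerns retrieving the multiple candidate outputs y that beam search maintains, transformers can return every finished beam together with its score; a sketch, again with a placeholder gpt2 checkpoint:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The model M generates", return_tensors="pt")

out = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=4,  # return all four finished beams
    max_new_tokens=20,
    return_dict_in_generate=True,
    output_scores=True,      # needed to populate sequences_scores
)
for seq, score in zip(out.sequences, out.sequences_scores):
    print(f"{score.item():.3f}", tok.decode(seq, skip_special_tokens=True))
```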
-
### Describe the issue
We currently use a large max_length in beam search, but we got max_length
-
Hi OpenVINO team,
I recently submitted another issue, #1150, but I'd also like to suggest a further enhancement for WhisperPipeline.
Currently, WhisperPipeline (openvino_genai-2024.5.0.0rc1) relies …
-
Can you add beam search to your code? A sketch follows the links below.
https://towardsdatascience.com/temperature-scaling-and-beam-search-text-generation-in-llms-for-the-ml-adjacent-21212cc5dddb#e148
https://github.com/mikecvet/be…
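For reference, a minimal self-contained beam search sketch; the scoring callback and the toy bigram table below are stand-ins, not part of any project in this thread:
```python
import math
from typing import Callable, List, Tuple

def beam_search(
    score_next: Callable[[List[int]], List[Tuple[int, float]]],
    bos: int,
    eos: int,
    beam_width: int = 4,
    max_len: int = 20,
) -> List[Tuple[List[int], float]]:
    """Generic beam search; score_next(prefix) yields (token, log_prob) pairs."""
    beams: List[Tuple[List[int], float]] = [([bos], 0.0)]
    finished: List[Tuple[List[int], float]] = []
    for _ in range(max_len):
        candidates = [
            (seq + [tok], logp + tok_logp)
            for seq, logp in beams
            for tok, tok_logp in score_next(seq)
        ]
        # Keep the top beam_width hypotheses by cumulative log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, logp))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished beams if max_len was reached
    return sorted(finished, key=lambda c: c[1], reverse=True)

# Toy usage: a fixed bigram table stands in for the model; token 3 is EOS.
table = {0: [(1, math.log(0.6)), (2, math.log(0.4))],
         1: [(2, math.log(0.9)), (3, math.log(0.1))],
         2: [(3, math.log(1.0))]}
print(beam_search(lambda seq: table.get(seq[-1], [(3, 0.0)]), bos=0, eos=3))
```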
-
The current implementation of the `get_beam_search_score` method in `vllm/sequence.py` seems to incorrectly include the prompt length in the sequence length when calculating the beam score. This dev…
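To make the concern concrete, here is a sketch (not vLLM's actual implementation) of a length-normalized beam score, where only the generated tokens, not the prompt, should enter the length term:
```python
def beam_score(cumulative_logprob: float,
               total_len: int,
               prompt_len: int,
               length_penalty: float = 1.0) -> float:
    """Length-normalized beam score (illustrative only)."""
    # Normalizing by total_len would make the score depend on how long
    # the prompt is; the reported fix is to count generated tokens only.
    gen_len = total_len - prompt_len
    return cumulative_logprob / (gen_len ** length_penalty)
```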
-
Hi!
I am trying to use transformer-neuronx to compile a customized Hugging Face llama-3.1-8b model.
I use the model with beam search, and I know that it produces a dynamic graph during generation.
Bu…