-
When I generate audio with batch generation on long text, some words/sentences are dropped from the audio, as if they were never there.
The code I am using is given below:
```
def sentense_tokeni…
```
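One common cause of silently dropped sentences is a splitter that only keeps text ending in terminal punctuation, so a trailing fragment without a final period never reaches the model. Since the original `sentense_tokeni…` function is truncated above, here is a minimal stdlib sketch (hypothetical helper, not the original code) that keeps such fragments:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split text into sentences without losing a trailing fragment.

    Note: a pattern like re.findall(r'[^.!?]+[.!?]', text) silently drops
    any final sentence that lacks terminal punctuation -- one plausible
    cause of "missing" words at the end of batched TTS input.
    """
    # Split after '.', '!' or '?' followed by whitespace; the lookbehind
    # keeps the punctuation attached to its sentence.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]
```

Batching the output of a splitter like this, instead of a punctuation-anchored `findall`, ensures the last unterminated sentence is still synthesized.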
-
Hugging Face `transformers` is moving to use the `DynamicCache` class as part of the model inputs for the KV-cache values. Currently, `torch.export` complains that it is not a tensor. So all models…
-
Using the v0.8 version of the [ChatQnA example](https://github.com/opea-project/GenAIExamples/blob/v0.8/ChatQnA/docker/gaudi/compose.yaml), the tgi service fails its health check.
Environment:
- OS: ub…
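For context, a compose-level health check against TGI's `/health` endpoint can help pinpoint whether the service ever becomes ready; a hedged sketch (the service name, internal port, and timings here are assumptions — adjust them to match the actual compose.yaml):

```yaml
services:
  tgi-service:
    healthcheck:
      # TGI returns HTTP 200 from /health once the model is loaded
      test: ["CMD-SHELL", "curl --fail http://localhost:80/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 60
```

If the check never passes, the container logs (`docker logs` on the tgi service) usually show whether model download or warmup is the blocker.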
-
## Expected Behavior
This is my input.csv file:
```
id,sequence
heterodimer_2,MAAEAWRSRFRERVVEAAERWESVGESLATALTHLKSPMHAGDEEEAAAARTRIQLAMGELVDASRNLASAMSLMKVAELLALHGGSVNPSTHLGEISLLGDQYLAERNAGIKLLEAG…
Nuta0 updated 3 weeks ago
-
Hello, and thank you for pushing the boundary on speculative generation!
Question 1: In the Eagle-1 paper, Table 7 reports a throughput of 1.97x for Vicuna-7B. How exactly was this measured? (GPU…
-
### What happened?
I am using the LiteLLM proxy, and also the `litellm` Python SDK to interact with it.
In my `config.yaml`, I have `langfuse` configured:
```yaml
... some other config…
```
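For reference, Langfuse logging is normally wired into the proxy via the `litellm_settings` callbacks; a minimal sketch (keys as documented by LiteLLM, with credentials supplied via environment variables — the truncated config above may differ):

```yaml
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

# Credentials are read from the environment:
#   LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST
```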
-
### System Info
On main
### Who can help?
@zucchini-nlp @gante
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially su…
-
### Checks
- [X] This template is only for feature requests.
- [X] I have thoroughly reviewed the project documentation but couldn't find any relevant information that meets my needs.
- [X] I have sea…
-
### System Info
Hi!
I'm running a TRT-LLM engine with speculative decoding (generation length 4 or 5), and I noticed that fp8 KV-cache attention runs slower than fp16 KV-cache attention. Would be grea…
-
First of all, I want to commend you on the great work with the nodes! I have a suggestion for a new functionality that I believe would be relatively easy to implement.
Currently, the prompt batch n…