-
### What behavior of the library made you think about the improvement?
The current structured generation code is creating a `-inf` copy of the logits array and setting the allowed token ID indices …
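
For readers unfamiliar with the pattern described above, here is a minimal sketch of it, not the library's actual code: a fresh array filled with `-inf` is created and only the allowed token IDs receive their original logit values. The function name `mask_logits` and the sample values are hypothetical, and plain Python lists stand in for whatever tensor type the library uses.

```python
import math


def mask_logits(logits, allowed_token_ids):
    """Return a masked copy of `logits` (hypothetical helper, for illustration only).

    Every position starts at -inf; only the allowed token IDs get their
    original value copied back, so disallowed tokens can never be sampled.
    """
    masked = [-math.inf] * len(logits)  # allocates a full-size copy
    for token_id in allowed_token_ids:
        masked[token_id] = logits[token_id]
    return masked


logits = [1.0, 2.0, 3.0, 4.0]
masked = mask_logits(logits, [1, 3])
print(masked)
```

Note that this approach allocates a new vocabulary-sized array on every call, and the input `logits` are left unmodified.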
-
### Motivation
There are some great projects out there that modify logits, mostly for guided decoding or novel sampling techniques. Supporting every single one of them will cause too much bloat and d…
-
### What behavior of the library made you think about the improvement?
If the model generates bad results, we currently have no idea why. E.g.
- https://github.com/outlines-dev/outlines/issues/612#…
-
### Have you searched for similar requests?
Yes
### Is your feature request related to a problem? If so, please describe.
_No response_
### Describe the solution you'd like
Allow the user to name…
-
### What happened?
When using the llama.cpp server with `cache_prompt` enabled, I've encountered an issue where the `logit_bias` specified in one request persists and influences subsequent requests, ev…
-
I am running a Llama 3 model on an RTX 4090 with FP8 quantization. In the [result](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/executor/executor.h#L323), `outputTokenIds` see…
-
I see that when running streaming inference, the [result](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/executor/executor.h#L323) contains `generationLogits` for the full seque…
-
Hello, thank you for this great work.
https://github.com/linkedin/Liger-Kernel/blob/acd82728207ebafad28d448640502c108901a967/src/liger_kernel/ops/fused_linear_cross_entropy.py#L69
https://github.c…
-
I'm writing because my co-authors and I are using the cem package in R, attempting to fit a factor variable treatment with 5 categories to a dichotomous dependent variable. (Thank you, by the way, for…
-
Using a grammar to influence the logits of a model is becoming a useful technique.
- Is this possible with ollama? It seems like it ought to be.
- Can we get an example? I'm interested to do so, but so…