-
The LLM sometimes emits AI commentary such as "this article appears to use...", "\*\*section\*\*", and "overall the article covers...".
## version info
- using commit `1f95a163a4310390271fea35f82e15a6…
-
For a decoder-only LLM architecture, should `tokenizer.padding_side` always be set to `left`?
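A minimal dependency-free sketch of why left padding is usually preferred for batched generation with decoder-only models: the next-token step reads the logits at the *last* position of each row, so right padding would put pad tokens there instead of real tokens. (With Hugging Face `transformers`, this typically corresponds to setting `tokenizer.padding_side = "left"` before a batched `generate()` call.)

```python
PAD = 0  # illustrative pad token id

def pad_batch(seqs, side="left"):
    """Pad variable-length token-id lists to equal length on one side."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pads = [PAD] * (width - len(s))
        out.append(pads + s if side == "left" else s + pads)
    return out

batch = [[5, 6], [7, 8, 9]]
left = pad_batch(batch, side="left")    # [[0, 5, 6], [7, 8, 9]]
right = pad_batch(batch, side="right")  # [[5, 6, 0], [7, 8, 9]]
# With left padding, the last column holds a real token for every row,
# which is the position the next-token sampling step looks at.
```

Right padding is still fine for training or encoder models, where a loss mask or attention mask handles the pad positions.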
-
**Research Question** What is the best way to connect LLM outputs to a reactive UI?
Potential solutions to research, explore and experiment:
- Application state control
- JSON schema to update UI…
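One way to sketch the "application state control" idea without committing to a UI framework: parse the LLM output as JSON (ideally constrained by a schema) and apply it as a patch to a small reactive store that notifies subscribers. Everything below — `AppState`, the patch shape, the sample output — is a hypothetical illustration, not a specific library's API.

```python
import json

class AppState:
    """Tiny reactive store: applying a patch notifies all subscribers."""
    def __init__(self):
        self._data = {}
        self._subs = []

    def subscribe(self, fn):
        self._subs.append(fn)

    def apply(self, patch: dict):
        self._data.update(patch)
        for fn in self._subs:
            fn(dict(self._data))

state = AppState()
seen = []
state.subscribe(seen.append)  # a real UI would re-render here

# Pretend this JSON came back from the LLM, constrained by a schema.
llm_output = '{"title": "Quarterly report", "progress": 0.4}'
state.apply(json.loads(llm_output))
```

The same pattern maps onto most reactive UI stacks: the LLM never touches widgets directly, it only emits schema-valid patches to the state layer.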
-
### System Info
GPU: A100
OS: Ubuntu 22.04.4 LTS
Command:
```
CONVERT_CHKPT_SCRIPT=/opt/tritonserver/TensorRT_LLM_KARI/TensorRT-LLM/examples/llama/convert_checkpoint.py
python3 ${CONVERT_CHKPT_…
```
-
I followed the exact instructions provided by TensorRT-LLM to set up the triton-llm server for Whisper.
I am stuck on the following error when I try to build the TRT engine:
```
[TensorRT-LLM] TensorRT-LLM ve…
```
-
[version](https://github.com/NVIDIA/TensorRT-LLM/tree/31ac30e928a2db795799fdcab6be446bfa3a3998)
When I build the model with `paged_context_fmha = true` and `max_num_tokens = 4096`, chunked context is enabled…
-
I would suggest the `Ollama` API, as it is well documented and supports many LLMs.
-
It would be nice to be able to log all chat history easily :)
-
Currently we have a single prompt shared across all LLMs. In the future we may want LLM-specific prompts that follow each model's best practices.
One way to fix this is to have a file with a list/dict of configura…
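A minimal sketch of what such a per-LLM configuration dict might look like; the registry name, keys, and model names below are all illustrative, not an existing file in the repo.

```python
# Hypothetical per-LLM prompt registry with a shared fallback entry.
PROMPTS = {
    "default": {"system": "You are a helpful assistant."},
    "llama":   {"system": "You are a helpful assistant.", "stop": ["</s>"]},
    "gpt-4":   {"system": "You are a concise assistant."},
}

def prompt_for(model_name: str) -> dict:
    """Return the model-specific prompt config, falling back to the shared one."""
    return PROMPTS.get(model_name, PROMPTS["default"])
```

Keeping the fallback in the same dict means adding support for a new LLM is a one-line change, and models without an entry keep today's behavior.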
-
We’re so happy to have you on board with the LADy project, Calder! We use the issue pages for many purposes, but we really enjoy noting good articles and our findings on every aspect of the project.
…