-
### Type of issue
Typo
### Feedback
We have developed an app powered by an LLM and want to integrate it as a plugin in Microsoft Copilot. While it works with the command box, we're facing a limitat…
-
### Platforms
all
### Description
When you give Leo a YouTube video to summarize, it produces a long string of words on a single line. In my opinion it would be clearer to the LLM if they wher…
-
### System Info
- TensorRT-LLM version: 0.11.0
- Python Version: CPython 3.12.3
- Operating System: Linux 6.8.0-1012-aws
- CPU Architecture: x86_64
- Driver Version: 560.35
- CUDA Version: 12.6
### …
-
### Summary of problem
We're using Celery with a gevent pool (500 workers) for LLM batch processing, and roughly 4 out of 5 LLM calls are missing on Datadog.
Continued from #10212 as I can't open issu…
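For context, a typical invocation for a worker pool like the one described above looks like this; the app module name `app` and queue name are placeholders, not taken from the issue:

```shell
# Start a Celery worker using the gevent execution pool with 500
# greenlets, matching the setup described in the issue. Requires
# both celery and gevent to be installed in the environment.
celery -A app worker \
  --pool=gevent \
  --concurrency=500 \
  --queues=llm_batch \
  --loglevel=INFO
```

With the gevent pool, each "worker" is a greenlet rather than a process, which is the usual choice for I/O-bound workloads such as outbound LLM API calls.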
-
Hi there,
Thank you for bringing the elegant RAG Assessment framework to the community.
I am an AI engineer from Alibaba Cloud, and our team has been fine-tuning LLM-as-a-Judge models based on t…
-
Hey, this looks like a good initiative.
I have locally downloaded LLMs; can't those be used with this project? Why do I need API keys if I don't want to use those platforms?
I have LM Studio as w…
-
### Description
Of several LLM plugins, this is the only one I currently use, thanks to its excellent compatibility and ease of use. Thank you.
Is it ok to open up the ability for users to edit the…
-
-
- Description:
- The autoregressive decoding mode of LLMs means tokens can only be generated serially, which limits inference speed. Speculative decoding can be used to decode L…
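- The serial-decoding bottleneck and the speculative workaround can be sketched with toy stand-in "models" (plain Python functions standing in for a cheap draft model and an expensive target model; all names here are illustrative, not from any specific library):

  ```python
  # Minimal sketch of greedy speculative decoding. A cheap draft model
  # proposes k tokens serially; the target model then verifies those
  # positions (in a real system, in a single parallel forward pass)
  # and accepts the longest agreeing prefix plus one corrected token.

  def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
      """Generate up to `max_new` tokens after `prompt`.

      target_next / draft_next: context (tuple of tokens) -> next token.
      """
      out = list(prompt)
      while len(out) - len(prompt) < max_new:
          # Draft phase: propose k tokens autoregressively (cheap).
          ctx = list(out)
          proposals = []
          for _ in range(k):
              t = draft_next(tuple(ctx))
              proposals.append(t)
              ctx.append(t)
          # Verify phase: accept proposals while the target agrees.
          accepted = []
          for t in proposals:
              expect = target_next(tuple(out + accepted))
              if t == expect:
                  accepted.append(t)
              else:
                  accepted.append(expect)  # target's correction
                  break
          else:
              # All k proposals accepted: target yields one bonus token.
              accepted.append(target_next(tuple(out + accepted)))
          out.extend(accepted)
      return out[len(prompt):][:max_new]
  ```

  The key property is that the output is identical to decoding with the target model alone; the draft model only changes how many target calls are needed, not what is generated.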
-
# What I ran:
```sh
python -m sharktank.examples.export_paged_llm_v1 --gguf-file=/tmp/open_llama_3b_v2/open-llama-3b-v2-f16.gguf --output-mlir=/tmp/open_llama_3b_v2/open-llama-3b-v2-f16.mlir …
```