-
1. How many LLMs are needed for `setting`? Your paper [PaperQA: Retrieval-Augmented Generative Agent for Scientific Research](https://arxiv.org/pdf/2312.07559.pdf) seems to have employ…
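For reference, a minimal sketch of how separate LLMs can be configured through `Settings` in the paper-qa package; the `llm` and `summary_llm` fields reflect my reading of the current API, and the agent-LLM routing shown here is an assumption that may differ in your installed version:

```python
from paperqa import Settings
from paperqa.settings import AgentSettings  # assumed import path for the agent settings model

# Sketch only: the answering LLM, the evidence-summarization LLM, and the
# agent LLM can each point at a different model.
settings = Settings(
    llm="openai/mixtral:8x7b",          # main answer-generation model
    summary_llm="openai/mixtral:8x7b",  # model used to summarize retrieved evidence
    agent=AgentSettings(agent_llm="openai/mixtral:8x7b"),  # assumed: agent/tool-selection model
)
```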
-
Hello,
I changed the batch size from 1 (default) to 8 and then 32, and saw no change in PaperQA's behaviour (answer quality and speed), as follows:
```
settings = Settings(
    llm="openai/mixtral:8x7b", …
```
-
Hi,
Could you please add an option for code autocompletion, similar to the recently added GitHub Copilot support, but based on a local Ollama LLM?
Currently VS Code and JetBrains have such an option with Continue a…
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Describe the bug and reproduction steps
When I create a local LLM service with llama.cpp, I have verif…
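(For context, a minimal way such a llama.cpp service is usually verified before pointing the app at it, assuming the llama.cpp server is running with its OpenAI-compatible endpoint on the default port 8080; the model name below is a placeholder.)

```python
from openai import OpenAI

# llama.cpp's server exposes an OpenAI-compatible /v1 API, so the standard
# OpenAI client can be pointed at it directly.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the server answers with whatever model it was started with
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```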
-
- see also https://github.com/ObrienlabsDev/blog/issues/47
- see https://github.com/ObrienlabsDev/rag/issues/4
-
Environment
- Docker Image: nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
- TensorRT-LLM Version: 0.14.0
- Run Command:
```
python3 ../run.py \
    --input_text "你好,请问你叫什么?" \
    --max_output_len=…
```
-
# 🐞 Describe the Bug
Facing an `OutOfResources` error with 64 fine-grained experts and dropless MoE enabled, even though there is sufficient GPU memory.
# 🔄 Steps to Reproduce
Steps to reprod…
-
### Describe the bug
```
interpreter --local
Open Interpreter supports multiple local model providers.
[?] Select a provider:
 > Ollama
   Llamafile
   LM Studio
   Jan
…
```
-
System config:
- CPU arch x86_64
- GPU: H200
- TensorRT-LLM: v0.14.0
- OS: Ubuntu 22.04
- runtime-env: Docker container built from source via the official [build script](https://techcommunity.microsoft.c…
-
System Info
GPU: NVIDIA RTX 4090
TensorRT-LLM 0.13
Question 1: How can I use the OpenAI-compatible API to perform inference on a TensorRT engine model?
```
root@docker-desktop:/llm/tensorrt-llm-0.13.0/examples/apps# pyt…
```
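A sketch of the client side, assuming an OpenAI-compatible example server from `examples/apps` (e.g. `openai_server.py`, if your version ships it) is already running locally on port 8000 on top of the TensorRT engine; the port and model id below are assumptions, so match them to however the server was launched:

```python
from openai import OpenAI

# Query a locally running OpenAI-compatible server that wraps the TensorRT engine.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

resp = client.completions.create(
    model="tensorrt_llm",  # placeholder model id
    prompt="Hello, what is your name?",
    max_tokens=64,
)
print(resp.choices[0].text)
```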