-
no_gt retrieval metrics need a large amount of LLM processing.
So, use a local LLM to compute them.
+ Ragas context precision needs too many LLM calls. So, try Tonic Validate instead.
-
I don't understand the point of setting chat_llm to ollama if there is no provision for setting utility_llm and/or embedding_llm to local (ollama) counterparts as well. Yes, I assume that prompting will be a challenge…
-
### Describe the bug
The function `__post_carryover_processing(chat_info: Dict[str, Any])` in `chat.py` in the agentchat folder throws the above exception when running Google Gemini.
The cause of the problem w…
-
### 🚀 The feature, motivation and pitch
Is the deepseek-v2 AWQ version supported now? When I run it, I get the following error:
```
[rank0]: File "/usr/local/lib/python3.9/dist-packages/vllm/mo…
-
Can the Ollama URL be configured to point to a remote box?
Or try using an SSH tunnel to make the remote Ollama appear to be local.
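
A minimal sketch of the SSH tunnel idea, assuming the remote Ollama listens on its default port 11434 and that the client application reads a base URL from configuration. The host `user@remote-box` and the helper names `build_tunnel_command` and `local_base_url` are hypothetical placeholders, not part of any tool discussed here.

```python
import shlex

# Ollama's default listening port; change it if the remote server uses another one.
OLLAMA_DEFAULT_PORT = 11434


def build_tunnel_command(remote, local_port=OLLAMA_DEFAULT_PORT,
                         remote_port=OLLAMA_DEFAULT_PORT):
    """Return the ssh argv that forwards local_port to the remote Ollama port.

    -N opens the tunnel without running a remote command;
    -L binds local_port and forwards it to localhost:remote_port on the remote box.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", remote]


def local_base_url(local_port=OLLAMA_DEFAULT_PORT):
    """Return the URL a client would use once the tunnel is up."""
    return f"http://localhost:{local_port}"


if __name__ == "__main__":
    # "user@remote-box" is a placeholder for the real SSH target.
    print(shlex.join(build_tunnel_command("user@remote-box")))
    print(local_base_url())
```

With the tunnel running in another terminal, a client pointed at the printed local URL would talk to the remote Ollama as if it were local. Whether a given application lets you override the URL is the open question above.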
-
**Describe the bug**
When running the `chat-with-wikipedia` flow, I'm encountering the following error:
```
...
File "/usr/local/lib/python3.9/site-packages/promptflow/tracing/_trace.py", line 145, in
…
-
- [ ] [paper-qa/README.md at main · Future-House/paper-qa](https://github.com/Future-House/paper-qa/blob/main/README.md?plain=1)
# PaperQA2
[![GitHub](https://img.shields.io/badge/github-%23121011.s…
-
## 🐛 Bug
Llama-3-8B-Instruct-q4f16_1-MLC does not run.
## To Reproduce
Steps to reproduce the behavior:
1. conda create --name mlc-prebuilt python=3.11
2. conda activate mlc-prebuilt
3…
-
### System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.…
-
### System Info
I am experimenting with TRT LLM and `flan-t5` models. My simple goal is to build engines with different configurations and tensor parallelism, then review performance. Have a DGX syst…