-
Add a section about testing LLMs; this is crucial.
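A testing section could lead with a pattern like the following sketch: exercising an LLM wrapper against a deterministic stub instead of a live model. All class names here (`FakeLLM`, `SummarizerClient`) are hypothetical stand-ins, not from any library.

```python
class FakeLLM:
    """Stub that returns canned completions, so tests are fast and deterministic."""
    def __init__(self, canned):
        self.canned = canned
        self.calls = []  # record prompts for later assertions

    def complete(self, prompt):
        self.calls.append(prompt)
        return self.canned


class SummarizerClient:
    """Thin wrapper under test: builds the prompt and post-processes the reply."""
    def __init__(self, llm):
        self.llm = llm

    def summarize(self, text):
        reply = self.llm.complete(f"Summarize in one line:\n{text}")
        return reply.strip()


def test_summarize_strips_whitespace():
    llm = FakeLLM("  a short summary \n")
    client = SummarizerClient(llm)
    assert client.summarize("long document...") == "a short summary"
    # The wrapper passed the document through to the model exactly once.
    assert len(llm.calls) == 1 and "long document" in llm.calls[0]
```

Testing the wrapper rather than the model keeps the suite deterministic; live-model checks belong in a separate, clearly marked integration tier.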
-
### Description
If you download a GGUF model and update the LLM URL settings to the port where kotaemon is serving the model, testing against the "ollama" LLM works.
However, the Embeddin…
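A minimal sketch of the configuration split this report points at: the LLM and the embedding model are configured independently, so fixing the LLM URL does not automatically fix embeddings. All key and model names below are illustrative, not kotaemon's real settings schema.

```python
LOCAL_SERVER = "http://localhost:11434"  # assumed port of the local model server

settings = {
    "llm": {
        "provider": "ollama",
        "base_url": f"{LOCAL_SERVER}/v1",  # OpenAI-compatible chat endpoint
        "model": "my-gguf-model",          # placeholder name
    },
    "embeddings": {
        "provider": "ollama",
        "base_url": f"{LOCAL_SERVER}/v1",  # must be pointed at the server separately
        "model": "my-embedding-model",     # placeholder name
    },
}

# Both sections need the local URL; updating only settings["llm"] leaves
# embeddings pointing at whatever default they had before.
assert settings["llm"]["base_url"] == settings["embeddings"]["base_url"]
```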
-
Hugging Face Hub login successful
Used the gemma-2-27b LLM for testing:
```
cargo run --release -- -m "google/gemma-2-27b-it" -c
Finished release [optimized] target(s) in 0.03s
Running `target/re…
```
-
### **Is your feature request related to a problem? Please describe.**
PyRIT currently lacks built-in support for easily using and comparing multiple LLM providers. This makes it challenging for user…
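One shape such support could take is a small provider abstraction that runs the same prompt against every configured backend. This is a hypothetical sketch, not PyRIT's actual target API; `EchoProvider` and `UpperProvider` are toy stand-ins for real providers.

```python
from abc import ABC, abstractmethod


class ChatProvider(ABC):
    """Common interface every backend adapter would implement."""
    name: str

    @abstractmethod
    def send(self, prompt: str) -> str: ...


class EchoProvider(ChatProvider):
    name = "echo"
    def send(self, prompt):
        return prompt


class UpperProvider(ChatProvider):
    name = "upper"
    def send(self, prompt):
        return prompt.upper()


def compare(providers, prompt):
    """Run one prompt against every provider and collect replies side by side."""
    return {p.name: p.send(prompt) for p in providers}


results = compare([EchoProvider(), UpperProvider()], "hello")
# results == {"echo": "hello", "upper": "HELLO"}
```

With real adapters behind the same interface, side-by-side comparison reduces to one `compare` call per prompt.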
-
details here: https://docs.arize.com/phoenix
RedSAIA project integration: https://gitlab.consulting.redhat.com/redprojectai/infrastructure/appdeploy/-/tree/main/phoenix?ref_type=heads
-
I have developed a new KV cache quantization scheme. I am now interested in testing its performance within TensorRT-LLM.
I'm new to this project, so I am trying to understand the current implementa…
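For orientation, the arithmetic at the core of the simplest KV-cache scheme, per-tensor absmax int8 quantization, can be sketched in a few lines. This is a toy illustration only; TensorRT-LLM's real quantization lives in C++/CUDA kernels with very different layouts.

```python
def quantize_int8(values):
    """Map floats to int8 with a single absmax scale; returns (ints, scale)."""
    scale = max(abs(v) for v in values) / 127.0  # assumes not all values are zero
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    return [x * scale for x in q]


kv = [0.5, -1.27, 0.03, 1.0]  # stand-in for one KV-cache tensor
q, scale = quantize_int8(kv)
recovered = dequantize_int8(q, scale)
# Round-to-nearest bounds the error by half a quantization step (scale / 2).
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(kv, recovered))
```

A new scheme would replace `quantize_int8`/`dequantize_int8` with its own mapping while keeping the same cache read/write points.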
-
### What happened?
I encountered an issue while loading a custom model in llama.cpp after converting it from PyTorch to GGUF format. Although the model was able to run inference successfully in PyTor…
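When a model runs in PyTorch but not after conversion, one useful first check is diffing the tensor inventory of the source checkpoint against what the converter emitted. The sketch below is a hypothetical debugging aid; the dicts stand in for a real `state_dict` and GGUF metadata.

```python
def diff_tensors(src, converted):
    """Report tensors missing after conversion or with mismatched shapes."""
    missing = sorted(set(src) - set(converted))
    mismatched = sorted(
        name for name in src if name in converted and src[name] != converted[name]
    )
    return missing, mismatched


# Toy stand-ins: name -> shape, as a real tool would read from both files.
src = {"tok_embeddings.weight": (32000, 4096), "output.weight": (32000, 4096)}
converted = {"tok_embeddings.weight": (32000, 4096)}

missing, mismatched = diff_tensors(src, converted)
# missing == ["output.weight"], mismatched == []
```

A non-empty `missing` or `mismatched` list points at the conversion step rather than at llama.cpp's inference code.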
-
### Affected component
llms/ShuttleAIToolModel.py
### Motivation
Our testing indicates that recent changes to ShuttleAIModel have surfaced JSON-related errors:
--
FAILED tests/llms/ShuttleAIModel_t…
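One common source of such failures, shown here as an illustrative sketch rather than ShuttleAIToolModel's actual code, is that tool-calling models return JSON wrapped in prose or code fences, so a bare `json.loads` on the raw reply raises `JSONDecodeError`.

```python
import json
import re


def extract_json(reply):
    """Parse a JSON object from a model reply, tolerating a ```json fence."""
    # Grab the outermost {...} span; DOTALL lets it cross newlines.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in reply: {reply!r}")
    return json.loads(match.group(0))


assert extract_json('```json\n{"tool": "search", "query": "llms"}\n```') == {
    "tool": "search",
    "query": "llms",
}
```

A regression test along these lines, fed with fenced, bare, and malformed replies, would pin down whether the new errors come from parsing or from the model output itself.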
-
### System Info
- NVIDIA A100 80G * 2
- Libraries
- TensorRT-LLM: 0.11.0.dev2024052800
- Driver Version: 525.105.17
- CUDA Version: 12.4
### Who can help?
@byshiue @schetlur-nv
##…
-
This reports mistral.rs as being faster than llama.cpp: https://github.com/EricLBuehler/mistral.rs/discussions/612
But I'm seeing much slower speeds for the same prompt/settings.
Mistral.rs
``…
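Before concluding one backend is slower, it is worth checking that both numbers are tokens/second over the same phase, since prompt processing and decode are often reported separately. A minimal sketch of the decode-only calculation (function name and figures are illustrative):

```python
def decode_tps(total_tokens, prompt_tokens, total_s, prompt_s):
    """Tokens/second for the decode phase only, excluding prompt processing."""
    return (total_tokens - prompt_tokens) / (total_s - prompt_s)


# Example: 612 total tokens, 100 of them prompt, 2 s spent on the prompt
# out of 10 s total -> 512 generated tokens in 8 s of decode time.
rate = decode_tps(612, 100, 10.0, 2.0)
# rate == 64.0 tokens/s
```

Comparing a decode-only figure from one backend against an end-to-end figure from the other would make the faster backend look slow.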