-
Not sure what I'm doing wrong, but this is the error message I get when trying to run `pqa ask`:
NotFoundError: litellm.NotFoundError: OpenAIException - Error code: 404 - {'error': {'message': 'The m…
-
### System Info
- Compute instance: AWS g6.48xlarge (https://aws.amazon.com/ec2/instance-types/g6/)
- Driver Version: 535.183.01
- Working inside `nvcr.io/nvidia/tritonserver:24.07-trtllm-python…
-
### Your current environment
driver 1.17
vllm 0.5.3.post1+gaudi117
```text
export VLLM_GRAPH_RESERVED_MEM=0.1
export VLLM_GRAPH_PROMPT_RATIO=0.9
export VLLM_PROMPT_S…
```
-
### What happened?
Hi, I'm trying to use Google's [Madlad400 in GGUF form,](https://huggingface.co/NikolayKozloff/madlad400-10b-mt-Q8_0-GGUF) but I'm unable to get it working with `llama-server`, although it work…
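For reference, a typical way to serve a GGUF model with `llama-server` looks like the following (the GGUF filename is illustrative, not confirmed against that repo):

```shell
# Fetch the quantized model from the Hugging Face repo linked above
# (filename is an assumption; check the repo's file listing)
huggingface-cli download NikolayKozloff/madlad400-10b-mt-Q8_0-GGUF \
  madlad400-10b-mt-q8_0.gguf --local-dir ./models

# Start an OpenAI-compatible HTTP server on port 8080
llama-server -m ./models/madlad400-10b-mt-q8_0.gguf --port 8080
```

If the server starts but inference fails, that usually points at an unsupported architecture in `llama.cpp` rather than a launch problem.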
-
### What happened?
During a refactor, it looks like:
1. OIDC IAM caching was completely lost. Because of this, LiteLLM requests a new token on *every* single Bedrock call. This is extremely …
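For context, the usual remedy is to cache the exchanged token and reuse it until it nears expiry, rather than performing the OIDC exchange per request. A minimal sketch of that pattern (`CachedTokenProvider` and `fetch_token` are illustrative names, not LiteLLM's actual API):

```python
import time

class CachedTokenProvider:
    """Fetch a token once and reuse it until shortly before expiry.

    `fetch_token` stands in for the real OIDC exchange and must return
    a (token, expires_in_seconds) pair.
    """

    def __init__(self, fetch_token, leeway_s=60):
        self._fetch_token = fetch_token
        self._leeway_s = leeway_s   # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._token is None or now >= self._expires_at - self._leeway_s:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token

# Usage: the expensive exchange runs once; later calls hit the cache.
calls = []
def fake_fetch():
    calls.append(1)
    return ("tok-%d" % len(calls), 3600)

provider = CachedTokenProvider(fake_fetch)
provider.get()
provider.get()
print(len(calls))  # the fetcher ran only once
```

The `leeway_s` margin avoids racing the expiry: a token that is about to lapse is refreshed proactively instead of failing mid-request.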
-
https://twitter.com/omarsar0/status/1641792530667675648/photo/1
-
### Describe your problem
**Output**
![image](https://github.com/user-attachments/assets/ca7f96a4-a5b5-4495-b399-01790c1e83ba)
### Data in .csv file
![image](https://github.com/user-attachme…
-
### What happened?
imatrix creation and subsequent quantization to IQ3_XXS of [mixtral 8x7b instruct](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v…
-
This issue was inspired by #25 through considering testing limitations. This issue proposes the use of static models like the [Llama](https://huggingface.co/meta-llama) and many others which could be …
-
## Background
When implementing a feature, I've noticed that as I get deeper into the task, the LLM starts to struggle with solving issues. Each iteration doesn't necessarily contribute to solving th…