-
I am getting the following for Llama-LLM:
```bash
2024-06-28 21:57:20 INFO openai - message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=15 request_id=req_5533…
```
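For context, a sketch of the kind of call that emits this log line; it assumes the legacy openai-python 0.x client (whose logger prints `message='OpenAI API response'` at INFO level), and the model name is only an example.

```python
# Hypothetical reproduction (assumes openai-python 0.x, whose logger
# emits message='OpenAI API response' at INFO level; model name is an
# example, not from the report).
import logging
import openai

logging.basicConfig(level=logging.INFO)
openai.api_key = "sk-..."  # placeholder

resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="example text to embed",
)
print(len(resp["data"][0]["embedding"]))
```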
-
### 🐛 Describe the bug
512M parameters.
Mostly vanilla LM transformer. FlashAttention 2.4.2, PyTorch 2.2.0. Uses both FA and FlashRotary.
Dtype: bf16
NVIDIA A40, single GPU.
Unfused: 85 TFLOPS
F…
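Since the numbers compare fused and unfused attention throughput, here is a minimal sketch of the kind of micro-benchmark that yields such TFLOPS figures; the shapes, iteration counts, and direct use of `flash_attn_func` are assumptions for illustration, not the reporter's actual script.

```python
# Hypothetical micro-benchmark (not the reporter's script): times the
# fused flash-attn kernel in bf16 and converts to TFLOPS. Assumes
# flash-attn 2.x and a CUDA GPU; shapes are arbitrary examples.
import time
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 8, 2048, 16, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.bfloat16)
           for _ in range(3))

def bench(fn, iters=50, warmup=5):
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.time() - t0) / iters

# Nominal FLOPs for full attention: two matmuls of 2*B*H*S^2*D each
# (a causal mask roughly halves the real work).
flops = 4 * batch * nheads * seqlen ** 2 * headdim
sec = bench(lambda: flash_attn_func(q, k, v, causal=True))
print(f"fused attention: {flops / sec / 1e12:.1f} TFLOPS")
```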
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 11-05 06:10:50 _custom_ops.py:19] Failed to import from vllm._C with Mo…
```
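One way to narrow down that warning (a suggestion, not something from the report) is to import the compiled extension directly, which surfaces the root-cause `ImportError` instead of vLLM's fallback message:

```python
# Diagnostic sketch: import the compiled extension directly to see the
# underlying ImportError (e.g. a missing shared library or ABI mismatch).
try:
    import vllm._C  # noqa: F401
    print("vllm._C imported OK")
except ImportError as exc:
    print(f"vllm._C failed to import: {exc}")
```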
-
## 🚀 Feature
Restructure the function `multi_head_attention_forward` in [nn.functional](https://github.com/pytorch/pytorch/blob/23b2fba79a6d2baadbb528b58ce6adb0ea929976/torch/nn/functional.py#L357…
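For reference, a minimal sketch of the current monolithic call whose restructuring is proposed; the shapes and weights below are arbitrary illustrative values.

```python
# Illustrative call to the monolithic function (arbitrary shapes/weights).
# The functional API expects (seq_len, batch, embed_dim) inputs.
import torch
import torch.nn.functional as F

L, N, E, H = 10, 2, 64, 4
q = k = v = torch.randn(L, N, E)
in_proj_w, in_proj_b = torch.randn(3 * E, E), torch.zeros(3 * E)
out_proj_w, out_proj_b = torch.randn(E, E), torch.zeros(E)

out, attn_weights = F.multi_head_attention_forward(
    q, k, v, E, H,
    in_proj_w, in_proj_b,
    None, None, False,   # bias_k, bias_v, add_zero_attn
    0.0,                 # dropout_p
    out_proj_w, out_proj_b)
print(out.shape)  # torch.Size([10, 2, 64])
```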
-
### What is the issue?
Scenario one:
Through an AI agent that calls a public cloud-based LLM, two documents exceeding 2,000 words each are uploaded, and the input question is: Analyze the differe…
-
### What is the issue?
I am using Open WebUI v0.3.30, and when I try to analyze an image using the llama3.2-vision:latest model I get no output.
In the ollama service log I see the following:
…
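To isolate whether the model or the UI is at fault (a suggested repro, not part of the report), one could bypass Open WebUI and send an image straight to Ollama's REST API; the file name and prompt are placeholders:

```python
# Hypothetical repro against Ollama's /api/generate endpoint on its
# default port; "test.png" and the prompt are placeholders.
import base64
import json
import urllib.request

with open("test.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = json.dumps({
    "model": "llama3.2-vision:latest",
    "prompt": "Describe this image.",
    "images": [img_b64],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```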
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
### On the Tesla T4 the model "hangs" after loading (the VRAM usage spikes normal…
-
### Checklist
- [x] 1. I have searched related issues but could not find the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related iss…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
```
-
Docker container version: ipex-llm-serving-xpu:2.2.0-b2
Start shell script:
```bash
model="/llm/models/Qwen/Qwen2.5-32B-Instruct-AWQ"
served_model_name="Qwen2.5-32B-Instruct-AWQ"
export CCL_WORKER_…
```
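Since the start script is truncated, here is a minimal smoke test under the assumption that the container exposes a vLLM-style OpenAI-compatible server on port 8000; adjust the host and port to match the actual script.

```python
# Hypothetical smoke test; base_url and port are assumptions. The model
# name matches served_model_name from the start script above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
resp = client.chat.completions.create(
    model="Qwen2.5-32B-Instruct-AWQ",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```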