-
### Command:
**llama stack run Llama3.2-11B-Vision-Instruct --port 5000**
**Output:**
```
Using config `/Users/mac/.llama/builds/conda/Llama3.2-11B-Vision-Instruct-run.yaml`
Resolved 4 prov…
```
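Once the server is up, a quick smoke test is to query the inference API from Python. A minimal sketch, assuming the `llama-stack-client` package and the server above listening on port 5000 (method and parameter names vary across client versions, so treat this as illustrative):

```python
from llama_stack_client import LlamaStackClient

# Point the client at the server started by `llama stack run ... --port 5000`.
client = LlamaStackClient(base_url="http://localhost:5000")

# Send a plain text question to the vision-instruct model as a smoke test.
response = client.inference.chat_completion(
    model="Llama3.2-11B-Vision-Instruct",
    messages=[{"role": "user", "content": "Describe what you can do."}],
)
print(response)
```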
-
### 📚 The doc issue
https://github.com/pytorch/executorch/tree/main/examples/qualcomm
- In 3) qaihub_scripts, llama2 is still mentioned. Can we update those pages to reflect the latest support for l…
-
### What is the issue?
When I use llama3.2-vision:90b, the model always responds very slowly. What can I do?
Also, the GPU is not fully used while CPU usage is very high.
It runs on 4× V100 = 64 GB of GPU memory.
Is anyone ca…
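When the GPU sits mostly idle while CPU usage spikes, the usual culprit is that only part of the 90B model fits in VRAM, so the remaining layers run on the CPU. A minimal sketch for checking the split, assuming a default Ollama install on port 11434 and its `/api/ps` endpoint:

```python
import requests

# List the models Ollama currently has loaded and report how much of
# each one actually resides in GPU memory.
ps = requests.get("http://localhost:11434/api/ps").json()
for m in ps.get("models", []):
    total = m["size"]
    vram = m.get("size_vram", 0)
    print(f"{m['name']}: {vram / total:.0%} of weights in VRAM")
```

If the fraction is well below 100%, the model is spilling into system RAM, which matches the high CPU usage; a smaller quantization or more GPU memory is the usual remedy.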
-
I get more than 80,000 words when I use the Ollama Vision node, unbelievable! The Ollama model I use is llama3.2-vision:11b; I am not sure whether it is that model's problem or something else.
This is quite likely to…
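If the node does not cap generation length itself, one workaround is to set Ollama's `num_predict` option when calling the model. A minimal sketch against Ollama's HTTP API, assuming the default port 11434 (whether the Vision node forwards `options` is an assumption to verify):

```python
import requests

# Cap the response at 512 tokens via num_predict so the model cannot
# run away to tens of thousands of words.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2-vision:11b",
        "prompt": "Describe this image briefly.",
        "options": {"num_predict": 512},
        "stream": False,
    },
)
print(resp.json()["response"])
```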
-
### **Description**
When attempting to use the `llm` command-line interface (CLI) on Windows, I encounter an encoding error related to the ASCII codec. Despite setting environment variables to enfo…
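For reference, the baseline workaround is to force Python's UTF-8 mode when invoking the CLI; the report indicates environment variables were already tried, so this sketch is only the configuration to compare against:

```python
import os
import subprocess

# Run the llm CLI with UTF-8 mode (PEP 540) and UTF-8 stdio forced,
# bypassing the Windows console's default cp1252/ASCII codec.
env = dict(os.environ, PYTHONUTF8="1", PYTHONIOENCODING="utf-8")
subprocess.run(["llm", "Say hello"], env=env, check=True)
```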
-
### System Info
- GPU: H100
- Triton Server with TensorRT backend (v0.10.0)
- Launched on K8s. Docker container built using [TensorRT builder](https://github.com/triton-inference-server/tensorrt…
-
**Describe the bug**
Running the TG-Llama-70b MLP test in a loop leads to a hang after a variable number of iterations - typically hundreds.
This is one example of the watcher output after the hang.…
-
### Which API Provider are you using?
Ollama
### Which Model are you using?
deepseek-coder-v2:latest
### What happened?
### Steps to reproduce
Write simple code for creatin…
-
![image](https://github.com/user-attachments/assets/bcdc2387-eb0a-4aca-a4c4-a07d755a8bac)
-
Change LLM from Llama 2 to Llama 3