-
### What is the issue?
I'm trying to get the project to compile on Gentoo but am running into some issues, as Gentoo uses different paths.
On Gentoo, ROCm libraries get installed into /usr/lib64, h…
-
### System Info
- CPU: x86_64
- GPU: L40
- tensorrt_llm: 0.11.0
- CUDA: 12.4
- driver: 535.129.03
- OS: CentOS 7
### Who can help?
When I tried to import tensorrt_llm, it got stuck. Through debuggi…
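One stdlib-only way to find where a Python import is stuck is `faulthandler.dump_traceback_later`, which dumps a traceback of every thread after a timeout. This is a minimal sketch that simulates the hang with a deliberate `time.sleep`; in the real case you would replace the sleep with `import tensorrt_llm` to see the exact call that blocks (the module name is from the report above; everything else is illustrative):

```shell
# Ask the interpreter to dump a traceback of all threads to stderr if it
# is still running after 1 second, then stall for 2 seconds on purpose.
# Replace the sleep with the hanging `import tensorrt_llm` to locate the
# blocking call.
python3 -c '
import faulthandler, sys, time
faulthandler.dump_traceback_later(1, file=sys.stderr)
time.sleep(2)  # stand-in for the hanging import
' 2> hang_trace.txt
head -n 1 hang_trace.txt   # e.g. "Timeout (0:00:01)!"
```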
-
I'm trying to run the TensorRT version of the Docker container according to the instructions, but I get a segfault whenever I attempt to transcribe any audio. The same audio works with the Faster Whi…
-
### What happened?
I am trying to run Qwen2-57B-A14B-instruct, and I used llama-gguf-split to merge the gguf files from [Qwen/Qwen2-57B-A14B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2-57B-A14B-…
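For reference, the merge step described above can be done with the `llama-gguf-split` tool that ships with llama.cpp; a sketch, assuming you pass the first shard (the tool finds the remaining shards itself; the filenames below are illustrative, not the actual shard names from the Qwen repo):

```shell
# Merge a sharded GGUF back into a single file; only the first shard is
# passed, the remaining shards are discovered automatically.
llama-gguf-split --merge model-00001-of-00002.gguf qwen2-57b-a14b-instruct.gguf
```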
-
### What happened?
If creating a llama model in Python code, you can specify n_gpu_layers=-1 so that all layers are offloaded to the GPU (see the example below). When starting the llama.cpp server using the doc…
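For the server path, the corresponding knob is the `--n-gpu-layers` (`-ngl`) flag. A sketch, assuming a llama.cpp build with GPU support (the model path is illustrative); note that on the CLI a large value such as 99 is commonly used to offload every layer, since it exceeds the layer count of typical models:

```shell
# Offload all layers to the GPU when launching the llama.cpp server:
# 99 is larger than the layer count of typical models, so every layer
# ends up on the GPU.
./llama-server -m ./models/model.gguf --n-gpu-layers 99
```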
-
### What happened?
After building the SYCL server image, trying to load a model larger than Q4 on my Arc A770 fails with a memory error.
Anything below Q4 will execute, but this is due to the "llm_l…
-
I compiled vLLM 0.5.4 for a CPU that does not support AVX512. After compiling, I entered the container and ran the command to start the llama3-8b model.
```text
python3 -m…
```
-
Hello, how do I run MLC LLM on WSL2 using the CPU?
I tried `mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC`
but got an error.
Please give me a command so that I can copy the error text with my mo…
-
Hi, I am having an issue with running the sample example in the [quickstart guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#3-example-runnin…
-
### Your current environment
Hello,
I'm trying to download llama3.1-8B-Instruct to my PC, and each time I try, I get the following error:
```bash
[rank0]: torch.OutOfMemoryError: CUDA out of memo…
```
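Two vLLM knobs that commonly resolve this class of out-of-memory error are `--max-model-len` (which caps the KV cache) and `--gpu-memory-utilization`. A sketch, assuming the OpenAI-compatible server entry point; the model name and values are illustrative:

```shell
# Shorten the context to shrink the KV cache and leave headroom on the GPU.
python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.90
```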