-
### What happened?
After building the SYCL server image, trying to load a model larger than Q4 on my Arc A770 fails with a memory error.
Anything below Q4 will execute, but this is due to the "llm_l…
-
This can be used as inspiration: https://github.com/r4fek/django-cassandra-engine/pull/66
-
The command I run:
'''
python llama3.py --pruning_ratio 0.25 \
--device cuda --eval_device cuda \
--base_model home/Meta-Llama-3-8B \
--block_wi…
-
### Describe the issue
I'm using the OpenAI Whisper model with ONNX Runtime.
When running the medium model with the DirectML execution provider, it fails with this error:
```console
2024-08-21 00:45:47.…
-
### What happened?
I wanted to use the Kompute version to run on my GPU (Radeon RX570 4G), but whenever I use the `-ngl` argument to offload to the GPU, `llama-cli` silently exits before loading the model…
-
### What happened?
```
You are a helpful assistant
> what is 2+2+2+2
44444444444444444444444444444444444444444444444444444444444444444444444444444444444444444
>
```
When I run llama-cli with…
-
### Version
22.10
### Platform
```text
Linux be964f1f5acb 6.10.4-linuxkit #1 SMP PREEMPT_DYNAMIC Wed Oct 2 16:39:54 UTC 2024 x86_64 Linux
https://hub.docker.com/layers/library/node/22-alpine/im…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [ Yes] I am running the latest code. Development is very rapid so there are no tagged versions as…
-
The efitools [1] recipe seems to be a good candidate for the oe-core or meta-oe layer, as it is a dependency of more and more BSPs to implement UEFI Secure Boot.
[1] https://github.com/Wind-River/m…
-
### What happened?
I've already quantized a 2b variant of this model, and one of its instruct fine-tunes, on a subset of the same data (the first 1000 samples are the same in the same order -- the e…