-
I was testing the new quantization by @angeloskath with some Italian prompts that were failing with the previous version, and they are now PERFECT! But while doing this I have seen extreme slowness with q8 and fp16…
-
I was running [Llama-3-8b-Instruct](https://huggingface.co/gaianet/Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf) as a GaiaNet node with my self-created [snapshot](https://h…
-
Hi.
I ran into this error when trying to fine-tune **Phi3 small**:
```
triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 101376. Reducing bloc…
```
-
_Originally from @philschmid [on slack](https://huggingface.slack.com/archives/C02EMARJ65P/p1720010050590199?thread_ts=1719998272.270859&cid=C02EMARJ65P) (private):_
Being OpenAI-compatible for ser…
-
Given that we only have Llama 3 70B and 8B, it would be useful to have a TinyLlama-style model based on the Llama 3 tokenizer so that we can use it as a draft model for speculative decoding.
Are there pla…
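For context, the core idea of speculative decoding can be sketched with toy stand-ins (the two "model" functions below are hypothetical placeholders, not real Llama checkpoints): a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, accepting the longest prefix it agrees with before falling back to its own token.

```python
# Toy sketch of one speculative-decoding step. Both "models" are
# hypothetical deterministic functions standing in for a small draft
# model and a large target model.

def draft_propose(prefix, k):
    # Draft model: cheaply guess the next k tokens (toy rule: +1 mod 10).
    out = list(prefix)
    for _ in range(k):
        out.append((out[-1] + 1) % 10)
    return out[len(prefix):]

def target_next_token(prefix):
    # Target model: same rule, except it emits 0 after a 7, so it will
    # eventually disagree with the draft.
    last = prefix[-1]
    return 0 if last == 7 else (last + 1) % 10

def speculative_step(prefix, k=4):
    proposal = draft_propose(prefix, k)
    accepted = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target_next_token(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop accepting.
            accepted.append(expected)
            break
        accepted.append(tok)  # draft token verified by the target
        ctx.append(tok)
    return prefix + accepted

print(speculative_step([5]))  # accepts 6 and 7, then corrects 8 -> 0
```

The payoff is that each target-model pass can validate several draft tokens at once, which is why a tiny model sharing the Llama 3 tokenizer would be valuable here.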
-
- [x] Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class
- [ ] Implement batched inference support for `generate` and `create_completion` methods in `Llama` class
- [ ] Add suppo…
-
When attempting to run the training script for LLaMA with the following command:
`CONFIG_FILE="./train_configs/llama3_8b.toml" ./run_llama_train.sh`
an ImportError is encountered. The specific error…
-
Hi guys, I am using the Mistral 7B-Instruct model with llama-index, loading it via llama.cpp. When I try to run multiple prompts (opening 2 websites and sending 2 prompts), it gives me this …
-
All I need is to run Llama 3 via Ollama on an Intel GPU (Arc™ A750). I followed the steps described in the IPEX-LLM documentation, but it runs on the CPU. Search engines can't find a solution to the problem.…
-
Hi,
I am running Windows 11, Python 3.11.9, and ComfyUI in a venv environment.
I tried installing the latest llama-cpp-python for CUDA 12.4 in the manner below and received a string of errors. Can a…