-
### Check for existing issues
- [X] Completed
### Describe the feature
After going through: https://zed.dev/docs/completions
Zed currently supports completions via external LLM APIs like GitHub …
-
We rely on OpenAI API calls in two key areas:
1) To generate the distributions, we create a prompt and then send it to the API for a response. Ideally we could just swap out this API call for a ca…
-
After installing the dependencies (adjusted per #5 and #8), I tried to run an example such as the one from the README; it fails on the first line when `llama_cpp` isn't installed.
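
One possible way to avoid failing at import time is to treat `llama_cpp` as an optional dependency and check for it before use. This is only a minimal sketch: the helper name `optional_import` is hypothetical and not part of this project, and how the project should react when the module is missing is an assumption.

```python
import importlib
import importlib.util


def optional_import(name):
    """Return the imported module, or None if it is not installed.

    `optional_import` is a hypothetical helper used for illustration only.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


llama_cpp = optional_import("llama_cpp")
if llama_cpp is None:
    # Assumption: a clear message is preferable to an ImportError on line 1.
    print("llama_cpp is not installed; the llama.cpp backend is unavailable.")
```

With a guard like this, code paths that do not need `llama_cpp` can still run, and the ones that do can raise a descriptive error at the point of use.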
-
### Summary
- Provide k-quant models
- Maintain existing gguf models
- Embedding models
- [x] [second-state/Nomic-embed-text-v1.5-Embedding-GGUF](https://huggingface.co/second-state/Nomic-…
-
I was trying to find where to set the quantisation used for the K/V context cache, and it seems you can't in LM Studio.
K/V cache quantisation is required to run models context efficiently by re…
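
To illustrate why the cache type matters, here is a rough back-of-the-envelope estimate of K/V cache size. It is a sketch under stated assumptions: the model dimensions are hypothetical, the per-element sizes assume ggml's block layouts (`q8_0` as 34 bytes per 32 values, `q4_0` as 18 bytes per 32 values), and `kv_cache_bytes` is an illustrative helper, not an LM Studio or llama.cpp API.

```python
# Approximate bytes per cached element for each cache type.
# Assumption: ggml block layouts (one fp16 scale per block of 32 values).
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Estimate K/V cache size in bytes (hypothetical helper)."""
    # Factor 2 accounts for storing both keys and values.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return int(elements * BYTES_PER_ELEMENT[cache_type])


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 8192 context.
for cache_type in BYTES_PER_ELEMENT:
    size = kv_cache_bytes(32, 8, 128, 8192, cache_type)
    print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

For these assumed dimensions the estimate drops from about 1 GiB at `f16` to roughly half that at `q8_0` and a bit over a quarter at `q4_0`, which is the memory headroom that could otherwise go to a longer context.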
-
My initial testing compared ct2 (using int8) with the `bitsandbytes` library at 4 and 8 bit... nicely done, ctranslate2 people. Looking forward to testing GGUF in there as well.
![image](https:/…
-
First of all, when pp_size = 1, everything works with tp_size = 1, 2, 4, 8.
My tests of pipeline parallelism (pp_size > 1) always failed, with different errors shown in the last few rows of this post.
Firs…
-
I found that the crash occurs in a low-level API(
-
### System Info
GPU: 2x A30, TRT-LLM branch main, commit id: 66ef1df492f7bc9c8eeb01d7e14db01838e3f0bd
### Who can help?
_No response_
### Information
- [x] The official example scripts
- [ ] …
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…