-
When I quantized the Qwen2.5-1.5B-Instruct model following **"Quantizing the GGUF with AWQ Scale"** in the [docs](https://qwen.readthedocs.io/en/latest/quantization/llama.cpp.html), it showed that th…
-
### Discussed in https://github.com/langchain-ai/langchain/discussions/27404
Originally posted by **kodychik** October 16, 2024
### Checked
- [X] I searched existing ideas and did not find …
-
A continuation of task #15. It should include an in-depth description of the technology behind LLMs and of their training and inference. Finish the section.
This issue should neatly be tied together …
-
**Is your feature request related to a problem? Please describe.**
The currently deployed version of instructlab requires llama_cpp version 0.2.79, and I want to run the new IBM Granite architecture, w…
-
I would like support for the following architectures:
- Mamba
- MambaByte
- Mamba-2
- Mamba-hybrid (mamba + transformer)
- Mamba-2-hybrid (mamba2 + transformer)
These architectures are becoming qu…
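For context on what supporting these architectures entails, the core of Mamba-style models is a selective state-space scan rather than attention. The sketch below is a minimal, illustrative NumPy version of that recurrence (discretized diagonal SSM with input-dependent step sizes); the function name, shapes, and parameterization are simplifying assumptions for exposition, not the actual llama.cpp or reference implementation.

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Illustrative Mamba-style selective scan.

    x:     (T, d) input sequence
    delta: (T, d) input-dependent step sizes
    A:     (d, n) per-channel diagonal state matrix (negative for stability)
    B, C:  (T, n) input-dependent projection and readout vectors
    Returns (T, d) outputs.
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))          # hidden state, one n-dim state per channel
    ys = np.zeros((T, d))
    for t in range(T):
        dA = np.exp(delta[t][:, None] * A)       # discretized state transition
        dB = delta[t][:, None] * B[t][None, :]   # discretized input projection
        h = dA * h + dB * x[t][:, None]          # recurrent state update
        ys[t] = h @ C[t]                         # readout per channel
    return ys
```

The hybrid variants interleave blocks like this with standard transformer attention layers, which is why supporting them touches both the graph builder and the KV-cache/state management.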
-
When I quantized the Qwen2.5-1.5B-Instruct model following the "GGUF Export" section of examples.md in the docs, it showed that the quantization was complete and I obtained the GGUF model. But when I load …
-
### Name and Version
```
.\llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA…
-
More details here: https://docs.google.com/document/d/11_6pvPzd956QONIxHuDP155eBRrd89xSC1tYVRy3KvI/edit#heading=h.z0eti03fxfmv
-
### Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capab…
-
- Aggregated measures:
  - Difficult to aggregate measures of individual dimensions into a single index
  - Directly ask the LLM for an aggregate measure? It is not transparent and difficult to pr…
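One transparent alternative to asking the LLM for an aggregate score is a simple weighted average over the per-dimension measures, where the weights are explicit and auditable. The sketch below is a hypothetical illustration; the dimension names and weights are made up, not taken from the discussion above.

```python
def aggregate_index(scores, weights):
    """Combine per-dimension scores (each in [0, 1]) into one index.

    scores and weights are dicts keyed by dimension name; weights are
    normalized so the result stays in [0, 1]. Both the weights and the
    per-dimension scores remain inspectable, unlike a single opaque
    LLM-produced aggregate.
    """
    assert set(scores) == set(weights), "scores and weights must cover the same dimensions"
    total = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total

# Hypothetical usage: two dimensions, accuracy weighted twice as heavily.
index = aggregate_index(
    {"fluency": 0.9, "accuracy": 0.6},
    {"fluency": 1.0, "accuracy": 2.0},
)
# index == (0.9 * 1.0 + 0.6 * 2.0) / 3.0 == 0.7
```

Because the weighting is explicit, disagreements about the index reduce to disagreements about the weights, which is easier to probe than an opaque aggregate.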