-
### System Info
Hi,
I'm having trouble reproducing NVIDIA's claimed numbers in the table here: https://nvidia.github.io/TensorRT-LLM/performance/perf-overview.html#throughput-measurements
System Im…
-
Is there an easy way to convert GGUF to Marlin and vice versa? Are there any comparisons between the two?
https://github.com/leafspark/AutoGGUF
blap updated
2 months ago
-
My system:
```
| ~/Downloads : $ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
…
```
-
**Installation command (conda environment):**
`CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python`
**produces the following output:**
Collecting llama-cpp-python
Usi…
-
I'm using:
- macOS Ventura 13.2.1
- MacBook Air M1

When I run the command:
`python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s`
I get the following message:
```
INFO:root:C…
```
-
llama.cpp has an `--override-kv` option that can be used to override, well, model KV (metadata) values. This can be useful with the myriad of existing GGUFs that don't specify a pre-tokenizer. It would be nice…
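For reference, llama.cpp's `--override-kv` takes values of the form `KEY=TYPE:VALUE`, where `TYPE` is `int`, `float`, `bool` or `str`, and the flag may be repeated. The minimal sketch below only builds such an argument list; the helper names, the `llama-cli` binary name, the `model.gguf` path and the `llama3` pre-tokenizer value are illustrative assumptions, and the right value depends on the model's actual tokenizer.

```python
import shlex

# Value types accepted by llama.cpp's --override-kv flag (KEY=TYPE:VALUE).
ALLOWED_TYPES = {"int", "float", "bool", "str"}


def override_kv_arg(key: str, value_type: str, value) -> str:
    """Format one override as KEY=TYPE:VALUE (hypothetical helper)."""
    if value_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported override type {value_type!r}")
    if value_type == "bool":
        if not isinstance(value, bool):
            raise ValueError("bool overrides need a Python bool")
        # llama.cpp expects lowercase booleans, e.g. bool:false
        value = "true" if value else "false"
    return f"{key}={value_type}:{value}"


def build_command(model_path: str, overrides) -> list:
    """Build an argument list for a llama.cpp binary (name and path are placeholders)."""
    args = ["llama-cli", "-m", model_path]
    for key, value_type, value in overrides:
        # --override-kv may be passed once per key to override several values
        args += ["--override-kv", override_kv_arg(key, value_type, value)]
    return args


if __name__ == "__main__":
    # Hypothetical example: force a pre-tokenizer on a GGUF that lacks one.
    cmd = build_command("model.gguf", [("tokenizer.ggml.pre", "str", "llama3")])
    print(shlex.join(cmd))
```

Building a list rather than a single string keeps each override a separate argument, so it can be passed straight to `subprocess.run` without shell quoting issues.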
-
**LocalAI version:**
Using Docker image: `latest-aio-gpu-nvidia-cuda-12`
**Environment, CPU architecture, OS, and Version:**
Linux 0fe2bf31da79 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [Y] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
Zamba2 is a really cool model that uses a hybrid Mamba-Transformer architecture.
https://huggingface.co/Zyphra/Zamba2-2.7B
https://www.zyphra.com/post/zamba2-small
I have been wanting to use this for a…
-
Hi, I am testing various SLMs (small language models) for possible use alongside automation code for civil engineering design.
That is my initial task, but I am generally interested in the various applications that SLMs can b…