-
Dear BigCode team, what a wonderful project!
I am writing this feature request for an official implementation of GGUF quantization for Starcoder2, to enhance its adoption by coding platforms and APIs…
-
### Description
Having copy-on-write segments lends itself nicely to quantization. I propose we add a new "scalar" or "linear" quantization codec. This will be a simple quantization codec provided …
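As a rough illustration of what such a linear (affine) codec could look like, here is a minimal NumPy sketch with a per-array scale and zero point; the `quantize`/`dequantize` names are illustrative, not an existing API:

```python
import numpy as np

def quantize(arr, dtype=np.int8):
    """Affine-quantize a float array to an integer dtype.

    Returns the quantized array plus the (scale, zero_point) needed to decode.
    """
    info = np.iinfo(dtype)
    lo, hi = float(arr.min()), float(arr.max())
    # Guard against a constant array, where hi == lo would give scale 0.
    scale = (hi - lo) / (info.max - info.min) or 1.0
    zero_point = info.min - round(lo / scale)
    q = np.clip(np.round(arr / scale) + zero_point, info.min, info.max)
    return q.astype(dtype), scale, zero_point

def dequantize(q, scale, zero_point):
    """Decode back to float32; lossy, error is on the order of scale / 2."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
q, s, zp = quantize(x)
x_hat = dequantize(q, s, zp)
assert np.max(np.abs(x - x_hat)) <= s  # round-trip error within one step
```

Since the codec only needs to store `(q, scale, zero_point)`, the copy-on-write segment can hold the integer buffer while the two decode parameters live in the codec metadata.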
-
## 🌱 Describe your Feature Request
I am requesting the incorporation of a BitNet layer in CoreML, similar to the PyTorch implementation by Kyegomez (https://github.com/kyegomez/BitNet). A BitNet l…
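For context, the core of such a layer can be sketched in a few lines of NumPy; the `bitlinear` name and per-tensor `beta` scaling below are illustrative only (the full BitNet layer also quantizes activations and applies normalization, which this sketch omits):

```python
import numpy as np

def bitlinear(x, w):
    """Minimal BitLinear-style forward pass (a sketch, not the CoreML API).

    Weights are binarized to {-1, +1} via sign, with a per-tensor scaling
    factor beta = mean(|w|) applied after the matmul.
    """
    beta = np.abs(w).mean()              # per-tensor scale
    w_bin = np.where(w >= 0, 1.0, -1.0)  # 1-bit weights
    return (x @ w_bin.T) * beta          # rescale the integer-like matmul

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float32)
w = rng.standard_normal((4, 8)).astype(np.float32)
y = bitlinear(x, w)
assert y.shape == (2, 4)
```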
-
Currently, Paddle ERNIE fp32 inference performance on CPU is as below:
single thread: 251.464 ms
20 threads: 29.8818 ms
Our goal is to prove that with a real INT8 kernel, ERNIE can get a performance gain.…
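The arithmetic behind such an INT8 kernel can be sketched as follows (a NumPy emulation under assumed symmetric per-tensor scales; the real speedup comes from vectorized int8 instructions, which NumPy does not expose):

```python
import numpy as np

def int8_matmul(x, w):
    """Emulate an INT8 kernel: quantize both operands to int8,
    accumulate in int32, then rescale back to float32."""
    sx = np.abs(x).max() / 127.0
    sw = np.abs(w).max() / 127.0
    qx = np.clip(np.round(x / sx), -127, 127).astype(np.int8)
    qw = np.clip(np.round(w / sw), -127, 127).astype(np.int8)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # int32 accumulation
    return acc.astype(np.float32) * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((16, 64)).astype(np.float32)
ref = x @ w.T
out = int8_matmul(x, w)
# Quantization error should stay small relative to the fp32 result.
assert np.abs(out - ref).max() / np.abs(ref).max() < 0.05
```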
-
### What is the issue?
I carefully read the README documentation and found that something went wrong:
time=2024-05-20T10:06:02.688+08:00 level=INFO source=server.go:320 msg…
-
Run
```
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U mlx-lm
python3 -c "import mlx_lm; print(mlx_lm.__version__)"
MODEL=mlx-community/gemma-2-27b…
-
![image](https://github.com/unslothai/unsloth/assets/1203957/969e6356-6e32-494e-9dc5-7cef6b261a6d)
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_ggu…
-
At @onefact we have been using WASM, but this won't work for the encoder-only or encoder-decoder models I've built (e.g. http://arxiv.org/abs/1904.05342). That's because the WASM VM is for the CPU (ha…
-
Dear all,
I have noticed that the quantised weights of the QLinear module are QTensors with a scale parameter of dimension out_features. Should it not be a scalar value in the case of linear modules (p…
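For what it's worth, a scale of dimension out_features usually means per-output-channel quantization, which gives a lower round-trip error than a single scalar scale when weight rows differ in magnitude. A quick NumPy comparison (names illustrative, not the QLinear internals):

```python
import numpy as np

rng = np.random.default_rng(0)
# Weight rows with very different magnitudes, as often happens in practice.
w = rng.standard_normal((8, 32)).astype(np.float32)
w *= np.logspace(-2, 0, 8, dtype=np.float32)[:, None]

def quant_error(w, scale):
    """Mean symmetric-int8 round-trip error for a scalar or per-row scale."""
    q = np.clip(np.round(w / scale), -127, 127)
    return np.abs(q * scale - w).mean()

per_tensor = np.abs(w).max() / 127.0                        # one scalar scale
per_channel = np.abs(w).max(axis=1, keepdims=True) / 127.0  # (out_features, 1)
assert quant_error(w, per_channel) < quant_error(w, per_tensor)
```

So the out_features-sized scale is likely intentional rather than a bug, though the docs could state this explicitly.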
-
### System Info
- `transformers` version: 4.36.0
- Platform: Linux-5.15.0-1041-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.1…