-
First, I want to say THANK YOU for making this project possible. It's amazing how many possibilities will open up thanks to this community :)
I want to run llama2 on my iPhone; however, most iPhones…
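The question is cut off above, but the usual route to fitting Llama 2 into phone-class memory is 4-bit quantization via llama.cpp/GGUF. A minimal sketch using llama-cpp-python (the GGUF file name is a placeholder, and on an actual iPhone you would use an app built on llama.cpp rather than Python):

```python
from llama_cpp import Llama

# Load a 4-bit (Q4_K_M) GGUF quantization of Llama 2; the file name is a
# placeholder for whichever quantized checkpoint you produce or download.
llm = Llama(model_path="llama-2-7b.Q4_K_M.gguf", n_ctx=2048)
out = llm("Hello, ", max_tokens=16)
print(out["choices"][0]["text"])
```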
-
What quantization algorithm was used in the unsloth/Llama-3.2-1B-bnb-4bit model: https://huggingface.co/docs/transformers/main/en/quantization/overview. Is it int4_awq or int4_weightonly?
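For what it's worth, the "bnb-4bit" suffix conventionally indicates bitsandbytes 4-bit quantization (typically NF4), rather than AWQ or a plain int4 weight-only scheme. One way to confirm is to read the quantization config stored in the checkpoint itself; a minimal sketch:

```python
from transformers import AutoConfig

# The quantization method is recorded in the checkpoint's config.json;
# for a bitsandbytes checkpoint this typically reports quant_method
# "bitsandbytes" along with bnb_4bit_quant_type (e.g. "nf4").
config = AutoConfig.from_pretrained("unsloth/Llama-3.2-1B-bnb-4bit")
print(config.quantization_config)
```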
-
### System Info
Transformers.js 3.0.1
running in Node 18 using CommonJS
### Environment/Platform
- [ ] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] De…
-
Currently, there is no "unknown" quantization option for the OpenRouter provider, so models like mistralai/mixtral-8x7b, whose quantization is not reported, do not work.
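For context, here is a hedged sketch of what a quantization-filtered OpenRouter request looks like; the `quantizations` provider preference and the `"unknown"` value are assumptions based on OpenRouter's provider-routing options, not a confirmed fix:

```python
import requests

# Sketch of a chat request that restricts routing by quantization level.
# If a model's quantization is unreported, an "unknown" entry would be
# needed here for it to match at all (assumption, per the issue above).
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "mistralai/mixtral-8x7b",
        "messages": [{"role": "user", "content": "Hello"}],
        "provider": {"quantizations": ["unknown"]},
    },
)
print(resp.json())
```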
-
Would it be possible to support 8-bit quantization?
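The project isn't named in this snippet, but in a Hugging Face Transformers-style stack, 8-bit support usually means loading weights through bitsandbytes; a minimal sketch of what that looks like (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load weights in 8-bit via bitsandbytes (LLM.int8); requires a CUDA GPU.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```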
-
### Checklist
- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.…
-
### Description of the bug:
I tried running the example.py script given as the quantization example, but for Llama. Wherever a reference to Gemma was made, I made the appropriate reference to Llama. The…
-
Does MiniCPM-V 2.6 currently support int8/fp8 quantization?
thanks~
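One way to probe this empirically is to attempt an fp8 load in vLLM; whether MiniCPM-V 2.6 is accepted depends on the backend version, so treat this as a sketch rather than confirmation of support:

```python
from vllm import LLM

# Attempt an fp8-quantized load of MiniCPM-V 2.6; if the architecture or
# quantization mode is unsupported, this raises at initialization.
llm = LLM(
    model="openbmb/MiniCPM-V-2_6",
    quantization="fp8",
    trust_remote_code=True,
)
```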
-
## Description
TensorRT 10.5's pytorch-quantization has a compile bug.
https://github.com/NVIDIA/TensorRT/blob/release/10.5/tools/pytorch-quantization/src/tensor_quant_gpu.cu#L28-L37
It defines two macros `AT_DI…
-
We want to support the ability to run a full fine-tune with just 8-bit quantization.
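One common reading of "full fine-tune with 8-bit quantization" is to keep the trainable weights in bf16 while holding optimizer state in 8 bits via bitsandbytes, since weights frozen as int8 cannot be fully fine-tuned directly; a sketch under that assumption (the model id is a placeholder):

```python
import bitsandbytes as bnb
import torch
from transformers import AutoModelForCausalLM

# Full fine-tune with 8-bit optimizer state: every weight stays trainable
# in bf16, while the Adam moments are stored in 8 bits to cut memory.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    torch_dtype=torch.bfloat16,
)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)
```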