-
### Before Asking
- [X] I have read the [README](https://github.com/meituan/YOLOv6/blob/main/README.md) carefully.
- [X] I want to train my custom dataset, and I have read the …
-
We plan to add QAT for LLMs to torchao (as mentioned in the original RFC here: https://github.com/pytorch-labs/ao/issues/47).
For this to run efficiently on the GPU, we'd need kernel support for W4A8…
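
For context, W4A8 means 4-bit weights with 8-bit activations. A minimal sketch of the numerics a QAT flow would simulate during training; this is a generic illustration, not torchao's actual API, and `fake_quantize`/`w4a8_linear` are made-up names:

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Forward pass sees the quantized values; backward passes gradients through.
    return x + (x_q - x).detach()

def w4a8_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Simulate a W4A8 matmul: 8-bit activations, 4-bit weights."""
    return fake_quantize(x, 8) @ fake_quantize(weight, 4).t()
```

The efficiency point above is that at inference time the weights stay int4 and the activations int8, so the matmul needs a dedicated mixed-precision kernel rather than the float simulation in this sketch.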
-
Hi,
I have tried running the **CohereForAI/aya-expanse-8b** model. I added the following code to your script
---------------------------------CODE CHANGE 1--------------------------------------…
-
May I ask whether the current project supports INT8 quantization? If so, how? Currently only FP16 and FP32 are supported, right?
-
As titled. cc @tridao @jayhshah
-
Currently, some quantized Hugging Face models save zero-points directly in an int4 datatype, like [Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4) and [Qwen/Qwen2…
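
For reference, GPTQ-style checkpoints usually pack eight 4-bit values into each int32 word. A minimal sketch of unpacking such zero-points; the function name is illustrative and exact layouts vary between exporters:

```python
import torch

def unpack_int4_zeros(qzeros: torch.Tensor) -> torch.Tensor:
    """Unpack int32 words that each hold eight 4-bit zero-points into int8."""
    shifts = torch.arange(0, 32, 4, device=qzeros.device)   # [0, 4, ..., 28]
    nibbles = (qzeros.unsqueeze(-1) >> shifts) & 0xF        # one 4-bit value per slot
    return nibbles.flatten(-2).to(torch.int8)
```

Whether the unpacked values then need an additional offset depends on the packing convention of the tool that produced the checkpoint.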
-
### Question
I did some testing with 4-bit and 8-bit quantization and it doesn't seem to improve inference time at all; in fact, it seems to make it worse. All I did was simply set `load_in_8bit` or…
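
For reference, this presumably refers to the transformers + bitsandbytes loading path. A minimal sketch of that setup, with the model id as a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 8-bit loading via bitsandbytes is primarily a memory optimization;
# it does not necessarily reduce latency.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)
```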
-
Hi all, thanks a lot for the nice work introducing Vicuna and FastChat.
I am a beginner in NLP (so correct me if I am wrong) and use GPUs with limited memory, so I would like to train/infer with …
-
### 1. System information
- OS Platform and Distribution: Ubuntu 22.04.3 LTS
- TensorFlow installation: pip install tensorflow (virtual env: venv)
- TensorFlow library: pip package -> tensorflow…
-
When compressing a glTF model with [gltfpack](https://meshoptimizer.org/gltf/), it appears with incorrect scales in-game.
Model without quantization:
![image](https://user-images.githubusercontent.…