-
**Describe the bug**
Following the instructions in [`examples/mistral`](https://github.com/microsoft/Olive/tree/main/examples/mistral) does not result in a quantized ONNX model. After running the wor…
-
1. When will you provide a pip package?
2. Will there be automatic backend selection for each layer? As far as I know, some backends have specific requirements, for example on bit width and channel count (see the sketch after this list).
3. Will you support layer …
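On point 2, here is a minimal sketch of what per-layer backend selection could look like under the stated constraints. `pick_backend` and the backend names are hypothetical illustrations, not the project's real API:

```python
# Hypothetical per-layer kernel-backend dispatch based on bit width and
# channel alignment (names and thresholds are illustrative assumptions).
def pick_backend(bits: int, in_features: int, out_features: int) -> str:
    if bits == 4 and in_features % 64 == 0 and out_features % 64 == 0:
        return "marlin"   # e.g. a fast 4-bit kernel requiring 64-aligned channels
    if bits in (2, 3, 4, 8):
        return "triton"   # more permissive fallback kernel
    return "torch"        # pure-PyTorch reference path

print(pick_backend(4, 4096, 4096))    # -> "marlin"
print(pick_backend(3, 4096, 11008))   # -> "triton"
```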
-
* Four months ago (November 2017), Visual Basic MVP Klaus Löffelmann (@KlausLoeffelmann) was invited to attend the VB LDM to present on the challenges of modern GUI programming. Klaus presented a few…
-
Status: Draft
Updated: 09/18/2024
# Objective
In this doc we’ll talk about how different optimization techniques are structured in torchao and how to contribute to torchao.
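As a concrete entry point before diving into the stack, here is a minimal sketch of torchao's top-level quantization API (assuming a recent torchao, roughly 0.4+, where these names are exported; exact import paths may differ across versions):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Swap the Linear layers' weights to int8, in place (a minimal sketch;
# assumes quantize_ and int8_weight_only are available in your torchao build).
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024))
quantize_(model, int8_weight_only())
```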
# torchao Stack Ove…
-
Currently testing PR #87 and running into very slow quants for a TinyLlama 1.1B test model.
I am getting ~96 s per layer during quantization on a 4090 GPU with n_blocks = 1 and ~75 s per layer with n_blocks…
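For a sense of scale, a rough back-of-the-envelope, assuming TinyLlama 1.1B's 22 decoder layers (an assumption; check the model config for the actual count):

```python
# Estimated wall time for one full quantization pass at the observed rate.
seconds_per_layer = 96   # observed with n_blocks = 1
num_layers = 22          # assumed for TinyLlama 1.1B
print(f"~{seconds_per_layer * num_layers / 60:.0f} min total")  # ~35 min
```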
-
While testing OPT with `quant_lm_head=True`, here are the resulting weight keys after quantization:
`weight keys: ['lm_head.g_idx', 'lm_head.qweight', 'lm_head.qzeros', 'lm_head.scales', 'model.decoder.em…
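For reference, a small sketch of how such keys can be listed from a saved checkpoint (the filename is hypothetical; a safetensors checkpoint is assumed):

```python
from safetensors.torch import load_file

# List the GPTQ-style quantized-parameter keys in a saved state dict.
state = load_file("model.safetensors")  # hypothetical path
quant_suffixes = (".qweight", ".qzeros", ".scales", ".g_idx")
print(sorted(k for k in state if k.endswith(quant_suffixes)))
```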
-
There are different sym (symmetric quantization) implementations: one is the gptq/autoround way, the other is the awq/our-rtn way.
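A minimal sketch of the two conventions, assuming the usual distinction (the gptq/autoround style clamps codes to a symmetric range, while the awq/rtn style uses the full two's-complement range); actual implementations vary in details such as per-group scales:

```python
import torch

def sym_quant_restricted(w: torch.Tensor, bits: int = 4):
    # gptq/autoround-style sym (assumed): symmetric code range
    # [-(2^(b-1)-1), 2^(b-1)-1], so one negative code goes unused.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax), scale

def sym_quant_fullrange(w: torch.Tensor, bits: int = 4):
    # awq/rtn-style sym (assumed): full two's-complement range
    # [-2^(b-1), 2^(b-1)-1], trading exact symmetry for one extra code.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / (2 ** (bits - 1))
    return torch.clamp(torch.round(w / scale), qmin, qmax), scale
```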
-
The README mentions FP4. Does it support FP4 inference? It looks like there is no implementation of FP4 inference in the current code.
If it is supported, is there a tutorial for using it (including q…
-
How much GPU memory and CPU RAM are required to quantize the ChatGLM3-6B model? I used an A100-40G but got a "killed" error.
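A "killed" message usually means the Linux OOM killer ran out of CPU RAM rather than GPU memory. A rough weight-size estimate, assuming ChatGLM3-6B's roughly 6.2B parameters:

```python
# Back-of-the-envelope weight memory for ChatGLM3-6B (~6.2B params, an assumption;
# check the model card for the exact count).
params = 6.2e9
print(f"fp16 weights: ~{params * 2 / 1e9:.1f} GB")  # ~12.4 GB
print(f"fp32 weights: ~{params * 4 / 1e9:.1f} GB")  # ~24.8 GB
# Loading in fp32 plus calibration activations can exceed CPU RAM and trigger the OOM killer.
```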