-
Why are there no extended experiments on LLMs or large vision transformers?
-
The model was downloaded from https://github.com/fatihcakirs/mobile_models/blob/main/v0_7/tflite/mobilebert_int8_384_20200602.tflite
Some fully-connected weights have a non-zero zero point (e.g. weight `b…
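For context, a non-zero zero point is expected whenever a weight tensor's float range is not symmetric around zero and asymmetric quantization is used. A minimal NumPy sketch of per-tensor asymmetric int8 quantization (the weight values are made up for illustration):

```python
import numpy as np

def asymmetric_quantize(w, num_bits=8):
    """Per-tensor asymmetric quantization to signed int8."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # -128, 127
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin)
    # The zero point maps float 0.0 onto the integer grid; it is only 0
    # when the float range is symmetric around zero.
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

# A weight tensor whose range [-0.1, 0.5] is not centered on zero.
w = np.array([-0.1, 0.0, 0.25, 0.5], dtype=np.float32)
q, scale, zp = asymmetric_quantize(w)
print(zp)  # non-zero, because the float range is asymmetric
```

Dequantizing with `(q - zero_point) * scale` recovers the weights to within one quantization step, which is why a non-zero zero point is not by itself a sign of a broken model.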
-
Hello all,
I am working on adding some logic on top of QAT where I make a few of the layers non-trainable during QAT, based on some criterion. I currently see that there is no such support in QAT (a…
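One workaround, sketched here in PyTorch rather than the TF QAT API the question is about (and with an arbitrary toy model and layer choice): after preparing a model for QAT, individual layers can still be frozen by turning off gradients on their parameters. The fake-quant observers keep running on those layers, but the weights stop updating.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat

# Toy model standing in for the real network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.train()  # prepare_qat expects training mode
model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(model)

# Freeze the first Linear layer: its weights no longer receive gradients,
# while the rest of the network continues QAT fine-tuning as usual.
for p in qat_model[0].parameters():
    p.requires_grad = False

trainable = [n for n, p in qat_model.named_parameters() if p.requires_grad]
print(trainable)  # only parameters of the unfrozen layers remain
```

The same idea applies in Keras by setting `layer.trainable = False` on the quantize-wrapped layers before compiling, though whether the quantize wrappers honor it may depend on the tfmot version.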
-
# PaddleSlim Quantization
![image](https://user-images.githubusercontent.com/1312389/170643197-8a42af2b-b696-4363-ac3a-29a582642162.png)
PaddleSlim mainly provides three quantization methods: quantization-aware training (Quant Aware Training, QAT), dynamic post-training quantization (Post Train…
-
When I run quantization_speedup.py in /examples/tutorials, I get errors like this:
```
Traceback (most recent call last):
  File "quantization_speedup.py", line 114, in <module>
    engine.compress()
  Fi…
-
Originally I posted this bug [#54753](https://github.com/tensorflow/tensorflow/issues/54753) on [tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/issues) and was advised to repost it he…
-
I only have P100 and V100 GPUs, which don't support INT8. So what should I do to quantize BERT to FP16?
Thanks in advance!
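Since FP16 needs no calibration, one common route (a sketch, assuming PyTorch; a small linear stack stands in for BERT, but the same `half()` call works on e.g. a Hugging Face `BertModel`) is simply to cast the weights to half precision and run inference on the GPU, which both P100 and V100 support natively:

```python
import torch
import torch.nn as nn

# Stand-in for a BERT encoder; with transformers you would similarly call
# model.half() on BertModel.from_pretrained("bert-base-uncased").
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))

fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
model.half()  # cast all parameters and buffers to float16 in place
fp16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(fp16_bytes * 2 == fp32_bytes)  # True: FP16 halves the weight storage
```

For inference, move the model to the GPU and cast the inputs to `torch.float16` as well; on V100 the tensor cores give an additional speedup over the pure memory saving.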
-
**Describe the bug**
Hi, I use the OpenVINO EP to test QDQ model performance, but I find that the QDQ model's performance is worse than the original FP32 model's.\
**System information**
- ONNX Runtime installed fro…
-
Excellent work!
Can it run inference on the CPU?
And how much faster is it than the baseline?
-
**Describe the bug**
I use the Hugging Face Transformers ALBERT model albert-base-v2 to classify text; meanwhile, I use ONNX Runtime to optimize and quantize it:
`opt_model = optimizer.optimize_model(
…