-
After compressing a BERT MRPC model and verifying performance with paddle_inference_eval, I found a large accuracy gap between int8 and fp32:
| `--precision` | Accuracy |
|---------------|----------|
| fp32          | 84       |
| fp16          | 84       |
| int8          | 61       |
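A minimal way to reproduce the comparison, assuming the report's `paddle_inference_eval` script takes only the `--precision` flag shown above (the script path and any other required flags are assumptions):

```python
# Hedged sketch: sweep the --precision flag from the report; the script name
# and the absence of other required flags are assumptions, not the real CLI.
import subprocess

for precision in ("fp32", "fp16", "int8"):
    subprocess.run(
        ["python", "paddle_inference_eval.py", f"--precision={precision}"],
        check=True,  # raise if an eval run fails
    )
```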
-
Just a quick question. I want my final model to be fully int8 instead of float32 for inputs and outputs, and I want training to be as accurate as possible. Do I train with quantised inputs and outputs? B…
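A minimal quantization-aware-training sketch, assuming a PyTorch workflow (the toy model and the omitted training loop are placeholders): `QuantStub`/`DeQuantStub` fake-quantize the model boundary during training, so the network learns under int8-like inputs and outputs before conversion.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # fake-quantizes the input
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.ao.quantization.DeQuantStub()  # restores float at the output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().train()
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
torch.ao.quantization.prepare_qat(model, inplace=True)
# ... run the usual float training loop here; fake-quant ops simulate int8 ...
model.eval()
int8_model = torch.ao.quantization.convert(model)  # real int8 kernels after training
```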
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and fou…
-
Add support to save and load a precompiled MIGraphX graph from the MIGraphX EP to speed up time to inference.
- [x] Save our precompiled graphs
- [x] Load in precompiled graphs
We do a similar so…
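A sketch of what the requested flow could look like from the ONNX Runtime Python API; the MIGraphX provider option names below are assumptions and may differ from the EP's actual interface:

```python
import onnxruntime as ort

# First run: compile once and persist the graph (option names assumed).
providers = [("MIGraphXExecutionProvider", {
    "migraphx_save_compiled_model": 1,
    "migraphx_save_model_path": "model.mxr",
})]
sess = ort.InferenceSession("model.onnx", providers=providers)

# Later runs: load the precompiled graph instead of recompiling.
providers = [("MIGraphXExecutionProvider", {
    "migraphx_load_compiled_model": 1,
    "migraphx_load_model_path": "model.mxr",
})]
sess = ort.InferenceSession("model.onnx", providers=providers)
```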
-
### Please ask your question
Can you explain why int8 TRT inference in Paddle_inference uses more GPU memory than fp32 TRT inference?
-
### Describe the bug
I am still facing the issue described in #1331. However, I am directly using torch.onnx.export on a loaded PatchcoreModel. As a result, self.memory_bank is not being initial…
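A hedged sketch of one workaround, assuming the trained checkpoint contains the fitted memory_bank buffer; the checkpoint filename, state-dict layout, and input size are assumptions:

```python
import torch

# Sketch: load the trained weights (which should include the fitted
# memory_bank buffer) before exporting; checkpoint layout is an assumption.
state = torch.load("patchcore.ckpt", map_location="cpu")
model.load_state_dict(state["state_dict"])  # 'model' is the loaded PatchcoreModel
model.eval()
torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),  # dummy input; match the training resolution
    "patchcore.onnx",
    opset_version=14,
)
```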
-
Where can I download bloom-7b?
I noticed that int8 quantization is available, but is there an option for int4 quantization?
What is the memory overhead for int4 and int8 when using LoRA or PTuning f…
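For context, one way to get int4 today is bitsandbytes through transformers; whether that fits the asker's stack (e.g. NeMo with LoRA/P-Tuning) is an assumption, and the model id is the public Hub checkpoint:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hedged sketch: 4-bit loading via bitsandbytes; this answers the int4
# question only for the Hugging Face stack, which is an assumption here.
bnb = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",   # public BLOOM 7B checkpoint on the Hub
    quantization_config=bnb,
    device_map="auto",
)
```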
-
Hello,
I hope this message finds you well. I followed the tutorial to convert the model; however, an error occurred during the conversion process. I am seeking clarification on t…
-
Hello guys, I wrote a streaming inference pipeline in Python for this project, including torch.jit.script, int8 dynamic quantization, and a streaming interface for the audio encoder and decoder (style v…
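A minimal sketch of the two techniques named above, where `encoder` is a placeholder for the project's audio encoder module, not its actual name:

```python
import torch

# Sketch: int8 dynamic quantization of Linear layers, then TorchScript
# compilation; 'encoder' stands in for the project's nn.Module.
qencoder = torch.ao.quantization.quantize_dynamic(
    encoder, {torch.nn.Linear}, dtype=torch.qint8
)
scripted = torch.jit.script(qencoder)
scripted.save("encoder_int8.pt")
```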
-
### Feature request
It would be immensely useful to have a server application to serve HF Transformers and other Hub models as a service, similar to how `llama.cpp` bundles the `llama-server`…
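A sketch of the requested shape, assuming FastAPI plus a transformers pipeline; the model id, route, and parameters are placeholders, not a proposed design:

```python
from fastapi import FastAPI
from transformers import pipeline

# Hedged sketch: the smallest possible HF model server; run with
#   uvicorn server:app
app = FastAPI()
generator = pipeline("text-generation", model="gpt2")

@app.post("/generate")
def generate(prompt: str, max_new_tokens: int = 64):
    # FastAPI treats these simple-typed params as query parameters.
    return generator(prompt, max_new_tokens=max_new_tokens)[0]
```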