-
### Feature request
It would be immensely useful to have a server application that serves HF Transformers and other Hub models as a service, similar to how `llama.cpp` bundles `llama-server`…
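A minimal sketch of what such a server could look like, assuming FastAPI as the web layer and `gpt2` as a placeholder checkpoint; this illustrates the requested feature, not an existing Transformers API:

```python
# Sketch: serve a Hub model over HTTP with FastAPI + transformers.
# Assumptions: fastapi/uvicorn installed; "gpt2" is only a placeholder checkpoint.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": out[0]["generated_text"]}

# Run with: uvicorn server:app --port 8000
```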
-
This is taking about 2 hours with the smallest model.
I presume the issue is that my GPU cannot load a T5-XXL model into memory. According to the Hugging Face model page, the weights are 44.5 GB.
…
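When a checkpoint is larger than GPU memory, one common workaround is to let Accelerate shard the model across GPU, CPU, and disk. A minimal sketch, assuming the `transformers` and `accelerate` packages and using `google/flan-t5-xxl` as a stand-in for the checkpoint in question:

```python
# Sketch: load a model too large for one GPU by offloading with Accelerate.
# Assumptions: transformers + accelerate installed; checkpoint name is a stand-in.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-xxl"  # stand-in for the T5-XXL checkpoint in question
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves the footprint vs. fp32
    device_map="auto",           # shard across GPU / CPU / disk as needed
    offload_folder="offload",    # spill weights that don't fit to disk
)

inputs = tokenizer("translate English to German: Hello!", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```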
-
### Describe the issue
Using shape_inference.quant_pre_process to preprocess the model results in an error, even if I set skip_optimization=True.
![image](https://github.com/microsoft/onnxruntime/assets/12644192…
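For reference, a minimal sketch of the call in question; the model paths are placeholders, and `skip_optimization=True` matches the setting from the report:

```python
# Sketch: preprocess an ONNX model before quantization.
# Paths are placeholders; skip_optimization=True matches the report above.
from onnxruntime.quantization.shape_inference import quant_pre_process

quant_pre_process(
    "model.onnx",        # input model (placeholder path)
    "model_pre.onnx",    # output path (placeholder)
    skip_optimization=True,
)
```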
-
Failing op in [U-2-Net_vaiq_int8.default.onnx.torch.elide.mlir](https://gist.github.com/AmosLewis/0b2daadbf68b26c6f7554318a0dab847#file-u-2-net_vaiq_int8-default-onnx-torch-elide-mlir)
- https://git…
-
Hi, I see multiple quicsr TFLite models available in the project. The default one is 540p, which currently takes ~200 ms on GPU on my phone. Is there a faster one? Maybe one of these for 360p?
![ima…
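For anyone comparing the variants, a rough sketch of how one might time a TFLite model with the Python interpreter on desktop (the model path is a placeholder, and on-device GPU numbers will differ):

```python
# Sketch: rough latency measurement for a TFLite model (CPU, desktop).
# The model path is a placeholder; phone GPU timings will differ.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="quicsr_540p.tflite")  # placeholder
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

interpreter.invoke()  # warm-up run
runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"avg latency: {(time.perf_counter() - start) / runs * 1e3:.1f} ms")
```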
-
I observed that nv_full INT8 inference on the VP takes more time than FP16 inference.
NVDLA HW branch: nvdlav1, config: nv_full
NVDLA SW branch: Latest with INT8 option in nvdla_compiler
Ple…
-
- System environment:
- Paddle version: 1.5.1, CPU, no other acceleration modules in use
- OS: CentOS 6.3
- Problem description:
- Model compression was performed with the paddle.fluid.contrib.slim.Compressor module
- After compression, the float model runs normally, but the int8 version raises the following error:
![image](https://user-images.g…
-
### Search before asking
- [X] I have searched the question and found no related answer.
### Please ask your question
On Jetson Xavier NX, I am using Paddle Inference to deploy PaddleSli…
-
# Summary
* We (engineering at @neuralmagic) are working on support for int8 quantized activations.
* This RFC proposes an _incremental_ approach to quantization, where the initial support for q…
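As background for the discussion, a toy sketch of the standard affine int8 quantization of an activation tensor (scale/zero-point derivation plus round-trip); this shows the general scheme only, not Neural Magic's implementation:

```python
# Toy sketch of affine int8 quantization for activations.
# General technique only; not the RFC's actual implementation.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float activations to int8 with an affine scale/zero-point."""
    qmin, qmax = -128, 127
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)  # range must cover 0
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, s, zp = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, s, zp)).max())
```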
-
I have followed the instructions provided by @fsx950223 to create an int8 quantized TFLite model. The quantization covered weights and layer outputs. The tflite obtained from an efficientdet-d2 checkpoin…
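For context, a minimal sketch of the standard full-integer TFLite conversion (int8 weights and activations), assuming a SavedModel directory and placeholder calibration data; whether this matches @fsx950223's exact instructions is an assumption:

```python
# Sketch: full-integer (int8 weights + activations) TFLite conversion.
# saved_model_dir and the calibration data are placeholders.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; real use should feed preprocessed images.
    for _ in range(100):
        yield [np.random.rand(1, 768, 768, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```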