-
### Feature request
It would be immensely useful to have a server application that serves HF Transformers and other Hub models as a service, similar to how `llama.cpp` bundles `llama-server`…
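A minimal sketch of what such a server could look like, assuming FastAPI as the web layer and `gpt2` as a placeholder checkpoint; this illustrates the requested feature, not an existing Transformers API:

```python
# Sketch: serve a Hub model over HTTP with FastAPI + transformers.
# Assumptions: fastapi/uvicorn installed; "gpt2" is only a placeholder checkpoint.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest):
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": out[0]["generated_text"]}

# Run with: uvicorn server:app --port 8000
```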
-
This is taking about 2 hours with the smallest model.
I presume the issue is that my GPU cannot load a T5-XXL model into memory. According to the Hugging Face model page, the weights are 44.5 GB.
…
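When a checkpoint is larger than GPU memory, one common workaround is to let Accelerate shard the model across GPU, CPU, and disk. A minimal sketch, assuming the `transformers` and `accelerate` packages and using `google/flan-t5-xxl` as a stand-in for the checkpoint in question:

```python
# Sketch: load a model too large for one GPU by offloading with Accelerate.
# Assumptions: transformers + accelerate installed; checkpoint name is a stand-in.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-xxl"  # stand-in for the T5-XXL checkpoint in question
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves the footprint vs. fp32
    device_map="auto",           # shard across GPU / CPU / disk as needed
    offload_folder="offload",    # spill weights that don't fit to disk
)

inputs = tokenizer("translate English to German: Hello!", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```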
-
### Describe the issue
Using shape_inference.quant_pre_process to preprocess the model results in an error, even if I set skip_optimization=True.
![image](https://github.com/microsoft/onnxruntime/assets/12644192…
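For reference, a minimal sketch of the call in question; the model paths are placeholders, and `skip_optimization=True` matches the setting from the report:

```python
# Sketch: preprocess an ONNX model before quantization.
# Paths are placeholders; skip_optimization=True matches the report above.
from onnxruntime.quantization.shape_inference import quant_pre_process

quant_pre_process(
    "model.onnx",        # input model (placeholder path)
    "model_pre.onnx",    # output path (placeholder)
    skip_optimization=True,
)
```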
-
Failing op in [U-2-Net_vaiq_int8.default.onnx.torch.elide.mlir](https://gist.github.com/AmosLewis/0b2daadbf68b26c6f7554318a0dab847#file-u-2-net_vaiq_int8-default-onnx-torch-elide-mlir)
- https://git…
-
Hi, I see multiple quicsr TFLite models available in the project. The default one is 540p, which currently takes ~200 ms on GPU on my phone. Is there a faster one? Maybe one of these for 360p?
![ima…
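For anyone comparing the variants, a rough sketch of how one might time a TFLite model with the Python interpreter on desktop (the model path is a placeholder, and on-device GPU numbers will differ):

```python
# Sketch: rough latency measurement for a TFLite model (CPU, desktop).
# The model path is a placeholder; phone GPU timings will differ.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="quicsr_540p.tflite")  # placeholder
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

interpreter.invoke()  # warm-up run
runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print(f"avg latency: {(time.perf_counter() - start) / runs * 1e3:.1f} ms")
```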
-
I observed that nv_full INT8 inference on the VP takes more time than FP16 inference.
NVDLA HW branch: nvdlav1, config: nv_full
NVDLA SW branch: Latest with INT8 option in nvdla_compiler
Ple…
-
- System environment:
- Paddle version: 1.5.1, CPU, no other acceleration modules in use
- OS: CentOS 6.3
- Problem description:
- Model compression was performed with the paddle.fluid.contrib.slim.Compressor module
- After compression, the float model runs normally, but the int8 version raises the following error:
![image](https://user-images.g…
-
### Search before asking
- [X] I have searched the question and found no related answer.
### Please ask your question
On Jetson Xavier NX, I am using Paddle Inference to deploy PaddleSli…
-
# Summary
* We (engineering at @neuralmagic) are working on support for int8 quantized activations.
* This RFC proposes an _incremental_ approach to quantization, where the initial support for q…
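As background for the discussion, a toy sketch of the standard affine int8 quantization of an activation tensor (scale/zero-point derivation plus round-trip); this shows the general scheme only, not Neural Magic's implementation:

```python
# Toy sketch of affine int8 quantization for activations.
# General technique only; not the RFC's actual implementation.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float activations to int8 with an affine scale/zero-point."""
    qmin, qmax = -128, 127
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)  # range must cover 0
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, s, zp = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, s, zp)).max())
```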
-
I have followed the instructions provided by @fsx950223 to create an int8 quantized TFLite model. The quantization covered weights and layer outputs. The tflite obtained from an efficientdet-d2 checkpoin…
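For context, a minimal sketch of the standard full-integer TFLite conversion (int8 weights and activations), assuming a SavedModel directory and placeholder calibration data; whether this matches @fsx950223's exact instructions is an assumption:

```python
# Sketch: full-integer (int8 weights + activations) TFLite conversion.
# saved_model_dir and the calibration data are placeholders.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; real use should feed preprocessed images.
    for _ in range(100):
        yield [np.random.rand(1, 768, 768, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```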