-
After compressing the BERT MRPC model, I validated performance with paddle_inference_eval and found a large accuracy gap between int8 and fp32:
--precision=fp32 84
--precision=fp16 84
--precision=int8 61
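For context, INT8 in Paddle Inference is usually enabled through the TensorRT sub-graph engine. Below is a minimal sketch of the predictor setup with placeholder model paths and flags similar to what a script like paddle_inference_eval would pass; it is an illustration, not that script's exact code:

```python
import paddle.inference as paddle_infer

# Placeholder paths for the compressed BERT-MRPC model.
config = paddle_infer.Config("model.pdmodel", "model.pdiparams")
config.enable_use_gpu(256, 0)  # 256 MB initial GPU memory pool, device 0

# Switch PrecisionType.Int8 <-> Half/Float32 to reproduce the accuracy gap above.
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=5,
    precision_mode=paddle_infer.PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False,  # offline-quantized models usually skip online calibration
)

predictor = paddle_infer.create_predictor(config)
```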
oydf updated 9 months ago
-
I tested it with batch_num=1, seq_len=128, head_num=5, head_dim=64. It shows "FMHA Inference took 75.82559204 ms, 17.97742325 GFlop/s, 0.01728598 GB/s INT8 average absolute deviation: 1.552685 %". B…
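For reference, a metric like the reported "INT8 average absolute deviation" is commonly the mean absolute difference between the INT8 and FP32 outputs, normalized by the mean FP32 magnitude. The exact formula used by this benchmark is an assumption; a minimal NumPy sketch:

```python
import numpy as np

def avg_abs_deviation_pct(int8_out: np.ndarray, fp32_out: np.ndarray) -> float:
    """Mean |INT8 - FP32| normalized by mean |FP32|, in percent.
    (Assumed definition; the benchmark's exact formula may differ.)"""
    diff = np.abs(int8_out.astype(np.float32) - fp32_out)
    return 100.0 * diff.mean() / (np.abs(fp32_out).mean() + 1e-12)

# Shapes from the report: batch=1, heads=5, seq_len=128, head_dim=64
fp32 = np.random.randn(1, 5, 128, 64).astype(np.float32)
int8_dequant = fp32 + 0.01 * np.random.randn(*fp32.shape).astype(np.float32)
print(f"average absolute deviation: {avg_abs_deviation_pct(int8_dequant, fp32):.6f} %")
```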
-
## Description
I recently attempted to utilize INT8 quantization with Stable Diffusion XL to enhance inference performance based on the claims made in a recent [TensorRT blog post](https://developer.…
teith updated 6 months ago
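For reference, the workflow described in that blog post is commonly reproduced with NVIDIA's TensorRT Model Optimizer (modelopt). The sketch below is a hedged outline of post-training INT8 quantization of the SDXL UNet; the calibration prompts and config choice are assumptions, not the blog's exact recipe:

```python
import torch
from diffusers import StableDiffusionXLPipeline
import modelopt.torch.quantization as mtq

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

calib_prompts = ["a photo of a cat", "a watercolor landscape"]  # tiny placeholder set

def forward_loop(unet):
    # Run a few denoising passes so activation ranges can be calibrated.
    for prompt in calib_prompts:
        pipe(prompt, num_inference_steps=4)

# Generic INT8 config; modelopt also ships SmoothQuant-style INT8 configs.
mtq.quantize(pipe.unet, mtq.INT8_DEFAULT_CFG, forward_loop)
```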
-
I observed that nv_full INT8 inference on the VP takes more time than FP16 inference.
NVDLA HW branch: nvdlav1, config: nv_full
NVDLA SW branch: Latest with INT8 option in nvdla_compiler
Ple…
-
I have followed the instructions provided by @fsx950223 to create an int8 quantized tflite model. The quantization covered weights and layer outputs. The tflite obtained from an efficientdet-d2 checkpoin…
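For reference, full-integer (weights and activations) TFLite quantization generally follows the pattern below. This is a minimal sketch assuming an EfficientDet-D2 SavedModel export and a placeholder representative dataset:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder: yield a handful of preprocessed input images (768x768 for D2).
    for _ in range(100):
        yield [np.random.rand(1, 768, 768, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("efficientdet-d2_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops so both weights and activations are quantized to int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.uint8, depending on the pipeline
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("efficientdet-d2_int8.tflite", "wb") as f:
    f.write(tflite_model)
```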
-
### Experiment plan
- Which tuning method is the most memory-efficient?
- Which quantization method gives the highest accuracy at inference time?
#### Memory-usage comparison groups during finetuning
1. Full finetuning
2. LoRA tuning
3. llm.int8() + L… (see the loading sketch after this list)
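As a reference point for option 3, loading a model with llm.int8() via Hugging Face Transformers and bitsandbytes typically looks like the sketch below; the model name and settings are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# llm.int8(): 8-bit weights with a higher-precision path for outlier features.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

print(model.get_memory_footprint() / 1e9, "GB")  # rough memory comparison hook
```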
-
Hi, thanks for your wonderful work. However, I got very different results from cpu and edgetpu. In the following image, the left one is the result using cpu, and the right one is the result using edg…
-
Hi, guys. I noticed that BigDL utilizes BigDL Nano and ggml to accelerate int8/int4 computations. I wonder how to invoke these APIs in LLMs like LLAMA. Specifically, I want to accelerate the linear lay…
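For reference, BigDL-LLM exposes this through drop-in Transformers-style wrappers that quantize the linear layers at load time. A minimal sketch, assuming the bigdl-llm package and a placeholder LLaMA checkpoint (module paths are worth double-checking against the BigDL-LLM docs):

```python
# Low-bit (int4/int8) loading with BigDL-LLM's Transformers-style API.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/llama-checkpoint"  # placeholder

# load_in_4bit=True applies low-bit quantization to the linear layers;
# load_in_low_bit="sym_int8" is the int8 variant.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```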
-
First, thanks for this high-quality project.
I converted my model with torch2trt as follows:
...
model_trt_float32 = torch2trt(my_model, [ims], max_batch_size=32)
model_trt…
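For comparison, torch2trt's INT8 path is usually enabled through its calibration arguments. The sketch below reuses the `my_model` and `ims` names from the snippet above and is a hedged outline rather than the issue's exact code:

```python
from torch2trt import torch2trt

# INT8 build; by default torch2trt calibrates on the example inputs,
# so pass a larger int8_calib_dataset for meaningful activation ranges
# (see torch2trt's docs for the expected dataset format).
model_trt_int8 = torch2trt(
    my_model,
    [ims],
    max_batch_size=32,
    int8_mode=True,
    # int8_calib_dataset=my_calib_dataset,
    # int8_calib_batch_size=32,
)
```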
-
## Description
I generated a calibration cache for a Vision Transformer ONNX model using the EntropyCalibration2 method. When trying to generate an engine file from the cache file for INT8 precision using trte…
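For reference, building an INT8 engine from an existing calibration cache with the TensorRT Python API generally looks like the sketch below; file names are placeholders, and trtexec's --int8/--calib flags are the CLI equivalent:

```python
import tensorrt as trt

CACHE_FILE = "vit_calibration.cache"   # placeholder paths
ONNX_FILE = "vit.onnx"

class CacheOnlyCalibrator(trt.IInt8EntropyCalibrator2):
    """Calibrator that only replays an existing cache (no new calibration data)."""
    def get_batch_size(self):
        return 1
    def get_batch(self, names):
        return None  # no batches: activation ranges come from the cache
    def read_calibration_cache(self):
        with open(CACHE_FILE, "rb") as f:
            return f.read()
    def write_calibration_cache(self, cache):
        pass

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
# TensorRT 8.x-style explicit-batch network creation.
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open(ONNX_FILE, "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = CacheOnlyCalibrator()

engine_bytes = builder.build_serialized_network(network, config)
with open("vit_int8.engine", "wb") as f:
    f.write(engine_bytes)
```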