-
Hi @all
Thank you for your great work!
I noticed that when using the InternVL-Chat-V1.5-Int8 model, inference is very slow, as mentioned in [link](https://github.com/OpenGVLab/InternVL/issues/157)
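For reference, a minimal sketch of how such an int8 checkpoint is commonly loaded through `transformers` with bitsandbytes; the checkpoint id and arguments below are my assumptions, not taken from the issue:

```python
# Hedged sketch: assumes the checkpoint is a bitsandbytes LLM.int8() export
# loadable through transformers; the model id below is a placeholder.
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL-Chat-V1-5-Int8"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    load_in_8bit=True,       # bitsandbytes LLM.int8() weight loading
    trust_remote_code=True,
).eval()
```

If this is the bitsandbytes LLM.int8() path, slower-than-fp16 generation is a known trade-off of its mixed-precision matmul decomposition, which may be related to what the linked issue observes.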
…
-
Hello,
I am using the model `ssd_mobilenet_v2_fpnlite_035_416_int8.tflite` from [object_detection/pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_…
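For context, a minimal sketch of running such an int8 TFLite model; the 416x416 input size is inferred from the file name, and the input frame is a placeholder:

```python
# Minimal TFLite inference sketch for an int8 SSD model. Quantization
# parameters are read from the model rather than assumed.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="ssd_mobilenet_v2_fpnlite_035_416_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
scale, zero_point = inp["quantization"]

# Quantize a float image in [0, 1] into the model's int8 input domain.
image = np.random.rand(1, 416, 416, 3).astype(np.float32)  # placeholder frame
q_image = np.clip(image / scale + zero_point, -128, 127).astype(np.int8)

interpreter.set_tensor(inp["index"], q_image)
interpreter.invoke()
outputs = [interpreter.get_tensor(o["index"])
           for o in interpreter.get_output_details()]
```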
-
**Describe the bug**
Trying to use DeepSpeed Inference with int8 does not work for GPTJ. I created an issue with more details on the DeepSpeed MII repo, but due to the nature of the issue, I…
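For reproduction context, a hedged sketch of how DeepSpeed Inference is typically initialized with int8 kernel injection; the model id and `mp_size` here are placeholders, not taken from the report:

```python
# Hedged sketch of the usual DeepSpeed Inference int8 setup.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree (placeholder)
    dtype=torch.int8,                # request int8 inference kernels
    replace_with_kernel_inject=True, # inject fused DeepSpeed kernels
)
```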
-
Hello, recently I encountered an issue when deploying the model from [YOLO-World](https://github.com/AILab-CVC/YOLO-World) to a device using TFLM. I found that with the same INT8 per-channel quantized…
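When the TF Lite interpreter and TFLM disagree, a useful first step is to inspect the per-channel quantization parameters the model actually carries; a small sketch (the model path is a placeholder):

```python
# Sketch for listing per-channel quantization parameters of a TFLite model.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="yolo_world_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    params = detail["quantization_parameters"]
    if len(params["scales"]) > 1:  # per-channel: one scale per channel
        print(detail["name"],
              "axis:", params["quantized_dimension"],
              "num_scales:", len(params["scales"]))
```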
-
### Describe the issue
1. Tried running https://github.com/intel/intel-extension-for-pytorch/blob/release/2.3/examples/cpu/inference/python/llm/run.py to generate the q_config_summary file
2. Then…
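For anyone trying to reproduce the calibration step outside of run.py, a minimal IPEX static int8 sketch; API names follow IPEX 2.x, and the toy model, example input, and summary file name are placeholders:

```python
# Hedged IPEX static int8 calibration sketch (IPEX 2.x API).
import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

model = torch.nn.Linear(64, 64).eval()       # placeholder model
example_inputs = torch.randn(1, 64)

qconfig = ipex.quantization.default_static_qconfig_mapping
prepared = prepare(model, qconfig, example_inputs=example_inputs)
prepared(example_inputs)                      # calibration pass
prepared.save_qconf_summary(qconf_summary="q_config_summary.json")
quantized = convert(prepared)                 # int8 model for inference
```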
-
- quantize (OK)
```bash
python3 -m modelopt.onnx.quantization --onnx_path encoder.onnx \
--quantize_mode int8 --output_path encoder-w8a8-int8.onnx
/root/anaconda3/envs/modelopt/lib/pyt…
```
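To confirm the quantized model itself is loadable, a quick onnxruntime smoke test; it assumes float32 model inputs, which Q/DQ-style int8 export normally keeps:

```python
# Smoke test: load the quantized encoder and run it on random inputs,
# reading input names/shapes from the model rather than assuming them.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("encoder-w8a8-int8.onnx",
                            providers=["CPUExecutionProvider"])
feeds = {}
for inp in sess.get_inputs():
    # Replace symbolic dims (e.g. batch) with 1 for the smoke test.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
outputs = sess.run(None, feeds)
print([o.shape for o in outputs])
```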
-
Are there any runnable demos of using Sparse-QAT/PTQ (2:4) to accelerate inference, for example applying PTQ to a 2:4-sparse LLaMA? I am curious about the potential speedup rati…
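Not a full Sparse-PTQ demo, but as a starting point, PyTorch ships 2:4 semi-structured sparsity kernels; the sketch below prunes a weight to the 2:4 pattern and runs the accelerated matmul. It requires a recent PyTorch with CUDA on Ampere or newer, and the shapes are placeholders:

```python
# Sketch of PyTorch 2:4 semi-structured sparsity (not Sparse-QAT/PTQ itself).
import torch
from torch.sparse import to_sparse_semi_structured

w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

# Enforce the 2:4 pattern: keep the 2 largest of every 4 consecutive weights.
groups = w.abs().view(-1, 4)
idx = groups.topk(2, dim=1).indices
mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(1, idx, True)
w_24 = (w.view(-1, 4) * mask).view_as(w)

w_sparse = to_sparse_semi_structured(w_24)  # compressed 2:4 representation
x = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

dense_out = torch.mm(w_24, x)
sparse_out = torch.mm(w_sparse, x)          # hits the sparse tensor cores
print(torch.allclose(dense_out, sparse_out, rtol=1e-2, atol=1e-2))
```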
-
### Before Asking
- [X] I have read the [README](https://github.com/meituan/YOLOv6/blob/main/README.md) carefully.
- [X] I want to train my custom dataset, and I have read the …
-
### 🚀 The feature, motivation and pitch
**Feature motivation:**
[Default PyTorch quantization-aware training](https://pytorch.org/docs/stable/quantization.html) uses a "fake-quantization" approach. Fo…
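For concreteness, the fake-quantization flow the request refers to looks roughly like this in eager mode, using standard `torch.ao.quantization` calls on a toy model:

```python
# Eager-mode fake-quantization QAT sketch: observers simulate int8 in the
# forward pass while gradients flow in float.
import torch
import torch.ao.quantization as tq

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = torch.nn.Linear(16, 16)
        self.dequant = tq.DeQuantStub()
    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)   # inserts FakeQuantize modules

# ... training loop: one step shown ...
model(torch.randn(2, 16)).sum().backward()

model.eval()
quantized = tq.convert(model)         # real int8 modules for inference
```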
-
### What happened?
Inconsistency found when lowering the Inception_v4_vaiq_int8 model: https://github.com/nod-ai/SHARK-TestSuite/issues/190
1. **Passed**: standalone torch-mlir-opt + iree: onnx -> to…
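A hedged reconstruction of what the "standalone" path typically looks like; tool names and flags reflect common torch-mlir/IREE usage and may differ by version, so treat them as assumptions rather than the exact commands used in the test suite:

```python
# Hedged sketch of the standalone onnx -> torch-mlir -> IREE CPU pipeline.
import subprocess

# ONNX -> torch-onnx dialect MLIR (iree-import-onnx ships with iree-compiler).
subprocess.run(["iree-import-onnx", "Inception_v4_vaiq_int8.onnx",
                "-o", "model.torch_onnx.mlir"], check=True)

# Lower the torch-onnx ops to the torch dialect with torch-mlir-opt.
subprocess.run(["torch-mlir-opt", "--convert-torch-onnx-to-torch",
                "model.torch_onnx.mlir", "-o", "model.torch.mlir"], check=True)

# Compile for CPU with IREE.
subprocess.run(["iree-compile", "model.torch.mlir",
                "--iree-hal-target-backends=llvm-cpu",
                "-o", "model.vmfb"], check=True)
```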