-
### Idea
Use int4 as the compression technique to fit larger models onto Navi machines, or possibly MI-series machines. Weights would be compressed using an encoding scheme that would pack two 4-bit n…
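A minimal sketch of such a packing scheme with numpy (the layout and helper names here are assumptions for illustration, not part of the proposal):

```python
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack pairs of signed 4-bit values (-8..7) into one uint8 per pair."""
    assert values.size % 2 == 0
    nibbles = (values.astype(np.int8) & 0x0F).astype(np.uint8)  # two's-complement nibbles
    return nibbles[0::2] | (nibbles[1::2] << 4)  # low nibble stored first

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover the signed 4-bit values."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return np.where(out > 7, out - 16, out)  # sign-extend each nibble
```

Packing halves weight storage; in a real scheme each group of packed weights would also carry a scale (and possibly a zero-point) for dequantization.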
-
Hello, I'm trying to train YOLOv8-large in int4 format. I took the training recipe available at [sparsezoo](https://sparsezoo.neuralmagic.com/models/yolov8-l-coco-pruned85_quantized?hardware=deepspars…
-
Allocating a 1D tensor of type int4 via [`BufferFromHostBuffer`](https://github.com/openxla/xla/blob/41ad400d325243342b11e1d55232b34bbd590b8c/xla/pjrt/cpu/cpu_client.cc#L944) with `byte_strides = std::n…
-
I have tried TensorRT int8 quantization on SD models.
Now I want to improve performance further.
My code is: `mtq.quantize(sdmodel, mtq.INT4_AWQ_CFG, calibration_loop)`
When I need to export the quantized model t…
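For reference, a minimal sketch of that call in context, assuming NVIDIA's TensorRT Model Optimizer (`modelopt`) and a hypothetical `calib_loader`:

```python
import modelopt.torch.quantization as mtq

# modelopt invokes this with the model so it can record activation statistics
# on representative data during calibration.
def calibration_loop(model):
    for batch in calib_loader:  # calib_loader is assumed to yield model inputs
        model(batch)

# Quantize weights to int4 with AWQ; the model is modified in place and returned.
sdmodel = mtq.quantize(sdmodel, mtq.INT4_AWQ_CFG, calibration_loop)
```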
-
### The model to consider.
https://huggingface.co/openbmb/MiniCPM-V-2_6-int4
### The closest model vllm already supports.
_No response_
### What's your difficulty of supporting the model you want?…
-
I have seen many changes related to int4 in OpenXLA, but there is no Python front-end API to enable int4 inference. Does Keras have any plan for int4?
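As an aside, int4 is already exposed at the array level through the `ml_dtypes` package used across the OpenXLA ecosystem; a small sketch (this is dtype plumbing, not a Keras API):

```python
import numpy as np
import ml_dtypes

# int4 is registered as a numpy extension dtype (a 4-bit signed integer).
x = np.array([1, -2, 7], dtype=ml_dtypes.int4)
print(x.dtype)  # int4
```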
-
Dear all,
I failed to run Llama-2-7b-chat-hf on the NPU; please give me a hand.
1. I converted the model with the command below and got two models:
a) optimum-cli export openvino --task text-generation -m Meta-…
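For comparison, a minimal sketch of an int4 weight-compressed export through the Python API, assuming a recent optimum-intel (the model ID, output path, and config values are illustrative):

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Export the HF checkpoint to OpenVINO IR with int4 weight compression.
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4, sym=True),
)
model.save_pretrained("llama-2-7b-chat-ov-int4")
```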
-
Qwen has released some quantized models:
Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
since t…
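For context, a sketch of how these GPTQ checkpoints load with plain transformers (assuming a transformers version with Qwen2-VL support and a GPTQ runtime such as auto-gptq installed):

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# The quantization config embedded in the repo selects the GPTQ int4 kernels.
model_id = "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)
```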
-
model_path = "/home/test/models/LLM/baichuan2-7b/pytorch"
# Load and optimize the INT4 model with IPEX
low_bit = "sym_int4"
model_int4 = BigdlForCausalLM.from_pretrained(model_path, load_in_low_b…
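A runnable variant of that load, assuming the bigdl-llm transformers-style API (the path is illustrative):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "/home/test/models/LLM/baichuan2-7b/pytorch"

# Load the checkpoint and quantize its weights to symmetric int4 on the fly.
model_int4 = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,  # Baichuan2 ships custom modeling code
)
```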
-
Hi TensorRT-LLM team, your work is incredible.
By following the README file for [multi-modeling](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md), we were able to successfully run…