-
### Idea
Use int4 as the compression technique to fit larger models onto Navi machines, or possibly MI-series machines. Weights would be compressed using an encoding scheme that would pack two 4-bit n…
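A minimal sketch of such a packing scheme with numpy (the layout and helper names here are assumptions for illustration, not part of the proposal):

```python
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack pairs of signed 4-bit values (-8..7) into one uint8 per pair."""
    assert values.size % 2 == 0
    nibbles = (values.astype(np.int8) & 0x0F).astype(np.uint8)  # two's-complement nibbles
    return nibbles[0::2] | (nibbles[1::2] << 4)  # low nibble stored first

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover the signed 4-bit values."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return np.where(out > 7, out - 16, out)  # sign-extend each nibble
```

Packing halves weight storage; in a real scheme each group of packed weights would also carry a scale (and possibly a zero-point) for dequantization.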
-
Hello, I'm trying to train YOLOv8-large in int4 format. I took the training recipe available at [sparsezoo](https://sparsezoo.neuralmagic.com/models/yolov8-l-coco-pruned85_quantized?hardware=deepspars…
-
Allocating a 1D tensor of type int4 via [`BufferFromHostBuffer`](https://github.com/openxla/xla/blob/41ad400d325243342b11e1d55232b34bbd590b8c/xla/pjrt/cpu/cpu_client.cc#L944) with `byte_strides = std::n…
-
I have tried TensorRT int8 quantization on SD models.
Now I want to improve performance further.
My code is: `mtq.quantize(sdmodel, mtq.INT4_AWQ_CFG, calibration_loop)`
When I need to export the quantized model t…
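For reference, a minimal sketch of that call in context, assuming NVIDIA's TensorRT Model Optimizer (`modelopt`) and a hypothetical `calib_loader`:

```python
import modelopt.torch.quantization as mtq

# modelopt invokes this with the model so it can record activation statistics
# on representative data during calibration.
def calibration_loop(model):
    for batch in calib_loader:  # calib_loader is assumed to yield model inputs
        model(batch)

# Quantize weights to int4 with AWQ; the model is modified in place and returned.
sdmodel = mtq.quantize(sdmodel, mtq.INT4_AWQ_CFG, calibration_loop)
```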
-
### The model to consider.
https://huggingface.co/openbmb/MiniCPM-V-2_6-int4
### The closest model vllm already supports.
_No response_
### What's your difficulty of supporting the model you want?…
-
I have seen many changes related to int4 in OpenXLA, but there is no Python front-end API to enable int4 inference. Does Keras have any plan for int4?
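As an aside, int4 is already exposed at the array level through the `ml_dtypes` package used across the OpenXLA ecosystem; a small sketch (this is dtype plumbing, not a Keras API):

```python
import numpy as np
import ml_dtypes

# int4 is registered as a numpy extension dtype (a 4-bit signed integer).
x = np.array([1, -2, 7], dtype=ml_dtypes.int4)
print(x.dtype)  # int4
```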
-
Dear all,
I failed to run Llama-2-7b-chat-hf on the NPU; please give me a hand.
1. I converted the model with the command below and got two models:
a) optimum-cli export openvino --task text-generation -m Meta-…
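For comparison, a minimal sketch of an int4 weight-compressed export through the Python API, assuming a recent optimum-intel (the model ID, output path, and config values are illustrative):

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Export the HF checkpoint to OpenVINO IR with int4 weight compression.
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4, sym=True),
)
model.save_pretrained("llama-2-7b-chat-ov-int4")
```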
-
Qwen has released some quantized models:
Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
since t…
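For context, a sketch of how these GPTQ checkpoints load with plain transformers (assuming a transformers version with Qwen2-VL support and a GPTQ runtime such as auto-gptq installed):

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# The quantization config embedded in the repo selects the GPTQ int4 kernels.
model_id = "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)
```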
-
model_path = "/home/test/models/LLM/baichuan2-7b/pytorch"
# Load and optimize the INT4 model with IPEX
low_bit = "sym_int4"
model_int4 = BigdlForCausalLM.from_pretrained(model_path, load_in_low_b…
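A runnable variant of that load, assuming the bigdl-llm transformers-style API (the path is illustrative):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "/home/test/models/LLM/baichuan2-7b/pytorch"

# Load the checkpoint and quantize its weights to symmetric int4 on the fly.
model_int4 = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,  # Baichuan2 ships custom modeling code
)
```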
-
Hi TensorRT-LLM team, your work is incredible.
By following the README file for [multi-modeling](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md), we were able to successfully run…