ARES3366 opened 7 months ago
```python
import onnx
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the optimized ONNX model for quantization
dynamic_quantizer = ORTQuantizer.from_pretrained(output_model_path, file_name='model_optimized.onnx')

# Note: this dict is defined but never passed to anything below
extra_options = {'DefaultTensorType': onnx.TensorProto.FLOAT}

dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
dynamic_quantizer.quantize(save_dir=output_model_path, quantization_config=dqconfig)

tokenizer.save_pretrained(output_model_path)
```

How should I change it?
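One thing worth trying before touching the quantization config: the error comes from shape inference failing on the optimized graph, and onnxruntime ships a preprocessing step that re-runs symbolic and ONNX shape inference so every tensor gets a known type. Below is a minimal sketch, not a confirmed fix for this issue; the directory and file names are assumptions carried over from the snippet above, and `quant_pre_process` is onnxruntime's quantization preprocessing helper (also exposed as `python -m onnxruntime.quantization.preprocess`):

```python
from pathlib import Path

from onnxruntime.quantization.shape_inference import quant_pre_process

# Hypothetical paths, matching the snippet above
output_model_path = "path/to/model_dir"
model_path = Path(output_model_path) / "model_optimized.onnx"
preprocessed_path = Path(output_model_path) / "model_preprocessed.onnx"

# Runs symbolic shape inference, ONNX shape inference, and ONNX Runtime
# optimization on the model before it is handed to the quantizer.
quant_pre_process(str(model_path), str(preprocessed_path))
```

After this step, pointing `ORTQuantizer.from_pretrained` at the preprocessed file may let shape inference resolve the `MatMul` output type that the error complains about.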
Hi @ARES3366, have you solved this? I think the problem lies in the quantization config 🤔
```
RuntimeError: Unable to find data type for weight_name='/encoder/layer.0/attention/output/dense/MatMul_output_0'. shape_inference failed to return a type probably this node is from a different domain or using an input produced by such an operator. This may happen if you quantize a model already quantized. You may use extra_options `DefaultTensorType` to indicate the default weight type, usually `onnx.TensorProto.FLOAT`.
```
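If you need the `DefaultTensorType` escape hatch the error mentions, note that `ORTQuantizer.quantize` does not appear to forward `extra_options` to onnxruntime, which is why the dict in the original snippet has no effect. One option is to drop down to onnxruntime's `quantize_dynamic` for this step, which does accept `extra_options`. A minimal sketch under that assumption, with file names carried over from the snippet in the question:

```python
import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model_optimized.onnx",   # assumed input path
    model_output="model_quantized.onnx",  # assumed output path
    weight_type=QuantType.QInt8,
    # Fallback type for tensors whose type shape inference cannot
    # recover, as suggested by the error message.
    extra_options={"DefaultTensorType": onnx.TensorProto.FLOAT},
)
```

The resulting file is a plain quantized ONNX model, so it can be loaded like any other; whether the CPU-specific choices baked into `AutoQuantizationConfig.avx512_vnni` still matter then depends on your target hardware.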