UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

RuntimeError: Unable to find data type for weight_name='/encoder/layer.0/attention/output/dense/MatMul_output_0'. shape_inference failed to return a type probably this node is from a different domain or using an input produced by such an operator. This may happen if you quantize a model already quantized. You may use extra_options `DefaultTensorType` to indicate the default weight type, usually `onnx.TensorProto.FLOAT`. #2598

Open · ARES3366 opened this issue 7 months ago

ARES3366 commented 7 months ago

RuntimeError: Unable to find data type for weight_name='/encoder/layer.0/attention/output/dense/MatMul_output_0'. shape_inference failed to return a type probably this node is from a different domain or using an input produced by such an operator. This may happen if you quantize a model already quantized. You may use extra_options DefaultTensorType to indicate the default weight type, usually onnx.TensorProto.FLOAT.

ARES3366 commented 7 months ago

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
import onnx

dynamic_quantizer = ORTQuantizer.from_pretrained(output_model_path, file_name='model_optimized.onnx')

extra_options = {'DefaultTensorType': onnx.TensorProto.FLOAT}
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

dynamic_quantizer.quantize(save_dir=output_model_path, quantization_config=dqconfig)
tokenizer.save_pretrained(output_model_path)

How should I change it?
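Note on the snippet above: the extra_options dict is defined but never handed to anything, and neither AutoQuantizationConfig nor ORTQuantizer.quantize obviously exposes an extra_options parameter. One possible workaround, shown as a minimal sketch below and assuming the exported model really is model_optimized.onnx inside output_model_path, is to call ONNX Runtime's quantize_dynamic directly, since it accepts extra_options and therefore DefaultTensorType:

# Sketch of a workaround: bypass optimum and run ONNX Runtime's dynamic
# quantizer directly, passing the DefaultTensorType option the error
# message suggests. Paths and file names are assumptions from the snippet above.
from pathlib import Path

import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic

model_dir = Path(output_model_path)
model_input = model_dir / "model_optimized.onnx"
model_output = model_dir / "model_quantized.onnx"

quantize_dynamic(
    model_input=str(model_input),
    model_output=str(model_output),
    weight_type=QuantType.QInt8,
    per_channel=False,
    # Fall back to FLOAT when shape inference cannot determine a tensor's type
    extra_options={"DefaultTensorType": onnx.TensorProto.FLOAT},
)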
cindyangelira commented 1 month ago

Hi @ARES3366, have you solved this? I think the problem lies in the quantization config 🤔
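The error text also says this can happen when quantizing a model that is already quantized. A quick check, assuming the same file name as above, is to load the graph with onnx and look for operator types that only appear after quantization before running the quantizer again:

# Sketch: inspect the exported graph for quantization-only operators to rule
# out double quantization. The file name is an assumption from the thread.
import onnx

model = onnx.load("model_optimized.onnx")
quantized_ops = {"DynamicQuantizeLinear", "QuantizeLinear", "DequantizeLinear",
                 "MatMulInteger", "ConvInteger", "QLinearMatMul"}
found = sorted({node.op_type for node in model.graph.node} & quantized_ops)
print("quantized ops found:" if found else "no quantized ops found", found)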