flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Question]: Load quantized model from onnx format. #3226

Open SylvainVerdy opened 1 year ago

SylvainVerdy commented 1 year ago

Question

Hi,

I have several questions concerning ONNX models and quantization. I tried to export my models to ONNX and succeeded in saving them in the ONNX format.

from flair.models import SequenceTagger

model = SequenceTagger.load("./exps/camembert-large/models/NER/Flair/taggers/sota-ner-flair/best-model.pt").cpu()
model.embeddings = model.embeddings.export_onnx("flert-embeddings.onnx", sentences, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
quantize = True
onnx_optimize = True
if quantize:
    model.embeddings.quantize_model(
        "flert-quantized-embeddings.onnx", extra_options={"DisableShapeInference": True}
    )

if onnx_optimize:
    model.embeddings.optimize_model(
        "flert-optimized-embeddings.onnx", opt_level=2, use_gpu=False, only_onnxruntime=True
    )

First question: is it normal to see Ignore MatMul due to non constant B : /[/model/encoder/layer1../attention/self/MatMul..] when I try to quantize my model? Second, I am now trying to load my model with SequenceTagger.load(). Do I need to save my model as a .pt file at the end of the code above, so that loading it into SequenceTagger uses the TransformerOnnxWordEmbeddings class?

Do you have any example of loading ONNX files at inference time to evaluate a corpus or several sentences?

Thanks a lot for your work!

helpmefindaname commented 1 year ago

Hi @SylvainVerdy

Is it normal to see Ignore MatMul due to non constant B : /[/model/encoder/layer1../attention/self/MatMul..] when I try to quantize my model?

Yes, that warning is normal. I cannot tell you exactly why it happens or what it means, but it appears for every Hugging Face model I have tested so far.

Do I need to save my model as a .pt file at the end of the code above, so that loading it into SequenceTagger uses the TransformerOnnxWordEmbeddings class?

Yes, as stated in the tutorial, you need to save the model again after swapping in the ONNX embeddings, and keep using that new model from then on.

Do you have any example of loading ONNX files at inference time to evaluate a corpus or several sentences?

There are no examples specific to models that contain ONNX embeddings, because you use the model exactly the same way as before.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.