Hello everyone,

I want to use partial quantization in my model, as I have a special form of preprocessing included in the model. I found the documentation in UG1414, but some information is missing.
I have the following model:
def forward(self, x):
    x = self.preprocessing_layer(x)  # float preprocessing, outside the quantized region
    x = self.quant(x)                # QuantStub instance created in __init__
    x = self.layer1(x)
    x = self.layer2(x)
    ...
    x = self.dequant(x)              # DeQuantStub instance created in __init__
    return x
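For context, here is a minimal, runnable sketch of the pattern above using PyTorch's eager-mode quantization stubs; the concrete layers (a Conv2d preprocessing layer, Conv2d/ReLU body) are placeholders I chose for illustration, not my actual model:

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class PartiallyQuantizedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Preprocessing stays in float, outside the quantized region
        self.preprocessing_layer = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.quant = QuantStub()      # marks the start of the quantized region
        self.layer1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.layer2 = nn.ReLU()
        self.dequant = DeQuantStub()  # marks the end of the quantized region

    def forward(self, x):
        x = self.preprocessing_layer(x)  # runs in float
        x = self.quant(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.dequant(x)
        return x

model = PartiallyQuantizedModel().eval()
out = model(torch.randn(1, 3, 8, 8))
```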
After compilation, I can observe that the resulting .xmodel starts at the QuantStub(); the documentation stops at this step. How can I now include preprocessing_layer(x) in the deployment? Is there a way to run it on the hardware directly using the graph runner? Or do I have to register preprocessing_layer() as a custom operator and include it in the quantization (although I would really prefer to use QAT)?
Thank you in advance!