Closed CharlieL7 closed 1 month ago
The problem is that we would now output fp16 instead of int8. We should try to re-enable this matcher. There is, of course, some accuracy loss from quantization, but we would have the same issue if we quantized the bias. Perhaps a better choice of scales would improve accuracy in these cases.
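To illustrate why the choice of scales matters, here is a minimal Python sketch of symmetric int8 quantization (the values and helper names are my own for illustration, not taken from MIGraphX). A scale derived from the max absolute value covers outliers but wastes resolution on small values; a smaller scale resolves small values but clips outliers:

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric int8 quantization: scale, round to nearest, clamp to [-128, 127].
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Map int8 values back to float32.
    return q.astype(np.float32) * scale

# Illustrative tensor with one outlier (made-up values).
x = np.array([0.01, 0.02, -0.03, 5.0], dtype=np.float32)

# Scale from the max absolute value: no clipping, but the small
# values fall below one quantization step and round to zero.
scale_max = float(np.abs(x).max()) / 127.0
err_max = np.abs(dequantize(quantize_int8(x, scale_max), scale_max) - x).mean()

# A smaller scale represents the small values exactly, but the
# outlier saturates at 127; which trade-off wins depends on the data.
scale_small = 0.001
err_small = np.abs(dequantize(quantize_int8(x, scale_small), scale_small) - x).mean()
```

The round-trip error of each scale choice (`err_max` vs. `err_small`) is what a calibration step would compare when picking scales.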
Closing: with MLIR's update, we pass accuracy verification when testing the program mentioned in #2949 with a couple of different random seeds I tried.
The purpose of the `qlinear_reused` matcher in `simplify_qdq` was to merge more operations by making it such that an intermediate result is not used multiple times. With `quant_conv`, we should be able to get around the issue entirely.
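As a rough illustration of the kind of pattern matching involved, here is a toy sketch of eliminating an adjacent dequantize/quantize pair with matching scales (this is not MIGraphX's actual `simplify_qdq` implementation, and it only handles the simple single-use case, not the reused-intermediate case that the `qlinear_reused` matcher targets):

```python
def simplify_qdq(nodes):
    """Remove adjacent dequantizelinear -> quantizelinear pairs with
    matching scales; on int8 values such a pair is an identity, so the
    computation can stay in int8 throughout."""
    out = []
    i = 0
    while i < len(nodes):
        op, attrs = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (op == "dequantizelinear"
                and nxt is not None
                and nxt[0] == "quantizelinear"
                and attrs.get("scale") == nxt[1].get("scale")):
            # Drop both nodes of the pair.
            i += 2
            continue
        out.append(nodes[i])
        i += 1
    return out

# Toy graph: (op_name, attrs) nodes in execution order (illustrative names).
graph = [
    ("quant_convolution", {}),
    ("dequantizelinear", {"scale": 0.05}),
    ("quantizelinear", {"scale": 0.05}),
    ("quant_convolution", {}),
]
simplified = simplify_qdq(graph)
```

If the dequantized intermediate were consumed by more than one operation, dropping the pair this way would change what the other consumers see, which is exactly the complication the reused-intermediate case introduces.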