Closed CharlieL7 closed 1 month ago
The problem is that we would now output fp16 instead of int8. We should try to re-enable this matcher. There is, of course, some accuracy loss from quantization, but we would have the same issue if we quantized the bias. Perhaps a better choice of scales would improve accuracy in these cases.
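To illustrate why the choice of scales matters, here is a minimal Python sketch of symmetric int8 quantization (the values and helper names are my own for illustration, not taken from MIGraphX). A scale derived from the max absolute value covers outliers but wastes resolution on small values; a smaller scale resolves small values but clips outliers:

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric int8 quantization: scale, round to nearest, clamp to [-128, 127].
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Map int8 values back to float32.
    return q.astype(np.float32) * scale

# Illustrative tensor with one outlier (made-up values).
x = np.array([0.01, 0.02, -0.03, 5.0], dtype=np.float32)

# Scale from the max absolute value: no clipping, but the small
# values fall below one quantization step and round to zero.
scale_max = float(np.abs(x).max()) / 127.0
err_max = np.abs(dequantize(quantize_int8(x, scale_max), scale_max) - x).mean()

# A smaller scale represents the small values exactly, but the
# outlier saturates at 127; which trade-off wins depends on the data.
scale_small = 0.001
err_small = np.abs(dequantize(quantize_int8(x, scale_small), scale_small) - x).mean()
```

The round-trip error of each scale choice (`err_max` vs. `err_small`) is what a calibration step would compare when picking scales.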
Closing: with MLIR's update, we pass accuracy verification when testing the program mentioned in #2949 with a couple of different random seeds I tried.
The purpose of the `qlinear_reused` matcher in `simplify_qdq` was to merge more operations by making it such that an intermediate result is not used multiple times. With `quant_conv`, we should be able to get around the issue entirely.
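As a rough illustration of the kind of pattern matching involved, here is a toy sketch of eliminating an adjacent dequantize/quantize pair with matching scales (this is not MIGraphX's actual `simplify_qdq` implementation, and it only handles the simple single-use case, not the reused-intermediate case that the `qlinear_reused` matcher targets):

```python
def simplify_qdq(nodes):
    """Remove adjacent dequantizelinear -> quantizelinear pairs with
    matching scales; on int8 values such a pair is an identity, so the
    computation can stay in int8 throughout."""
    out = []
    i = 0
    while i < len(nodes):
        op, attrs = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (op == "dequantizelinear"
                and nxt is not None
                and nxt[0] == "quantizelinear"
                and attrs.get("scale") == nxt[1].get("scale")):
            # Drop both nodes of the pair.
            i += 2
            continue
        out.append(nodes[i])
        i += 1
    return out

# Toy graph: (op_name, attrs) nodes in execution order (illustrative names).
graph = [
    ("quant_convolution", {}),
    ("dequantizelinear", {"scale": 0.05}),
    ("quantizelinear", {"scale": 0.05}),
    ("quant_convolution", {}),
]
simplified = simplify_qdq(graph)
```

If the dequantized intermediate were consumed by more than one operation, dropping the pair this way would change what the other consumers see, which is exactly the complication the reused-intermediate case introduces.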