ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

Accuracy issues with ONNX Zoo QDQ models #2704

Open gyulaz-htec opened 7 months ago

gyulaz-htec commented 7 months ago

12 quantized models from ONNX Zoo fail with the driver verify command. Some models show extreme differences between the ref and gpu results.

rocm version: 5.7.0.50700-63~20.04
migraphx version: 352dcea2c6a03c495a6ba8667e19811bc5d1399b

RMS Error: 0.299546 Max diff: 52 Mismatch at 0: 1 != 18

RMS Error: 0.982469 Max diff: 0.984108 Mismatch at 0: 0.0948163 != 1

* [mobilenetv2-12-qdq.onnx](https://github.com/onnx/models/blob/main/validated/vision/classification/mobilenet/model/mobilenetv2-12-qdq.onnx):
```bash
RMS Error: 0.310712
Max diff: 1.75075e+06
Mismatch at 0: -0.43521 != -160776
```

RMS Error: 0.360478 Max diff: 287.336 Mismatch at 0: 6.21395 != -281.122



The remaining 6 show smaller differences (<1): bvlcalexnet-12-qdq.onnx, efficientnet-lite4-11-qdq, googlenet-12-qdq.onnx, inception-v1-12-qdq.onnx, squeezenet1.0-13-qdq.onnx, zfnet512-12-qdq.onnx

attila-dusnoki-htec commented 6 months ago

I looked into the ssd-12-qdq model. According to the verify reduce output, it starts to fail right at the beginning: the qdq part differs. The difference is small, but it accumulates through the following layers.

@9 = gpu::code_object[code_object=5568,symbol_name=add_quantizelinear_dequantizelinear_kernel,global=2880000,local=1024,](@6,@8,output) -> float_type, {1, 64, 600, 600}, {23040000, 360000, 600, 1}, target_id=0

FAILED:
RMS Error: 0.00216051
Max diff: 0.0154344
Mismatch at 2: 0.293252 != 0.308686
@13 = gpu::convolution[op={padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0},solution_object={binary_object: 452},algo=0,solution_id=0](@9,@10,@11,output) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0

FAILED:
RMS Error: 0.0134018
Max diff: 0.192386
Mismatch at 0: -0.572287 != -0.568523

Here is the full log (it is trimmed, but the file is still almost 20 MB): ssd_trim_651_reduce.log

Running with trace eval results in the following ref/gpu "pairs":


Run instruction: @7 = gpu::code_object[code_object=7360,symbol_name=mlir_convolution_add,global=720128,local=256,](@5,image,@2,@6) -> float_type, {1, 64, 600, 600}, {23040000, 360000, 600, 1}, target_id=0
Output: 0.314537, 0.282968, 0.301353, 0.250092, 0.258948, ..., 0, 0, 0, 0, 0
Min value: -0.691589, Max value: 1.26428, Mean: 0.179939, StdDev: 0.198492

Run instruction: @973 = ref::convolution[padding={3, 3, 3, 3},stride={2, 2},dilation={1, 1},group=1,padding_mode=0](@972,@462) -> float_type, {1, 64, 600, 600}, {23040000, 360000, 600, 1}, target_id=0
Run instruction: @976 = ref::add(@973,@975) -> float_type, {1, 64, 600, 600}, {23040000, 360000, 600, 1}, target_id=0
Output: 0.314116, 0.282709, 0.300906, 0.249759, 0.259343, ..., 0, 0, 0, 0, 0
Min value: -0.689215, Max value: 1.26388, Mean: 0.179938, StdDev: 0.19847

---

Run instruction: @987 = ref::pooling[mode=max,padding={1, 1, 1, 1},padding_mode=0,stride={2, 2},lengths={3, 3},dilations={1, 1},ceil_mode=0,lp_order=2,dyn_global=0](@986) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: 0.416726, 0.416726, 0.370423, 0.447595, 0.447595, ..., 0, 0, 0, 0, 0
Min value: 0, Max value: 1.26561, Mean: 0.284767, StdDev: 0.216263

Run instruction: @9 = gpu::pooling[mode=max,padding={1, 1, 1, 1},padding_mode=0,stride={2, 2},lengths={3, 3},dilations={1, 1},ceil_mode=0,lp_order=2,dyn_global=0](@7,@8) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: 0.417394, 0.417394, 0.369186, 0.441414, 0.441414, ..., 0, 0, 0, 0, 0
Min value: -0.217672, Max value: 1.26428, Mean: 0.277293, StdDev: 0.227439

---

Run instruction: @998 = ref::convolution[padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0](@997,@472) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: -0.572287, -0.735194, -0.849823, -0.81111, -0.727945, ..., -0.0421866, -0.0245669, -0.000553765, -0.0831649, 0.00397701
Min value: -1.89955, Max value: 1.37605, Mean: -0.358288, StdDev: 0.388836

Run instruction: @13 = gpu::convolution[op={padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0},solution_object={binary_object: 452},algo=0,solution_id=0](@9,@10,@11,@12) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: -0.568523, -0.734826, -0.850189, -0.818894, -0.726753, ..., -0.0447632, -0.0208397, 0.0055382, -0.086567, 0.00517985
Min value: -1.91674, Max value: 1.40878, Mean: -0.365814, StdDev: 0.392288

---

Run instruction: @1012 = ref::convolution[padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0](@1011,@482) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: -0.145314, -0.205175, -0.253356, -0.173176, -0.201857, ..., 0.0681343, 0.0362716, 0.0488621, 0.0138178, 0.190221
Min value: -1.51545, Max value: 1.97553, Mean: -0.00329987, StdDev: 0.139806

Run instruction: @1011 = ref::dequantizelinear(@1006,@1008,@1010) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: 0, 0, 0, 0, 0, ..., 0.0651742, 0.078209, 0.104279, 0.0260697, 0.117314
Min value: 0, Max value: 1.53811, Mean: 0.0571383, StdDev: 0.0868144

Run instruction: @482 = ref::dequantizelinear(@418,@479,@481) -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0
Output: 0.0244093, -0.0104611, -0.00348705, 0.0174352, -0.0278964, ..., 0.0139482, 0.0104611, 0.0139482, 0.0209223, 0.0104611
Min value: -0.397524, Max value: 0.442855, Mean: -0.000430017, StdDev: 0.0437298

Run instruction: @17 = gpu::convolution[op={padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0},solution_object={binary_object: 452},algo=0,solution_id=0](@13,@14,@15,@16) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: -0.40654, -0.593933, -0.633349, -0.514487, -0.596129, ..., -0.855649, -0.911693, -0.854034, -0.930775, -0.525221
Min value: -7.1479, Max value: 2.92221, Mean: -0.363388, StdDev: 1.01762

Run instruction: @14 = hip::hip_copy_literal[id=main:@literal:41] -> float_type, {64, 64, 3, 3}, {576, 9, 3, 1}, target_id=0
Output: 0.0244093, -0.0104611, -0.00348705, 0.0174352, -0.0278964, ..., 0.0139482, 0.0104611, 0.0139482, 0.0209223, 0.0104611
Min value: -0.397524, Max value: 0.442855, Mean: -0.000430017, StdDev: 0.0437298

Run instruction: @15 = load[offset=0,end=0](@1) -> int8_type, {0}, {1}, target_id=0
Run instruction: @16 = load[offset=23040000,end=46080000](@1) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0

---

Run instruction: @1038 = ref::convolution[padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0](@1037,@492) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: 0.0619159, -0.242668, -0.108012, -0.0691066, -0.181812, ..., 0.0311093, 0.222155, 0.388753, 0.317224, 0.297242
Min value: -1.51618, Max value: 1.53798, Mean: -0.33978, StdDev: 0.334902

Run instruction: @17 = gpu::convolution[op={padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0},solution_object={binary_object: 452},algo=0,solution_id=0](@13,@14,@15,@16) -> float_type, {1, 64, 300, 300}, {5760000, 90000, 300, 1}, target_id=0
Output: -0.40654, -0.593933, -0.633349, -0.514487, -0.596129, ..., -0.855649, -0.911693, -0.854034, -0.930775, -0.525221
Min value: -7.1479, Max value: 2.92221, Mean: -0.363388, StdDev: 1.01762
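
The first failure reported above (Mismatch at 2: 0.293252 != 0.308686) looks like a single quantization step: 0.293252 is about 19 steps and 0.308686 about 20 steps of roughly 0.0154343. The third elements of the first trace pair (0.300906 from ref, 0.301353 from gpu) differ by less than 0.0005 but sit on opposite sides of the 19.5-step rounding boundary. A minimal sketch of that effect follows, assuming int8 quantization, a zero point of 0, and a scale inferred from the printed values rather than read from the model. `qdq_int8` and the other names are hypothetical helpers, not MIGraphX functions.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not a MIGraphX function): quantize one value to int8
// and immediately dequantize it again.
inline float qdq_int8(float x, float scale, int zero_point = 0)
{
    // std::nearbyint rounds to nearest (ties to even in the default mode).
    float q = std::nearbyint(x / scale) + static_cast<float>(zero_point);
    // Saturate to the int8 range.
    q = std::max(-128.0f, std::min(127.0f, q));
    return (q - static_cast<float>(zero_point)) * scale;
}

// Assumption: scale inferred from 0.293252 / 19 and 0.308686 / 20 in the log.
constexpr float assumed_scale = 0.0154343f;

// Third element of the ref and gpu outputs in the first trace pair.
inline float qdq_of_ref_value() { return qdq_int8(0.300906f, assumed_scale); } // lands on step 19
inline float qdq_of_gpu_value() { return qdq_int8(0.301353f, assumed_scale); } // lands on step 20
```

Under these assumptions the two results differ by exactly one step, so an input difference of about 0.00045 grows to about 0.0154, which matches the Max diff reported for the add_quantizelinear_dequantizelinear_kernel.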

nives-vukovic commented 6 months ago

The issue is inside the simplify_qdq::apply pass. In match::find_matches(m, match_find_quantizable_ops{}); a suitable operator match is found, but its apply function returns early because the input of one of the dequantizelinear operations has type uint8_t, which is currently unsupported:

        // Only INT8 or FP8 type currently supported
        std::set<migraphx::shape::type_t> supported_types = {migraphx::shape::fp8e4m3fnuz_type,
                                                             migraphx::shape::int8_type};
        if(not contains(supported_types, dq1->inputs().front()->get_shape().type()) or
           not contains(supported_types, dq2->inputs().front()->get_shape().type()))
            return;

So far this has been verified for ssd-12-qdq.onnx and mobilenetv2-12-qdq.onnx.
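
To make the early return concrete, here is a standalone sketch of the same guard. It uses a stand-in enum instead of migraphx::shape::type_t, and the function name `qdq_rewrite_allowed` is made up for illustration. It only mirrors the quoted condition and is not the actual pass code.

```cpp
#include <set>

// Stand-in for migraphx::shape::type_t; illustrative only.
enum class type_t
{
    int8_type,
    uint8_type,
    fp8e4m3fnuz_type,
    float_type
};

// Mirrors the guard quoted above: the rewrite proceeds only when the inputs of
// both dequantizelinear instructions have a supported type.
inline bool qdq_rewrite_allowed(type_t dq1_input, type_t dq2_input)
{
    static const std::set<type_t> supported_types = {type_t::fp8e4m3fnuz_type,
                                                     type_t::int8_type};
    return supported_types.count(dq1_input) > 0 && supported_types.count(dq2_input) > 0;
}
```

With this sketch, `qdq_rewrite_allowed(type_t::uint8_type, type_t::int8_type)` is false. That corresponds to the case described above, where the match is found but the rewrite is skipped.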