ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

Driver failed to verify pre-quantized int8 resnet50 model #1796

Open jerryyin opened 1 year ago

jerryyin commented 1 year ago

The following model from the UIF 1.2 model zoo is a pre-quantized resnet50. The MIGraphX driver cannot verify it out of the box.

http://mklnxpgk.amd.com/ModelZoo/UIF/1.2_dev-migraphx/pt_resnet50.tar.gz

$ ./bin/driver verify pt_resnet50/quantized/ResNet_int.onnx

This yields the following output:

FAILED: pt_resnet50/quantized/ResNet_int.onnx
error: 0.316397
Max diff: 34674.9
Mismatch at 0: -2.5 != -5009.9

Peeking into the CPU execution result, the values fall in the range [-5, 5], so the GPU execution result is off by a significant margin.
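For readers who want to reproduce this kind of comparison by hand, here is a minimal sketch of comparing a reference output against a GPU output element by element. It is not the driver's implementation: the function name `compare_outputs`, the tolerance `tol`, and the sample values are all assumptions made up for illustration.

```python
def compare_outputs(ref, gpu, tol=1e-3):
    """Return the max absolute difference and the index of the first
    element whose difference exceeds tol (None if there is no such element)."""
    diffs = [abs(r - g) for r, g in zip(ref, gpu)]
    first_bad = next((i for i, d in enumerate(diffs) if d > tol), None)
    return max(diffs), first_bad


# Made-up values, not taken from the model's real outputs.
ref_out = [-2.5, 0.5, 4.0]
gpu_out = [-2.25, 0.5, 1.0]

max_diff, first_bad = compare_outputs(ref_out, gpu_out)
print(max_diff, first_bad)
```

A check like this only says where and by how much the two results diverge; it says nothing about which side is wrong.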

kahmed10 commented 1 year ago

With recent changes to develop, I no longer see a max diff as high as the one reported here. This is what I see:

FAILED: /data/models/modelzoo/1.2-dev-migraphx/resnet50/quantized/ResNet_int.onnx
error: 0.0381007
Max diff: 0.797004
Mismatch at 0: -2.5 != -2.49888

ref is actually running in fp32, whereas the gpu is running in int8. This is because ref doesn't have the simplify_qdq pass enabled. When running that pass alongside changes from #1910, I was able to get driver verify to pass.
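One way an fp32 path and an int8 path can end up further apart than ordinary floating-point noise is rounding inside the quantize steps. The following is a minimal sketch of ONNX-style QuantizeLinear/DequantizeLinear arithmetic in plain Python, not MIGraphX code. The function names, the scale of 0.5, and the input values are assumptions chosen for illustration. It shows two inputs that differ by 2e-6 landing on opposite sides of a rounding boundary and coming out a full quantization step apart.

```python
def quantize_linear(x, scale, zero_point=0):
    """Round x/scale to the nearest integer (ties to even), shift, and saturate to int8."""
    q = round(x / scale) + zero_point  # Python's round() uses ties-to-even
    return max(-128, min(127, q))


def dequantize_linear(q, scale, zero_point=0):
    """Map an int8 value back to a real value."""
    return (q - zero_point) * scale


scale = 0.5      # hypothetical quantization step
a = 1.25 - 1e-6  # a value as one execution path might compute it
b = 1.25 + 1e-6  # the same value with a tiny numerical difference

out_a = dequantize_linear(quantize_linear(a, scale), scale)
out_b = dequantize_linear(quantize_linear(b, scale), scale)
print(out_a, out_b, abs(out_a - out_b))
```

With a scale of 0.5, the two results are 0.5 apart even though the inputs differed by only 2e-6.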

aserio commented 1 year ago

Final Step: @pfultz2 wanted to verify this task with a test suite with real input data before closing this issue.

krzysz00 commented 1 year ago

I see this failure on a Navi31:

FAILED: /data/onnx_models/resnet50/quantized/ResNet_int.onnx
error: 0.0424039
Max diff: 0.886456
Mismatch at 0: -2.5 != -2.49333

But that might be a case of not going through MLIR, because of this error:

MIOpen(HIP): Error [EvaluateInvokers] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/gemm_v2.cpp:671: rocBlas error encountered

aserio commented 1 year ago

@umangyadav, can you provide an update on this ticket?

kahmed10 commented 1 year ago

We have not tried this on Navi31. The one remaining task is to verify using the ONNX test suite with real input data, and that work is still pending.

causten commented 11 months ago

MIGRAPHX_ENABLE_MLIR=1 migraphx-driver verify /workspace/resnet50/quantized/ResNet_int.onnx --int8

@305 = @return(@304), target_id=0

FAILED: /workspace/resnet50/quantized/ResNet_int.onnx
error: 0.0400062
Max diff: 0.731752
Mismatch at 0: -2.5 != -2.46051

Tested with 12648 driver and develop+#2134 using Navi31 hardware