ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

add migraphx load and save functions, allowing the hiding of an error message #3473

Closed: apwojcik closed this 4 days ago

apwojcik commented 5 days ago

The MIGraphX EP uses the migraphx_load and migraphx_save functions for MXR caching. If a model has not yet been compiled (the file is missing from the cache), a message in the console states that an error occurred, which misleads users. The v2 versions of the functions allow error-message reporting to be turned off.

The PR introduces the migraphx_load_v2 and migraphx_save_v2 functions. The original functions are replaced with macros that emulate the old behavior by forwarding to the v2 functions.
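For readers unfamiliar with the pattern, here is a minimal sketch of the shim described above; the types and parameter lists are simplified placeholders, not the actual MIGraphX C API signatures.

```cpp
// Hypothetical sketch only: the types and parameters are placeholders,
// not the real MIGraphX C API.
struct migraphx_program; // opaque program handle (placeholder)
enum migraphx_status { migraphx_status_success = 0, migraphx_status_unknown_error = 1 };

// v2 entry point: identical to the original except for an extra flag that
// suppresses the console error message when the cached .mxr file is absent.
extern "C" migraphx_status migraphx_load_v2(migraphx_program** out,
                                            const char* filename,
                                            bool print_errors);

// The old name is kept as a macro so existing source recompiles unchanged;
// it forwards to the v2 function with error reporting switched on.
#define migraphx_load(out, filename) migraphx_load_v2((out), (filename), true)
```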

TedThemistokleous commented 5 days ago

Did you test this with a rebuild and the latest stuff off repo.radeon.com?

I'm concerned because we're removing the old signature, and since C++ name-mangles based on arguments, you'll have a runtime issue between libraries if a user updates the MIGraphX package with this change but doesn't rebuild Onnxruntime with the new macro.

@apwojcik can you try the following with this change

  1. Install the latest Onnxruntime from repo.radeon on your system with the latest MIGraphX
  2. Build this change as a package and install it over the existing MIGraphX on the system
  3. Try to run a model with the Onnxruntime MIGraphX EP

If you hit an error you may need to put the old signature back in for now.

codecov[bot] commented 5 days ago

Codecov Report

Attention: Patch coverage is 88.46154% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.04%. Comparing base (e230c02) to head (eaac578). Report is 1 commit behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/api/api.cpp | 81.25% | 3 Missing :warning: |

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           develop    #3473     +/-   ##
==========================================
  Coverage    92.04%    92.04%
==========================================
  Files          506       506
  Lines        20872     20874      +2
==========================================
+ Hits         19212     19214      +2
  Misses        1660     1660
```

Flag coverage: 92.04% <88.46%> (+<0.01%) :arrow_up:. Flags with carried forward coverage won't be shown.


migraphx-bot commented 4 days ago
| Test | Batch | Rate new (eaac57) | Rate old (ab5da3) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,250.98 | 3,250.20 | 0.02% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,984.86 | 6,981.72 | 0.04% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,433.90 | 2,434.35 | -0.02% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,096.08 | 4,104.80 | -0.21% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,637.02 | 1,637.98 | -0.06% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,743.11 | 2,743.88 | -0.03% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 778.86 | 778.58 | 0.04% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 807.68 | 808.70 | -0.13% | :white_check_mark: |
| slim-mobilenet | 64 | 7,458.23 | 7,457.95 | 0.00% | :white_check_mark: |
| slim-nasnetalarge | 64 | 208.35 | 208.52 | -0.08% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,437.13 | 3,435.85 | 0.04% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,152.49 | 1,153.18 | -0.06% | :white_check_mark: |
| bert-mrpc-tf | 1 | 310.58 | 305.52 | 1.66% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 415.99 | 384.89 | 8.08% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 386.88 | 386.28 | 0.16% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 755.10 | 768.57 | -1.75% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 434.34 | 437.45 | -0.71% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 374.14 | 383.50 | -2.44% | :white_check_mark: |
| onnx-taau-downsample | 1 | 344.12 | 344.79 | -0.20% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 35.05 | 35.03 | 0.06% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 58.06 | 58.06 | -0.00% | :white_check_mark: |
| agentmodel | 1 | 8,260.98 | 7,924.68 | 4.24% | :high_brightness: |
| unet_fp16 | 2 | 58.00 | 57.94 | 0.11% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 930.73 | 929.40 | 0.14% | :white_check_mark: |
| resnet50v1_int8 | 1 | 941.96 | 960.30 | -1.91% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,153.83 | 1,153.43 | 0.03% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.95 | 355.83 | 0.03% | :white_check_mark: |
| bert_large_fp16 | 1 | 211.79 | 210.44 | 0.64% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,158.66 | 2,159.20 | -0.03% | :white_check_mark: |
| yolov5s | 1 | 533.17 | 539.05 | -1.09% | :white_check_mark: |
| tinyllama | 1 | 43.38 | 43.40 | -0.06% | :white_check_mark: |
| vicuna-fastchat | 1 | 175.08 | 170.95 | 2.41% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 418.14 | 417.95 | 0.04% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 424.17 | 423.97 | 0.05% | :white_check_mark: |

Check results before merge :high_brightness:

migraphx-bot commented 4 days ago


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

pfultz2 commented 4 days ago

This breaks binary compatibility. Why can't we just check in onnxruntime whether the file exists before we load it?
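A minimal sketch of that simpler approach, not the actual onnxruntime EP code: probe the MXR cache file before loading, so a cache miss never reaches MIGraphX's error path. The function and path names are placeholders; the migraphx:: calls are meant to follow the public C++ wrapper API and should be treated as illustrative.

```cpp
#include <filesystem>
#include <string>
#include <migraphx/migraphx.hpp> // public MIGraphX C++ wrapper API

// Placeholder helper standing in for the EP's cache handling.
migraphx::program load_or_compile(const std::string& mxr_path,
                                  const std::string& onnx_path)
{
    if (std::filesystem::exists(mxr_path))
    {
        // Cache hit: load the precompiled program; no error message is printed.
        return migraphx::load(mxr_path.c_str());
    }
    // Cache miss: compile from the ONNX model and populate the cache.
    migraphx::program prog = migraphx::parse_onnx(onnx_path.c_str());
    prog.compile(migraphx::target("gpu"));
    migraphx::save(prog, mxr_path.c_str());
    return prog;
}
```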

pfultz2 commented 4 days ago

> I'm concerned because we're removing the old signature, and since C++ name-mangles based on arguments, you'll have a runtime issue between libraries if a user updates the MIGraphX package with this change but doesn't rebuild Onnxruntime with the new macro.

There is no name mangling since this is a C API, but you are correct that binary compatibility is broken. A simpler solution would be to just check whether the file exists before loading it.
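To make the binary-compatibility point concrete (illustrative only, not project code): an already-built onnxruntime resolves the exported symbol migraphx_load at load time, and a macro leaves no symbol behind, so removing the old function breaks existing binaries even without any C++ name mangling. The library name below is assumed.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading
#include <cstdio>

int main()
{
    // Library name assumed for illustration.
    void* lib = dlopen("libmigraphx_c.so", RTLD_NOW);
    if (lib == nullptr)
    {
        std::printf("dlopen failed: %s\n", dlerror());
        return 1;
    }
    // A macro exists only at compile time; it exports no symbol.
    if (dlsym(lib, "migraphx_load") == nullptr)
        std::printf("old symbol missing: a pre-built onnxruntime would fail to resolve it\n");
    else
        std::printf("old symbol still exported: binary compatibility preserved\n");
    dlclose(lib);
    return 0;
}
```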

apwojcik commented 4 days ago

> This breaks binary compatibility. Why can't we just check in onnxruntime whether the file exists before we load it?

That would be too easy :-) But you have a valid point. I am closing this PR. Thanks!

apwojcik commented 4 days ago

> Did you test this with a rebuild and the latest stuff off repo.radeon.com?
>
> I'm concerned because we're removing the old signature, and since C++ name-mangles based on arguments, you'll have a runtime issue between libraries if a user updates the MIGraphX package with this change but doesn't rebuild Onnxruntime with the new macro.
>
> @apwojcik can you try the following with this change
>
> 1. Install the latest Onnxruntime from repo.radeon on your system with the latest MIGraphX
> 2. Build this change as a package and install it over the existing MIGraphX on the system
> 3. Try to run a model with the Onnxruntime MIGraphX EP
>
> If you hit an error you may need to put the old signature back in for now.

Ted, your concern is justified; I had not considered that scenario. Anyway, I am closing the PR and going with Paul's more straightforward suggestion instead.

apwojcik commented 4 days ago

Closing the PR.