Closed tcgu-amd closed 1 month ago
Unit tests need to be added for this.
@pfultz2 Should they be included in the scope of this PR? Thanks!
Yes, please. Feature code and its validation belong together in each PR.
Yes, unit tests need to be added to verify that the PR works as expected (see the Contributing section). A test case can be added in the test/instruction.cpp file.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.17%. Comparing base (ddc4c0c) to head (1362529). Report is 1 commit behind head on develop.
:umbrella: View full report in Codecov by Sentry.
Thanks for the test! Just need to fix the licensing and cppcheck issue.
@pfultz2 @causten Seems like cppcheck and licensing are both okay now. Is there anything else you would like me to do? Thanks!
Test | Batch | Rate new ddc4c0 | Rate old ddc4c0 | Diff | Compare | |
---|---|---|---|---|---|---|
torchvision-resnet50 | 64 | 3,259.05 | 3,260.53 | -0.05% | :white_check_mark: | |
torchvision-resnet50_fp16 | 64 | 6,994.24 | 6,988.30 | 0.09% | :white_check_mark: | |
torchvision-densenet121 | 32 | 2,438.28 | 2,435.76 | 0.10% | :white_check_mark: | |
torchvision-densenet121_fp16 | 32 | 4,060.31 | 4,073.89 | -0.33% | :white_check_mark: | |
torchvision-inceptionv3 | 32 | 1,638.28 | 1,638.33 | -0.00% | :white_check_mark: | |
torchvision-inceptionv3_fp16 | 32 | 2,764.20 | 2,763.22 | 0.04% | :white_check_mark: | |
cadene-inceptionv4 | 16 | 777.15 | 776.49 | 0.09% | :white_check_mark: | |
cadene-resnext64x4 | 16 | 811.43 | 811.34 | 0.01% | :white_check_mark: | |
slim-mobilenet | 64 | 7,541.20 | 7,533.02 | 0.11% | :white_check_mark: | |
slim-nasnetalarge | 64 | 211.54 | 211.47 | 0.04% | :white_check_mark: | |
slim-resnet50v2 | 64 | 3,505.22 | 3,506.35 | -0.03% | :white_check_mark: | |
bert-mrpc-onnx | 8 | 1,149.10 | 1,151.01 | -0.17% | :white_check_mark: | |
bert-mrpc-tf | 1 | 464.63 | 498.30 | -6.76% | :red_circle: | |
pytorch-examples-wlang-gru | 1 | 418.32 | 422.79 | -1.06% | :white_check_mark: | |
pytorch-examples-wlang-lstm | 1 | 395.67 | 390.98 | 1.20% | :white_check_mark: | |
torchvision-resnet50_1 | 1 | 777.88 | 767.34 | 1.37% | :white_check_mark: | |
cadene-dpn92_1 | 1 | 409.35 | 399.69 | 2.41% | :white_check_mark: | |
cadene-resnext101_1 | 1 | 383.86 | 383.25 | 0.16% | :white_check_mark: | |
onnx-taau-downsample | 1 | 342.52 | 342.66 | -0.04% | :white_check_mark: | |
dlrm-criteoterabyte | 1 | 33.34 | 33.32 | 0.06% | :white_check_mark: | |
dlrm-criteoterabyte_fp16 | 1 | 52.73 | 52.71 | 0.04% | :white_check_mark: | |
agentmodel | 1 | 8,074.15 | 8,354.43 | -3.35% | :red_circle: | |
unet_fp16 | 2 | 58.85 | 58.81 | 0.06% | :white_check_mark: | |
resnet50v1_fp16 | 1 | 958.26 | 935.55 | 2.43% | :white_check_mark: | |
resnet50v1_int8 | 1 | 1,006.92 | 1,001.70 | 0.52% | :white_check_mark: | |
bert_base_cased_fp16 | 64 | 1,171.58 | 1,170.66 | 0.08% | :white_check_mark: | |
bert_large_uncased_fp16 | 32 | 363.72 | 363.70 | 0.00% | :white_check_mark: | |
bert_large_fp16 | 1 | 199.39 | 198.87 | 0.26% | :white_check_mark: | |
distilgpt2_fp16 | 16 | 2,203.46 | 2,203.10 | 0.02% | :white_check_mark: | |
yolov5s | 1 | 534.11 | 531.68 | 0.46% | :white_check_mark: | |
tinyllama | 1 | 43.43 | 43.43 | 0.02% | :white_check_mark: | |
vicuna-fastchat | 1 | 175.92 | 179.99 | -2.26% | :white_check_mark: | |
whisper-tiny-encoder | 1 | 418.29 | 417.45 | 0.20% | :white_check_mark: | |
whisper-tiny-decoder | 1 | 427.97 | 423.52 | 1.05% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
Related issue: https://github.com/ROCm/AMDMIGraphX/issues/3110
instruction::replace() performs the replace operation on the current instruction and then recursively calls itself on any downstream instructions that depend on its output. This is equivalent to performing a pre-order tree walk, which is problematic. A simple example would be
op1(op2(op4(x)), op3(op4(x))).
In this example, op1 is a binary operation and op2, op3, op4 are unary operations. The parse tree is simply
```
      op4
     /   \
   op2   op3
     \   /
      op1
```
A pre-order tree walk starting from op4.replace will visit the nodes in the order op4 -> op2 -> op1 -> op3.
However, at op1 we will hit a shape-mismatch error, because op2 has already been replaced but op3 has not.
The correct implementation would be to use BFS, which yields op4, op2, op3, op1, so both inputs of op1 are updated before op1 itself is visited.