Open ahsan-ca opened 1 week ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.18%. Comparing base (b39a938) to head (250258e).
:umbrella: View full report in Codecov by Sentry.
Windows build failure should be fixed by: 4ff36ae574e6479c5510812ae8f001f9a1070e2c:

```
C:\home\jenkins\agent\workspace\UIF2_AMDMIGraphX_PR-3594\AMDMIGraphX\src\targets\gpu\compile_hipblaslt.cpp:78:26: error: use of undeclared identifier 'hipblaslt_workspace_size'
   78 |         std::size_t ws = hipblaslt_workspace_size;
      |                          ^
1 error generated.
```
This needs a test. There are many compiler pass tests in the test/ directory that make a run_pass() call and could be used as examples. It's not clear to me whether it's relevant to add a test case to test/gpu/gemm_tune.cpp.
Current code looks good. As Brian mentions, I would like to see some tests. The main reason for this compiler pass is to do output fusions with hipblaslt, right?
> The main reason for this compiler pass is to do output fusions with hipblaslt, right?
The main reason is to be able to insert an accurate workspace allocation after tuning. We can't insert workspace allocation instructions during lowering because we don't know the layouts until after eliminate_contiguous, and we also need to run tuning as well.
Chris reached out to Bryant and discussed the issue where none of the algos returned by getAllAlgos is supported/valid. Bryant's suggestion was to use hipblasLtMatmulAlgoGetHeuristic() with requestedAlgoCount=1, and then query the index for the algo returned by getIndexFromAlgo(). I have made the changes for this in tune() via commit 2ae35e0.
Test | Batch | Rate new 250258 | Rate old b39a93 | Diff | Compare | |
---|---|---|---|---|---|---|
torchvision-resnet50 | 64 | 3,260.12 | 3,259.98 | 0.00% | :white_check_mark: | |
torchvision-resnet50_fp16 | 64 | 6,988.96 | 6,991.63 | -0.04% | :white_check_mark: | |
torchvision-densenet121 | 32 | 2,434.72 | 2,436.76 | -0.08% | :white_check_mark: | |
torchvision-densenet121_fp16 | 32 | 4,101.04 | 4,076.33 | 0.61% | :white_check_mark: | |
torchvision-inceptionv3 | 32 | 1,629.01 | 1,630.75 | -0.11% | :white_check_mark: | |
torchvision-inceptionv3_fp16 | 32 | 2,746.52 | 2,750.13 | -0.13% | :white_check_mark: | |
cadene-inceptionv4 | 16 | 765.70 | 766.23 | -0.07% | :white_check_mark: | |
cadene-resnext64x4 | 16 | 811.34 | 811.64 | -0.04% | :white_check_mark: | |
slim-mobilenet | 64 | 7,466.21 | 7,471.32 | -0.07% | :white_check_mark: | |
slim-nasnetalarge | 64 | 208.42 | 208.48 | -0.03% | :white_check_mark: | |
slim-resnet50v2 | 64 | 3,440.06 | 3,441.77 | -0.05% | :white_check_mark: | |
bert-mrpc-onnx | 8 | 1,151.41 | 1,151.97 | -0.05% | :white_check_mark: | |
bert-mrpc-tf | 1 | 460.06 | 492.75 | -6.63% | :red_circle: | |
pytorch-examples-wlang-gru | 1 | 414.47 | 417.46 | -0.71% | :white_check_mark: | |
pytorch-examples-wlang-lstm | 1 | 408.33 | 406.83 | 0.37% | :white_check_mark: | |
torchvision-resnet50_1 | 1 | 781.34 | 775.10 | 0.80% | :white_check_mark: | |
cadene-dpn92_1 | 1 | 397.20 | 421.44 | -5.75% | :red_circle: | |
cadene-resnext101_1 | 1 | 383.11 | 383.26 | -0.04% | :white_check_mark: | |
onnx-taau-downsample | 1 | 345.67 | 346.67 | -0.29% | :white_check_mark: | |
dlrm-criteoterabyte | 1 | 33.33 | 33.33 | 0.01% | :white_check_mark: | |
dlrm-criteoterabyte_fp16 | 1 | 52.50 | 52.76 | -0.50% | :white_check_mark: | |
agentmodel | 1 | 8,172.67 | 9,419.30 | -13.23% | :red_circle: | |
unet_fp16 | 2 | 58.91 | 58.87 | 0.06% | :white_check_mark: | |
resnet50v1_fp16 | 1 | 942.95 | 941.94 | 0.11% | :white_check_mark: | |
resnet50v1_int8 | 1 | 1,018.89 | 985.94 | 3.34% | :high_brightness: | |
bert_base_cased_fp16 | 64 | 1,169.84 | 1,171.42 | -0.13% | :white_check_mark: | |
bert_large_uncased_fp16 | 32 | 363.40 | 363.44 | -0.01% | :white_check_mark: | |
bert_large_fp16 | 1 | 198.69 | 200.59 | -0.95% | :white_check_mark: | |
distilgpt2_fp16 | 16 | 2,199.29 | 2,203.62 | -0.20% | :white_check_mark: | |
yolov5s | 1 | 531.68 | 526.10 | 1.06% | :white_check_mark: | |
tinyllama | 1 | 43.41 | 43.40 | 0.02% | :white_check_mark: | |
vicuna-fastchat | 1 | 173.53 | 172.63 | 0.52% | :white_check_mark: | |
whisper-tiny-encoder | 1 | 418.66 | 418.90 | -0.06% | :white_check_mark: | |
whisper-tiny-decoder | 1 | 435.15 | 428.52 | 1.55% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
This pass helps allocate the actual workspace size needed for hipblaslt algos. It does so by querying the workspace needed for the particular algo and allocating memory accordingly.