ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

Rewrite 1x1 convolutions to gemm #3568

Open pfultz2 opened 2 weeks ago

pfultz2 commented 2 weeks ago

This allows us to use rocblas for some gemms.
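To illustrate why the rewrite is possible, here is a minimal pure-Python sketch (not MIGraphX code; all function names are hypothetical) showing that a 1x1 convolution with stride 1, no padding, and a single group over one CHW image gives the same result as the matrix product of a `[K x C]` weight matrix with the input flattened to `[C x (H*W)]`. With a contiguous CHW layout the flattening is only a reshape, which is presumably what lets the operation be handed to a GEMM library.

```python
# Sketch only: 1x1 convolution vs. the equivalent GEMM on a single CHW image.
# Assumes stride 1, no padding, one group. Names are illustrative.

def conv1x1(x, w):
    """Direct 1x1 convolution. x: [C][H][W], w: [K][C] -> [K][H][W]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[k][c] * x[c][i][j] for c in range(C))
              for j in range(W)]
             for i in range(H)]
            for k in range(len(w))]

def matmul(a, b):
    """Plain matrix product of a: [M][K] and b: [K][N]."""
    return [[sum(a[m][k] * b[k][n] for k in range(len(b)))
             for n in range(len(b[0]))]
            for m in range(len(a))]

def conv1x1_as_gemm(x, w):
    """Same result via flatten -> GEMM -> unflatten."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Flatten the spatial dims: [C][H][W] -> [C][H*W].
    x2d = [[x[c][i][j] for i in range(H) for j in range(W)] for c in range(C)]
    y2d = matmul(w, x2d)  # [K][H*W]
    # Restore the spatial dims: [K][H*W] -> [K][H][W].
    return [[row[i * W:(i + 1) * W] for i in range(H)] for row in y2d]

x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # C=2, H=2, W=2
w = [[1, 0], [1, 1], [2, -1]]              # K=3, C=2
assert conv1x1(x, w) == conv1x1_as_gemm(x, w)
```

With a batch dimension in NCHW layout, the same idea would need either a batched GEMM or a transpose, which this sketch does not cover.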

pfultz2 commented 2 weeks ago

Need to investigate the nasnet failure.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.19%. Comparing base (f5df004) to head (7f35c2a). Report is 3 commits behind head on develop.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #3568      +/-   ##
===========================================
+ Coverage    92.17%   92.19%   +0.01%
===========================================
  Files          513      515       +2
  Lines        21536    21606      +70
===========================================
+ Hits         19851    19919      +68
- Misses        1685     1687       +2
```


pfultz2 commented 5 days ago

> LGTM. Is this the only case where doing a convolution as a GEMM is beneficial? Technically any convolution can be done as a GEMM.

For other sizes, we need to do windowing (and possibly some padding), which requires an extra copy when we reshape to a gemm, with tensors that are larger than the original input.
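The size argument can be made concrete with a rough sketch. It assumes the standard im2col lowering (each output position gets a column holding its `C * kh * kw` input window), which is one common way to express a convolution as a GEMM and is not necessarily what MIGraphX would do. The helper names are hypothetical.

```python
# Sketch only: size of the im2col matrix relative to the original input.
# Assumes a standard im2col lowering; names are illustrative.

def im2col_shape(c, h, w, kh, kw, stride=1, pad=0):
    """Shape (rows, cols) of the im2col matrix for one CHW image."""
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    return (c * kh * kw, out_h * out_w)

def expansion(c, h, w, kh, kw, stride=1, pad=0):
    """Ratio of im2col elements to original input elements."""
    rows, cols = im2col_shape(c, h, w, kh, kw, stride, pad)
    return rows * cols / (c * h * w)

# 1x1 kernel: the "windowed" matrix is the input itself, so no copy is needed.
print(im2col_shape(64, 56, 56, 1, 1), expansion(64, 56, 56, 1, 1))
# 3x3 kernel with padding 1: every input element is replicated 9 times.
print(im2col_shape(64, 56, 56, 3, 3, pad=1), expansion(64, 56, 56, 3, 3, pad=1))
```

For the 64x56x56 example, the 1x1 case keeps the element count unchanged, while the 3x3 case with padding 1 needs a temporary nine times the size of the input.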

migraphx-bot commented 3 days ago

| Test | Batch | Rate new (7bceaa) | Rate old (c51bea) | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 3,124.05 | 3,257.81 | -4.11% | :red_circle: |
| torchvision-resnet50_fp16 | 64 | 6,653.95 | 6,987.81 | -4.78% | :red_circle: |
| torchvision-densenet121 | 32 | 2,430.45 | 2,434.57 | -0.17% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,065.26 | 4,065.61 | -0.01% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,623.53 | 1,637.17 | -0.83% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,717.60 | 2,759.26 | -1.51% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 748.92 | 776.31 | -3.53% | :red_circle: |
| cadene-resnext64x4 | 16 | 677.68 | 811.75 | -16.52% | :red_circle: |
| slim-mobilenet | 64 | 7,398.49 | 7,533.16 | -1.79% | :white_check_mark: |
| slim-nasnetalarge | 64 | 182.09 | 211.39 | -13.86% | :red_circle: |
| slim-resnet50v2 | 64 | 3,235.94 | 3,504.83 | -7.67% | :red_circle: |
| bert-mrpc-onnx | 8 | 1,149.65 | 1,146.47 | 0.28% | :white_check_mark: |
| bert-mrpc-tf | 1 | 475.33 | 473.89 | 0.30% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 413.32 | 425.31 | -2.82% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 392.61 | 408.68 | -3.93% | :red_circle: |
| torchvision-resnet50_1 | 1 | 725.45 | 771.75 | -6.00% | :red_circle: |
| cadene-dpn92_1 | 1 | 417.52 | 399.01 | 4.64% | :high_brightness: |
| cadene-resnext101_1 | 1 | 325.49 | 383.85 | -15.20% | :red_circle: |
| onnx-taau-downsample | 1 | 345.58 | 343.09 | 0.72% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 33.31 | 33.31 | -0.01% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 52.70 | 52.71 | -0.01% | :white_check_mark: |
| agentmodel | 1 | 8,560.32 | 8,235.67 | 3.94% | :high_brightness: |
| unet_fp16 | 2 | 58.76 | 58.90 | -0.23% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 875.99 | 940.89 | -6.90% | :red_circle: |
| resnet50v1_int8 | 1 | 1,033.20 | 1,025.93 | 0.71% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,171.51 | 1,170.88 | 0.05% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.32 | 363.69 | -2.30% | :white_check_mark: |
| bert_large_fp16 | 1 | 192.28 | 200.14 | -3.93% | :red_circle: |
| distilgpt2_fp16 | 16 | 2,203.42 | 2,200.77 | 0.12% | :white_check_mark: |
| yolov5s | 1 | 524.92 | 535.15 | -1.91% | :white_check_mark: |
| tinyllama | 1 | 43.43 | 43.41 | 0.03% | :white_check_mark: |
| vicuna-fastchat | 1 | 176.01 | 178.09 | -1.17% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 417.66 | 418.18 | -0.13% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 434.80 | 427.58 | 1.69% | :white_check_mark: |
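
The Diff values in the report appear consistent with the relative change of the new rate against the old one, `(new - old) / old`. The bot's actual formula and its thresholds for the status icons are not stated in this thread, so the helper below is only an assumed reconstruction with a hypothetical name.

```python
# Sketch only: assumed formula for the Diff column, (new - old) / old in percent.

def pct_diff(rate_new, rate_old):
    """Relative change of rate_new against rate_old, in percent."""
    return (rate_new - rate_old) / rate_old * 100.0

# Rates copied from the report above.
print(f"{pct_diff(677.68, 811.75):.2f}%")    # cadene-resnext64x4
print(f"{pct_diff(3124.05, 3257.81):.2f}%")  # torchvision-resnet50
print(f"{pct_diff(8560.32, 8235.67):.2f}%")  # agentmodel
```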

This build is not recommended to merge :red_circle:

migraphx-bot commented 3 days ago


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

causten commented 3 days ago

I can't merge this with this many perf regressions.