Rewrite 1x1 convolutions to gemm

pfultz2 commented 2 weeks ago

This allows us to use rocblas for some gemms.
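
For readers following the thread, here is a minimal sketch of why a 1x1 convolution maps directly onto a GEMM. It is not the code in this PR: the function names are hypothetical, the batch dimension is omitted, and it assumes stride 1, no padding, and a single group. The point is that flattening the spatial dimensions only reinterprets the input, so no element is duplicated.

```python
def conv1x1_direct(x, w):
    """Reference 1x1 convolution. x: [C][H][W], w: [K][C] -> [K][H][W]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[k][c] * x[c][i][j] for c in range(C))
              for j in range(W)]
             for i in range(H)]
            for k in range(len(w))]


def matmul(a, b):
    """Plain matrix product. a: [M][N], b: [N][P] -> [M][P]."""
    return [[sum(a[m][n] * b[n][p] for n in range(len(b)))
             for p in range(len(b[0]))]
            for m in range(len(a))]


def conv1x1_as_gemm(x, w):
    """Same result via one GEMM: (K x C) times (C x H*W)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Flatten the spatial dims: [C][H][W] -> [C][H*W]. Every input element
    # appears exactly once, so the matrix is the same size as the input.
    x_mat = [[x[c][i][j] for i in range(H) for j in range(W)] for c in range(C)]
    y_mat = matmul(w, x_mat)  # [K][H*W]
    # Unflatten each output row back to [H][W].
    return [[row[i * W:(i + 1) * W] for i in range(H)] for row in y_mat]


# Small integer example: C=2 input channels, K=3 output channels, 2x3 image.
x = [[[1, 2, 3], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12]]]
w = [[1, 0], [0, 1], [2, -1]]
```

Calling `conv1x1_direct(x, w)` and `conv1x1_as_gemm(x, w)` on the example above gives identical outputs, which is the property the rewrite relies on.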

pfultz2 commented 2 weeks ago

Need to investigate the nasnet failure.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.19%. Comparing base (f5df004) to head (7f35c2a). Report is 3 commits behind head on develop.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop    #3568     +/-   ##
===========================================
+ Coverage    92.17%   92.19%   +0.01%
===========================================
  Files          513      515       +2
  Lines        21536    21606      +70
===========================================
+ Hits         19851    19919      +68
- Misses        1685     1687       +2
```

pfultz2 commented 5 days ago

LGTM. Is this the only case where doing a convolution as a GEMM is beneficial? Technically any convolution can be done as a GEMM.

For other sizes, we need to do windowing (and possibly some padding), which requires an extra copy when we reshape to a GEMM, and the resulting tensors are larger than the original input.
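
To put a rough number on that extra copy, the sketch below counts the elements of the windowed (im2col-style) matrix a GEMM would consume and compares it with the original input. This is an illustration under stated assumptions, not MIGraphX code: the helper names are hypothetical, the kernel is square, dilation is 1, and the batch dimension is left out.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output extent along one spatial dimension (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def im2col_elements(channels, height, width, kernel, stride=1, padding=0):
    """Element count of the windowed matrix: (C * k * k) rows by (out_h * out_w) columns."""
    out_h = conv_output_size(height, kernel, stride, padding)
    out_w = conv_output_size(width, kernel, stride, padding)
    return channels * kernel * kernel * out_h * out_w


def expansion_factor(channels, height, width, kernel, stride=1, padding=0):
    """How many times larger the windowed matrix is than the original input."""
    original = channels * height * width
    return im2col_elements(channels, height, width, kernel, stride, padding) / original


# 1x1 kernel: the "windowed" matrix is just the input, reshaped.
factor_1x1 = expansion_factor(64, 56, 56, kernel=1)
# 3x3 kernel with padding 1: every pixel is copied into 9 windows.
factor_3x3 = expansion_factor(64, 56, 56, kernel=3, padding=1)
```

With a 64-channel 56x56 input (an example shape chosen here, not taken from the PR), `factor_1x1` is 1.0 and `factor_3x3` is 9.0, which is why only the 1x1 case avoids the extra copy.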

migraphx-bot commented 3 days ago

| Test | Batch | Rate new 7bceaa | Rate old c51bea | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 3,124.05 | 3,257.81 | -4.11% | :red_circle: |
| torchvision-resnet50_fp16 | 64 | 6,653.95 | 6,987.81 | -4.78% | :red_circle: |
| torchvision-densenet121 | 32 | 2,430.45 | 2,434.57 | -0.17% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,065.26 | 4,065.61 | -0.01% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,623.53 | 1,637.17 | -0.83% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,717.60 | 2,759.26 | -1.51% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 748.92 | 776.31 | -3.53% | :red_circle: |
| cadene-resnext64x4 | 16 | 677.68 | 811.75 | -16.52% | :red_circle: |
| slim-mobilenet | 64 | 7,398.49 | 7,533.16 | -1.79% | :white_check_mark: |
| slim-nasnetalarge | 64 | 182.09 | 211.39 | -13.86% | :red_circle: |
| slim-resnet50v2 | 64 | 3,235.94 | 3,504.83 | -7.67% | :red_circle: |
| bert-mrpc-onnx | 8 | 1,149.65 | 1,146.47 | 0.28% | :white_check_mark: |
| bert-mrpc-tf | 1 | 475.33 | 473.89 | 0.30% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 413.32 | 425.31 | -2.82% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 392.61 | 408.68 | -3.93% | :red_circle: |
| torchvision-resnet50_1 | 1 | 725.45 | 771.75 | -6.00% | :red_circle: |
| cadene-dpn92_1 | 1 | 417.52 | 399.01 | 4.64% | :high_brightness: |
| cadene-resnext101_1 | 1 | 325.49 | 383.85 | -15.20% | :red_circle: |
| onnx-taau-downsample | 1 | 345.58 | 343.09 | 0.72% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 33.31 | 33.31 | -0.01% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 52.70 | 52.71 | -0.01% | :white_check_mark: |
| agentmodel | 1 | 8,560.32 | 8,235.67 | 3.94% | :high_brightness: |
| unet_fp16 | 2 | 58.76 | 58.90 | -0.23% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 875.99 | 940.89 | -6.90% | :red_circle: |
| resnet50v1_int8 | 1 | 1,033.20 | 1,025.93 | 0.71% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,171.51 | 1,170.88 | 0.05% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.32 | 363.69 | -2.30% | :white_check_mark: |
| bert_large_fp16 | 1 | 192.28 | 200.14 | -3.93% | :red_circle: |
| distilgpt2_fp16 | 16 | 2,203.42 | 2,200.77 | 0.12% | :white_check_mark: |
| yolov5s | 1 | 524.92 | 535.15 | -1.91% | :white_check_mark: |
| tinyllama | 1 | 43.43 | 43.41 | 0.03% | :white_check_mark: |
| vicuna-fastchat | 1 | 176.01 | 178.09 | -1.17% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 417.66 | 418.18 | -0.13% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 434.80 | 427.58 | 1.69% | :white_check_mark: |

This build is not recommended to merge :red_circle:

migraphx-bot commented 3 days ago

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

:white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance

:white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:white_check_mark: unet: PASSED: MIGraphX meets tolerance

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

causten commented 3 days ago

I can't merge this with this many perf regressions.

ROCm / AMDMIGraphX

Rewrite 1x1 convolutions to gemm #3568
