pfultz2 opened 2 weeks ago
Need to investigate the nasnet failure.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.19%. Comparing base (f5df004) to head (7f35c2a). Report is 3 commits behind head on develop.
:umbrella: View full report in Codecov by Sentry.
LGTM. Is this the only case where doing a convolution as a GEMM is beneficial? Technically any convolution can be done as a GEMM.
For other sizes, we need to do windowing (and possibly some padding), which requires an extra copy when we reshape to a GEMM, with tensors that are larger than the original input. See the sketch below.
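A minimal NumPy sketch (not MIGraphX code) of the trade-off: a 1x1, stride-1 convolution can be fed to a GEMM with a reshape only, while a general convolution needs an im2col "windowing" copy whose intermediate is roughly `kernel_h * kernel_w` times larger than the input. The function names `conv1x1_as_gemm` and `conv_as_gemm_im2col` are illustrative, not part of MIGraphX.

```python
import numpy as np


def conv1x1_as_gemm(x, w):
    """1x1, stride-1 conv: x (N, C, H, W), w (K, C, 1, 1) -> (N, K, H, W).
    Only reshapes are needed; no windowed copy of the input."""
    n, c, h, wd = x.shape
    k = w.shape[0]
    x2 = x.reshape(n, c, h * wd)   # (N, C, H*W)
    w2 = w.reshape(k, c)           # (K, C)
    y = np.matmul(w2, x2)          # batched GEMM -> (N, K, H*W)
    return y.reshape(n, k, h, wd)


def conv_as_gemm_im2col(x, w, pad=0, stride=1):
    """General conv via im2col: builds a (C*KH*KW, OH*OW) matrix per image,
    i.e. an intermediate roughly KH*KW times larger than the input."""
    n, c, h, wd = x.shape
    k, _, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    oh = (h + 2 * pad - kh) // stride + 1
    ow = (wd + 2 * pad - kw) // stride + 1
    cols = np.empty((n, c * kh * kw, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = xp[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            cols[:, :, i * ow + j] = patch.reshape(n, -1)
    w2 = w.reshape(k, c * kh * kw)
    y = np.matmul(w2, cols)        # (N, K, OH*OW)
    return y.reshape(n, k, oh, ow)
```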
Test | Batch | Rate new (7bceaa) | Rate old (c51bea) | Diff | Compare
---|---|---|---|---|---
torchvision-resnet50 | 64 | 3,124.05 | 3,257.81 | -4.11% | :red_circle: | |
torchvision-resnet50_fp16 | 64 | 6,653.95 | 6,987.81 | -4.78% | :red_circle: | |
torchvision-densenet121 | 32 | 2,430.45 | 2,434.57 | -0.17% | :white_check_mark: | |
torchvision-densenet121_fp16 | 32 | 4,065.26 | 4,065.61 | -0.01% | :white_check_mark: | |
torchvision-inceptionv3 | 32 | 1,623.53 | 1,637.17 | -0.83% | :white_check_mark: | |
torchvision-inceptionv3_fp16 | 32 | 2,717.60 | 2,759.26 | -1.51% | :white_check_mark: | |
cadene-inceptionv4 | 16 | 748.92 | 776.31 | -3.53% | :red_circle: | |
cadene-resnext64x4 | 16 | 677.68 | 811.75 | -16.52% | :red_circle: | |
slim-mobilenet | 64 | 7,398.49 | 7,533.16 | -1.79% | :white_check_mark: | |
slim-nasnetalarge | 64 | 182.09 | 211.39 | -13.86% | :red_circle: | |
slim-resnet50v2 | 64 | 3,235.94 | 3,504.83 | -7.67% | :red_circle: | |
bert-mrpc-onnx | 8 | 1,149.65 | 1,146.47 | 0.28% | :white_check_mark: | |
bert-mrpc-tf | 1 | 475.33 | 473.89 | 0.30% | :white_check_mark: | |
pytorch-examples-wlang-gru | 1 | 413.32 | 425.31 | -2.82% | :white_check_mark: | |
pytorch-examples-wlang-lstm | 1 | 392.61 | 408.68 | -3.93% | :red_circle: | |
torchvision-resnet50_1 | 1 | 725.45 | 771.75 | -6.00% | :red_circle: | |
cadene-dpn92_1 | 1 | 417.52 | 399.01 | 4.64% | :high_brightness: | |
cadene-resnext101_1 | 1 | 325.49 | 383.85 | -15.20% | :red_circle: | |
onnx-taau-downsample | 1 | 345.58 | 343.09 | 0.72% | :white_check_mark: | |
dlrm-criteoterabyte | 1 | 33.31 | 33.31 | -0.01% | :white_check_mark: | |
dlrm-criteoterabyte_fp16 | 1 | 52.70 | 52.71 | -0.01% | :white_check_mark: | |
agentmodel | 1 | 8,560.32 | 8,235.67 | 3.94% | :high_brightness: | |
unet_fp16 | 2 | 58.76 | 58.90 | -0.23% | :white_check_mark: | |
resnet50v1_fp16 | 1 | 875.99 | 940.89 | -6.90% | :red_circle: | |
resnet50v1_int8 | 1 | 1,033.20 | 1,025.93 | 0.71% | :white_check_mark: | |
bert_base_cased_fp16 | 64 | 1,171.51 | 1,170.88 | 0.05% | :white_check_mark: | |
bert_large_uncased_fp16 | 32 | 355.32 | 363.69 | -2.30% | :white_check_mark: | |
bert_large_fp16 | 1 | 192.28 | 200.14 | -3.93% | :red_circle: | |
distilgpt2_fp16 | 16 | 2,203.42 | 2,200.77 | 0.12% | :white_check_mark: | |
yolov5s | 1 | 524.92 | 535.15 | -1.91% | :white_check_mark: | |
tinyllama | 1 | 43.43 | 43.41 | 0.03% | :white_check_mark: | |
vicuna-fastchat | 1 | 176.01 | 178.09 | -1.17% | :white_check_mark: | |
whisper-tiny-encoder | 1 | 417.66 | 418.18 | -0.13% | :white_check_mark: | |
whisper-tiny-decoder | 1 | 434.80 | 427.58 | 1.69% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
I can't merge this with this amount of perf regressions.
This allows us to use rocBLAS for some GEMMs.
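As a quick sanity check on the sketch posted above (again plain NumPy, not MIGraphX), the reshape-only path and the im2col path agree for a 1x1, stride-1 kernel, which is the case where handing the convolution to a GEMM library costs no extra copy:

```python
import numpy as np

# Hypothetical check using the illustrative functions from the earlier sketch.
x = np.random.rand(2, 8, 5, 5).astype(np.float32)   # (N, C, H, W)
w = np.random.rand(4, 8, 1, 1).astype(np.float32)   # (K, C, 1, 1)

y_reshape = conv1x1_as_gemm(x, w)      # reshape + GEMM, no windowed copy
y_im2col = conv_as_gemm_im2col(x, w)   # general path with im2col buffer
assert np.allclose(y_reshape, y_im2col, atol=1e-5)
```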