ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Skip naive conv testing to speed up #3383

Open RobQuistNL opened 4 hours ago

RobQuistNL commented 4 hours ago

Hi,

When running various models with varying inputs, a lot of time in the initial runs seems to be spent benchmarking candidate kernels, including the naive ones (e.g. naive_conv_nonpacked_fwd_nchw_float_double_float).

The solution that ends up being selected is usually not the naive one, but one of the other kernels. Running with MIOPEN_DEBUG_CONV_DIRECT=0 significantly speeds up the initial runs of such a model at varying resolutions.
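For anyone who wants to try the workaround from a launcher script, here is a minimal Python sketch. It only uses the `MIOPEN_DEBUG_CONV_DIRECT=0` setting mentioned above; the helper name `env_without_direct_solvers` and the script name `run_vae.py` are made up for illustration. Note that this setting presumably turns off every solver in the Direct family, not just the naive ones, which is why it is a blunt instrument.

```python
import os


def env_without_direct_solvers(base=None):
    """Return a copy of an environment with MIOPEN_DEBUG_CONV_DIRECT set to "0".

    `base` defaults to the current process environment. The input mapping is
    not modified, so the caller decides where the setting applies.
    """
    env = dict(os.environ if base is None else base)
    env["MIOPEN_DEBUG_CONV_DIRECT"] = "0"
    return env


# Hypothetical usage: launch the workload in a child process so the variable
# is already set before the GPU libraries are loaded.
#
#   import subprocess, sys
#   subprocess.run([sys.executable, "run_vae.py"],
#                  env=env_without_direct_solvers(), check=True)
```

Setting the variable in a child process (or in the shell before starting Python) avoids the question of whether the library has already read its environment by the time the variable is changed.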

Would it be an option to skip this testing / benchmarking dynamically, without excluding it completely? The naive kernel would be the least preferred: if any other applicable kernel is found, it would be a safe bet that the other implementation is faster, so benchmarking the naive kernel could be skipped altogether.

If that's not the desired behaviour, maybe this could be added behind a feature flag.
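To make the proposal concrete, here is a rough Python sketch of the selection logic I have in mind. This is not MIOpen's actual search code; `Candidate`, `pick_fastest` and the `skip_naive_when_possible` flag are hypothetical names used only to illustrate the idea: benchmark the naive kernel only when nothing else is applicable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    name: str
    benchmark: Callable[[], float]  # runs the kernel and returns elapsed time
    is_naive: bool = False


def pick_fastest(candidates: Iterable[Candidate],
                 skip_naive_when_possible: bool = True) -> Optional[str]:
    """Return the name of the fastest candidate, or None if there are none.

    With the flag enabled, naive candidates are benchmarked only when no
    non-naive candidate is applicable, so they act as a last-resort fallback.
    """
    candidates = list(candidates)
    optimized = [c for c in candidates if not c.is_naive]
    if skip_naive_when_possible and optimized:
        to_run = optimized
    else:
        to_run = candidates
    if not to_run:
        return None
    # Benchmark each remaining candidate exactly once and keep the fastest.
    timings = [(c.benchmark(), c.name) for c in to_run]
    return min(timings)[1]
```

With the flag set to `False` the behaviour is the exhaustive one (everything gets benchmarked), so the feature-flag variant falls out of the same function.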

I'm quite sure that people running this without knowing about it would experience major speedups in initial runs (the test case here is various VAE models being run).

RobQuistNL commented 4 hours ago

Here's a snippet from the ufdb in question. I'm not 100% sure, but I think this shows that some of those ConvDirectNaive kernels take a lot of time:

**HIP.3_2_0.ufdb.txt**

```csv
1920-26-32-1x1-1280-26-32-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.25571,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.736573,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.289919,15564800,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.116864,19169280,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.266399,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:16.9739,0,miopenConvolutionFwdAlgoDirect
1280-52-64-3x3-1280-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:3.34123,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.71735,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.64133,63569920,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.03775,97648640,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.92911,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:240.954,0,miopenConvolutionFwdAlgoDirect
1920-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.55705,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:5.0626,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.24214,56197120,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.34646,73236480,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.36562,115015680,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:200.26,0,miopenConvolutionFwdAlgoDirect
1920-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.49522,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:1.22867,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.406174,36536320,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.242687,51118080,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.269183,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:35.5401,0,miopenConvolutionFwdAlgoDirect
1280-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.7151,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.38824,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.42153,40304640,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.02272,57344000,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.19766,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:129.973,0,miopenConvolutionFwdAlgoDirect
1280-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.67263,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.831037,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.327743,27197440,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.176575,42598400,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.192191,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:23.6178,0,miopenConvolutionFwdAlgoDirect
960-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.29372,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:2.53945,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.967996,32358400,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.697597,49397760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:1.53158,57507840,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:96.8891,0,miopenConvolutionFwdAlgoDirect
960-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.26147,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.632862,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.283551,22528000,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.135807,38338560,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.219487,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:17.7857,0,miopenConvolutionFwdAlgoDirect
640-104-128-3x3-640-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:3.39595,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.30868,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.36338,75530240,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.94018,143687680,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.62095,153354240,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:279.665,0,miopenConvolutionFwdAlgoDirect
960-104-128-3x3-320-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.58604,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:4.89873,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.72639,73687040,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.27314,107765760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:4.93534,230031360,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:256.787,0,miopenConvolutionFwdAlgoDirect
960-104-128-1x1-320-104-128-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.52303,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:1.25126,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.471806,68771840,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.334623,102236160,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.346175,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:43.6982,0,miopenConvolutionFwdAlgoDirect
640-104-128-3x3-320-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.74319,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.34469,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.4166,54804480,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.47539,88883200,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:1.90972,153354240,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:165.897,0,miopenConvolutionFwdAlgoDirect
640-104-128-1x1-320-104-128-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.70063,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.853245,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.39635,51527680,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.251679,85196800,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.289215,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:28.6757,0,miopenConvolutionFwdAlgoDirect
320-104-128-3x3-4-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:0.0983036,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.173375,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.315071,17275392,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.211327,17701376,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:0.734749,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:40.4575,0,miopenConvolutionFwdAlgoDirect
4-104-128-1x1-4-104-128-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:0.0151358,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.0191038,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.142335,426240,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.0238398,425984,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.0715196,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:0.149664,0,miopenConvolutionFwdAlgoDirect
4-104-128-3x3-512-104-128-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:0.0611198,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.0808954,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.287711,27549696,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.102943,27549696,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:0.111072,1916928,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:0.749117,0,miopenConvolutionFwdAlgoDirect
512-104-128-3x3-512-104-128-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:1.84121,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.48776,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.2487,63963136,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.93567,63963136,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.23257,245366784,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:101.823,0,miopenConvolutionFwdAlgoDirect
512-104-128-1x1-512-104-128-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:1.79679,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.900925,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.531326,55574528,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.365215,54525952,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.273055,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:18.3887,0,miopenConvolutionFwdAlgoDirect
512-208-256-3x3-512-208-256-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:7.17329,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:12.9223,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:8.36243,227540992,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:7.62704,227540992,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:8.17904,981467136,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:477.672,0,miopenConvolutionFwdAlgoDirect
512-416-512-3x3-512-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:28.9988,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:49.5818,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:33.8475,881852416,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:32.3395,881852416,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:41.5719,3925868544,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:2055.18,0,miopenConvolutionFwdAlgoDirect
512-416-512-3x3-256-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:14.8489,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:25.3045,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:17.5976,659030016,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:16.8068,659030016,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:21.3744,3925868544,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:1877.97,0,miopenConvolutionFwdAlgoDirect
256-416-512-3x3-256-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:7.64682,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:12.8598,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:10.3321,438566912,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:8.93561,438566912,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:10.1123,1962934272,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:873.538,0,miopenConvolutionFwdAlgoDirect
512-416-512-1x1-256-416-512-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:14.3863,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.59623,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:5.1019,654835712,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:4.32277,654311424,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:1.92758,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:286.65,0,miopenConvolutionFwdAlgoDirect
256-832-1024-3x3-256-832-1024-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:32.2882,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:50.3876,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:44.9249,1747189760,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:39.9932,1747189760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:52.2714,7851737088,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:3816.86,0,miopenConvolutionFwdAlgoDirect
256-832-1024-3x3-128-832-1024-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:16.3588,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:25.5977,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:22.7511,1309802496,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:20.9185,1309802496,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:27.0331,7851737088,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:3346.55,0,miopenConvolutionFwdAlgoDirect
```
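To back up the "I think this shows" part, here is a small Python sketch that parses one record from the file above and reports how much of the summed time goes to the naive solver. It assumes each record has the shape `key=Solver:time,workspace,algorithm;...`, which is my reading of the file rather than a documented format, and the function names `parse_ufdb_line` and `naive_share` are made up for this example.

```python
def parse_ufdb_line(line):
    """Split one record into (key, {solver: (time, workspace, algorithm)}).

    Assumes the layout `key=Solver:time,workspace,algorithm;...` as seen in
    the snippet; this is inferred from the data, not from a format spec.
    """
    key, _, payload = line.strip().partition("=")
    solvers = {}
    for entry in payload.split(";"):
        if not entry:
            continue
        name, _, fields = entry.partition(":")
        time_str, workspace_str, algorithm = fields.split(",")
        solvers[name] = (float(time_str), int(workspace_str), algorithm)
    return key, solvers


def naive_share(solvers, naive_prefix="ConvDirectNaive"):
    """Fraction of the summed solver times spent in naive solvers."""
    total = sum(t for t, _, _ in solvers.values())
    naive = sum(t for name, (t, _, _) in solvers.items()
                if name.startswith(naive_prefix))
    return naive / total if total else 0.0


# First record of the snippet, copied verbatim.
SAMPLE = (
    "1920-26-32-1x1-1280-26-32-2-0x0-1x1-1x1-0-NCHW-FP16-F="
    "ConvBinWinogradRxSf2x3g1:1.25571,0,miopenConvolutionFwdAlgoWinograd;"
    "ConvBinWinogradRxSf3x2:0.736573,0,miopenConvolutionFwdAlgoWinograd;"
    "ConvHipImplicitGemmGroupFwdXdlops:0.289919,15564800,miopenConvolutionFwdAlgoImplicitGEMM;"
    "ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.116864,19169280,miopenConvolutionFwdAlgoImplicitGEMM;"
    "GemmFwd1x1_0_1:0.266399,0,miopenConvolutionFwdAlgoGEMM;"
    "ConvDirectNaiveConvFwd:16.9739,0,miopenConvolutionFwdAlgoDirect"
)

key, solvers = parse_ufdb_line(SAMPLE)
fastest = min(solvers, key=lambda name: solvers[name][0])
print(key)
print("fastest:", fastest)
print("naive share of summed time: {:.1%}".format(naive_share(solvers)))
```

For this first record the naive solver accounts for roughly 86% of the summed time, while the fastest entry is ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC. If the recorded times are proportional to what the search actually spends per solver (another assumption), that is where most of the initial-run cost goes.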