[Issue]: Attention failing with gpt2-10.onnx

causten commented 5 months ago

Problem Description

This failure happens only when including the attention op.

MIGRAPHX_MLIR_USE_SPECIFIC_OPS="attention"  /opt/rocm/bin/migraphx-driver perf --exhaustive-tune /models/onnx-model-zoo/gpt2-10.onnx --batch 1 --fp16
Compiling ...
Reading: /models/onnx-model-zoo/gpt2-10.onnx
terminate called after throwing an instance of 'migraphx::version_2_10_0::exception'
  what():  /workspace/AMDMIGraphX/src/targets/gpu/compile_ops.cpp:222: benchmark: No valid tuned compilation for gpu::mlir_op with gfx942:sramecc+:xnack-       304     -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64
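The error message already names the failing problem: after the arch string come the CU count and the rocMLIR problem config. Below is a minimal Python sketch that splits such a line. It assumes the `... with <arch> <CUs> <flags>` layout seen in the logs in this thread, which is not a documented MIGraphX format. `parse_failure` is a hypothetical helper.

```python
import re

# Hypothetical helper. The message layout is inferred from the logs in this
# thread, not from a documented MIGraphX format.
FAILURE_RE = re.compile(
    r"No valid tuned compilation for (?P<op>\S+) with "
    r"(?P<arch>\S+)\s+(?P<cus>\d+)\s+(?P<config>-.*)$"
)


def parse_failure(line: str) -> dict:
    """Extract op, arch, CU count and problem flags from a tuning failure."""
    m = FAILURE_RE.search(line)
    if m is None:
        raise ValueError("not a tuning-failure message")
    tokens = m.group("config").split()
    # Flags come in "-name value" pairs, e.g. "-g 12".
    flags = dict(zip(tokens[0::2], tokens[1::2]))
    return {
        "op": m.group("op"),
        "arch": m.group("arch"),
        "cus": int(m.group("cus")),
        "flags": flags,
    }


msg = ("benchmark: No valid tuned compilation for gpu::mlir_op with "
       "gfx942:sramecc+:xnack-       304     -t f16 -out_datatype f16 "
       "-transA false -transB false -g 12 -m 1 -n 1 -k 64")
info = parse_failure(msg)
print(info["arch"], info["cus"], info["flags"]["-g"], info["flags"]["-k"])
```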

Operating System

22.04

CPU

AMD EPYC 7702 64-Core Processor

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.1.0

ROCm Component

ROCm

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

ravil-mobile commented 5 months ago

Hi @causten,

The following config looks like a gemm.

with gfx942:sramecc+:xnack-       304     -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64

An attention config would instead take -transQ, -transK, -transV, etc.
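This distinction can be checked mechanically. A config that carries `-transA`/`-transB` describes a gemm, and one that carries `-transQ`/`-transK`/`-transV` describes attention. The sketch below uses a hypothetical helper, `classify_config`. Its flag sets contain only the flags mentioned in this thread, and rocMLIR may accept others.

```python
# Flag names taken from this thread only; rocMLIR may accept more.
GEMM_FLAGS = {"-transA", "-transB"}
ATTENTION_FLAGS = {"-transQ", "-transK", "-transV"}


def classify_config(config: str) -> str:
    """Guess the rocMLIR op kind from the flag names in a problem config."""
    names = {tok for tok in config.split() if tok.startswith("-")}
    if names & ATTENTION_FLAGS:
        return "attention"
    if names & GEMM_FLAGS:
        return "gemm"
    return "unknown"


cfg = "-t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64"
print(classify_config(cfg))  # gemm
```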

I tried to run this config as a gemm op.

$ cd <rocmlir/build>
$ cmake --build . --target check-rocmlir-build-only ci-performance-scripts -j4
$ cat ./tune.conf
-t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64

$ ./bin/tuningRunner.py --op=gemm --configs_file=./tune.conf --output=bt.tsv --verify-mode=none
...
Tested 62700 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 62800 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 62900 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 63000 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 63100 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 63200 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 63300 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 63400 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tested 63500 configs, best perf 0.00028235294117647056 TFlops on perf_config v2:32,16,8,32,16,4,3,1,1
Tuned : -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64 : v2:32,16,8,32,16,4,3,1,1 with 0.00028235294117647056 TFlops
Arch = gfx908:sramecc+:xnack-(120 CUs), vector = '-t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64', perfConfig = v2:32,16,8,32,16,4,3,1,1
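To put the reported rate in context, assume the TFlops figure is computed as 2·g·m·n·k divided by kernel time. That is the usual GEMM convention, but it has not been confirmed against the tuningRunner.py source. Under that assumption, this problem performs only 2 × 12 × 1 × 1 × 64 = 1536 floating-point operations, and the best rate corresponds to a kernel time of about 5.44 µs. With m = n = 1, the timing probably reflects fixed overhead more than compute.

```python
# Assumption: TFlops = 2 * g * m * n * k / time (one multiply and one add per
# inner-product term). This mirrors the common GEMM convention and is not
# taken from the tuningRunner.py source.

def gemm_flops(g: int, m: int, n: int, k: int) -> int:
    """FLOP count of a batched GEMM with g batches of (m x k) @ (k x n)."""
    return 2 * g * m * n * k


def implied_time_us(flops: int, tflops: float) -> float:
    """Kernel time in microseconds implied by a FLOP count and a TFlops rate."""
    return flops / (tflops * 1e12) * 1e6


flops = gemm_flops(g=12, m=1, n=1, k=64)
t_us = implied_time_us(flops, 0.00028235294117647056)
print(flops, round(t_us, 2))  # 1536 5.44
```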

ravil-mobile commented 5 months ago

@causten, I managed to reproduce your bug on MI100, even without exhaustive tuning.

MIGRAPHX_MLIR_USE_SPECIFIC_OPS="attention" MIGRAPHX_ENABLE_MLIR=1 ./bin/migraphx-driver perf --onnx $(realpath ../../models/gpt2-10.onnx) --batch 1 --fp16

Error: No valid tuned compilation for gpu::mlir_op with gfx908:sramecc+:xnack- 120 -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64

ravil-mobile commented 5 months ago

@manupak, this issue happens only with attention. All other combinations of MIGRAPHX_MLIR_USE_SPECIFIC_OPS work without any issue.
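One way to run this comparison systematically is to build one `migraphx-driver perf` invocation for each `MIGRAPHX_MLIR_USE_SPECIFIC_OPS` value and record which ones fail. The sketch below uses Python's `subprocess`. The environment variables and driver flags are copied from the commands in this thread. The driver and model paths are placeholders. The only op values confirmed here are `attention` and the empty string, so any others you add are your own assumption.

```python
import os
import subprocess

DRIVER = "/opt/rocm/bin/migraphx-driver"        # placeholder: adjust to your build
MODEL = "/models/onnx-model-zoo/gpt2-10.onnx"   # path from the original report


def build_run(ops: str) -> tuple[dict, list[str]]:
    """Environment and argv for one perf run restricted to the given MLIR ops."""
    env = dict(os.environ)
    env["MIGRAPHX_ENABLE_MLIR"] = "1"
    env["MIGRAPHX_MLIR_USE_SPECIFIC_OPS"] = ops
    argv = [DRIVER, "perf", "--onnx", MODEL, "--batch", "1", "--fp16"]
    return env, argv


def failing_ops(candidates: list[str]) -> list[str]:
    """Run the driver once per candidate value and collect the ones that fail."""
    failures = []
    for ops in candidates:
        env, argv = build_run(ops)
        result = subprocess.run(argv, env=env, capture_output=True, text=True)
        # The "terminate called" abort shows up as a non-zero return code
        # (negative when the process is killed by a signal such as SIGABRT).
        if result.returncode != 0:
            failures.append(ops)
    return failures
```

For example, `failing_ops(["", "attention"])` runs the driver twice. Based on the report above, you would expect only `"attention"` in the result.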

Out of curiosity, I tried MIGRAPHX_MLIR_USE_SPECIFIC_OPS="" and got the following:

Summary:
gpu::gemm: 0.691237ms / 12 = 0.0576031ms, 14%
gpu::code_object::layernorm_mul_add_kernel: 0.522419ms / 24 = 0.0217675ms, 11%
gpu::code_object::add_add_kernel: 0.445641ms / 23 = 0.0193757ms, 9%
gpu::code_object::mlir_reshape_dot: 0.379011ms / 12 = 0.0315842ms, 8%
gpu::code_object::mlir_transpose_reshape_dot: 0.376484ms / 12 = 0.0313737ms, 8%
gpu::code_object::mlir_reshape_dot_add: 0.364617ms / 12 = 0.0303847ms, 8%
gpu::code_object::convert_kernel: 0.267564ms / 13 = 0.0205819ms, 6%
gpu::code_object::mlir_reshape_slice_reshape_transpose_reshape_slice_reshape_transpose_dot_mul_sub_exp_div: 0.265477ms / 12 = 0.0221231ms, 6%
gpu::code_object::add_mul_mul_add_mul_exp_add_div_kernel: 0.248663ms / 12 = 0.0207219ms, 5%
gpu::code_object::contiguous_kernel: 0.24577ms / 13 = 0.0189054ms, 5%
gpu::code_object::concat_kernel: 0.23962ms / 12 = 0.0199683ms, 5%
gpu::code_object::mlir_reshape_slice_reshape_transpose_dot: 0.234107ms / 12 = 0.0195089ms, 5%
multibroadcast: 0.167775ms / 97 = 0.00172964ms, 4%
reshape_lazy: 0.149967ms / 121 = 0.0012394ms, 3%
load: 0.12052ms / 159 = 0.000757985ms, 3%
hip::hip_copy_literal: 0.119968ms / 148 = 0.000810595ms, 3%
slice: 0.0466943ms / 24 = 0.0019456ms, 1%
transpose: 0.0286174ms / 24 = 0.00119239ms, 1%
gpu::code_object::add_layernorm_mul_add_convert_kernel: 0.0250974ms / 1 = 0.0250974ms, 1%
unsqueeze: 0.0226409ms / 24 = 0.000943372ms, 1%
gpu::code_object::add_kernel: 0.0201048ms / 1 = 0.0201048ms, 1%
gpu::code_object::gather_kernel: 0.0174227ms / 1 = 0.0174227ms, 1%
@param: 0.00812028ms / 14 = 0.00058002ms, 1%
check_context::migraphx::gpu::context: 0.00113028ms / 1 = 0.00113028ms, 1%
hip::hip_allocate_memory: 0.00108758ms / 1 = 0.00108758ms, 1%

I am confused because I can see some MLIR ops even though the list (i.e., MIGRAPHX_MLIR_USE_SPECIFIC_OPS) is empty.
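To see quickly which kernels in a summary like this came from MLIR, you can parse the lines and filter on the `gpu::code_object::mlir_` prefix that the MLIR kernels carry in the output above. This is a sketch, and the line layout `name: total ms / count = avg ms, pct%` is inferred from that output.

```python
import re

# Layout inferred from the perf summary in this thread.
SUMMARY_RE = re.compile(
    r"^(?P<name>.+?): (?P<total>[\d.]+)ms / (?P<count>\d+) = "
    r"(?P<avg>[\d.]+)ms, (?P<pct>\d+)%$"
)


def parse_summary(text: str) -> list[dict]:
    """Parse 'name: total ms / count = avg ms, pct%' lines into dicts."""
    rows = []
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            rows.append({
                "name": m.group("name"),
                "total_ms": float(m.group("total")),
                "count": int(m.group("count")),
                "pct": int(m.group("pct")),
            })
    return rows


def mlir_kernels(rows: list[dict]) -> list[str]:
    # In this summary, MLIR-generated kernels appear as
    # gpu::code_object::mlir_<fused ops>.
    return [r["name"] for r in rows if r["name"].startswith("gpu::code_object::mlir_")]


sample = """\
gpu::gemm: 0.691237ms / 12 = 0.0576031ms, 14%
gpu::code_object::mlir_reshape_dot: 0.379011ms / 12 = 0.0315842ms, 8%
gpu::code_object::convert_kernel: 0.267564ms / 13 = 0.0205819ms, 6%
gpu::code_object::mlir_transpose_reshape_dot: 0.376484ms / 12 = 0.0313737ms, 8%
"""
print(mlir_kernels(parse_summary(sample)))
```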

ravil-mobile commented 5 months ago

Hi @manupak,

I managed to get a tracing log:

 MIGRAPHX_MLIR_USE_SPECIFIC_OPS="attention" MIGRAPHX_ENABLE_MLIR=1 MIGRAPHX_TRACE_BENCHMARKING=<X> ./bin/migraphx-driver compile --onnx $(realpath ../../models/gpt2-10.onnx) --batch 1 --fp16 > mgx.log

MIGRAPHX_TRACE_BENCHMARKING=2

Running [ MIGraphX Version: 2.10.0.cc7c28016 ]: ./bin/migraphx-driver compile --onnx /home/ravil/work/migraphx/models/gpt2-10.onnx --batch 1 --fp16
Compiling ...
Reading: /home/ravil/work/migraphx/models/gpt2-10.onnx
Benchmarking gpu::mlir_op: 20 configs
Problem: gfx908:sramecc+:xnack- 120     -t f16 -out_datatype f16 -transA false -transB false -g 1 -m 1 -n 2304 -k 768
Benchmarking solution: v2:16,64,8,16,16,8,1,1,1
0.0710401ms
Benchmarking solution: v2:16,64,4,16,16,8,1,1,1
0.0664002ms
Benchmarking solution: v2:16,32,4,16,16,8,1,1,1
0.0674082ms
Benchmarking solution: v2:16,32,4,16,16,4,1,1,1
0.0954403ms
Benchmarking solution: v2:16,16,4,16,16,8,1,1,1
0.0600801ms
Benchmarking solution: v2:32,128,4,32,32,8,1,1,1
0.0623601ms
Benchmarking solution: v2:32,64,4,32,32,8,1,1,1
0.0594161ms
Benchmarking solution: v2:32,32,8,16,16,8,1,1,1
0.0602481ms
Benchmarking solution: v2:32,32,4,16,16,4,1,1,1
0.0614961ms
Benchmarking solution: v2:64,256,2,64,64,8,1,1,1
0.0629442ms
Benchmarking solution: v2:64,128,2,64,64,8,1,1,1
0.0586641ms
Benchmarking solution: v2:64,64,8,32,32,8,1,1,1
0.0664882ms
Benchmarking solution: v2:64,64,8,32,32,4,1,1,1
0.0619121ms
Benchmarking solution: v2:64,64,4,32,32,8,1,1,1
0.0625121ms
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
0.0600801ms
Benchmarking solution: v2:128,128,8,64,64,1,1,1,1
0.0578401ms
Benchmarking solution: v2:128,128,8,64,64,4,1,1,1
0.0633122ms
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
0.0632402ms
Benchmarking solution: v2:128,128,4,64,64,4,1,1,1
0.0630801ms
Benchmarking solution: v2:128,128,2,64,64,8,1,1,1
0.0636642ms
Fastest solution: v2:128,128,8,64,64,1,1,1,1
Benchmarking gpu::mlir_op: 20 configs
Problem: gfx908:sramecc+:xnack- 120     -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64
Benchmarking solution: v2:16,16,4,16,16,8,1,1,1
No binary
Benchmarking solution: v2:16,32,4,16,16,8,1,1,1
No binary
Benchmarking solution: v2:16,32,4,16,16,4,1,1,1
No binary
Benchmarking solution: v2:32,32,8,16,16,8,1,1,1
No binary
Benchmarking solution: v2:32,32,4,16,16,4,1,1,1
No binary
Benchmarking solution: v2:16,64,8,16,16,8,1,1,1
No binary
Benchmarking solution: v2:16,64,4,16,16,8,1,1,1
No binary
Benchmarking solution: v2:32,64,4,32,32,8,1,1,1
No binary
Benchmarking solution: v2:64,64,8,32,32,8,1,1,1
No binary
Benchmarking solution: v2:64,64,8,32,32,4,1,1,1
No binary
Benchmarking solution: v2:64,64,4,32,32,8,1,1,1
No binary
Benchmarking solution: v2:32,128,4,32,32,8,1,1,1
No binary
Benchmarking solution: v2:64,128,2,64,64,8,1,1,1
No binary
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
No binary
Benchmarking solution: v2:128,128,8,64,64,1,1,1,1
No binary
Benchmarking solution: v2:128,128,8,64,64,4,1,1,1
No binary
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
No binary
Benchmarking solution: v2:128,128,4,64,64,4,1,1,1
No binary
Benchmarking solution: v2:128,128,2,64,64,8,1,1,1
No binary
Benchmarking solution: v2:64,256,2,64,64,8,1,1,1
No binary
Fastest solution: v2:16,16,4,16,16,8,1,1,1

MIGRAPHX_TRACE_BENCHMARKING=5

Running [ MIGraphX Version: 2.10.0.cc7c28016 ]: ./bin/migraphx-driver compile --onnx /home/ravil/work/migraphx/models/gpt2-10.onnx --batch 1 --fp16
Compiling ...
Reading: /home/ravil/work/migraphx/models/gpt2-10.onnx
Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %99 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %99 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %851 = "llvm.call"(%850) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %182 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %182 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1002 = "llvm.call"(%1001) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %175 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %175 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %887 = "llvm.call"(%886) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %131 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %131 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %896 = "llvm.call"(%895) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1290 = "llvm.call"(%1289) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %182 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %182 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %949 = "llvm.call"(%948) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %134 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %134 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %784 = "llvm.call"(%783) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %135 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %135 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1120 = "llvm.call"(%1119) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %131 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %131 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %888 = "llvm.call"(%887) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %99 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %99 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %845 = "llvm.call"(%844) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1161 = "llvm.call"(%1160) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %135 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %135 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1100 = "llvm.call"(%1099) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %134 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %134 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %725 = "llvm.call"(%724) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1203 = "llvm.call"(%1202) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %175 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %175 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1027 = "llvm.call"(%1026) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1141 = "llvm.call"(%1140) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %100 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %100 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %994 = "llvm.call"(%993) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %135 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %135 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1162 = "llvm.call"(%1161) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %183 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1290 = "llvm.call"(%1289) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Exception in gpu::mlir_op: /home/ravil/work/migraphx/migraphx/src/targets/gpu/mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed: Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %182 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: vectorization analysis found intermediate allocation but isn't following fusions, results may be incorrect

Note: see current operation: %182 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array<i32: 0, 0>}> : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Note: see current operation: %1045 = "llvm.call"(%1044) <{CConv = #llvm.cconv<ccc>, callee = @malloc, callee_type = !llvm.func<ptr (i32)>, fastmathFlags = #llvm.fastmath<none>}> : (i32) -> !llvm.ptr

Benchmarking gpu::mlir_op: 20 configs
Problem: gfx908:sramecc+:xnack- 120     -t f16 -out_datatype f16 -transA false -transB false -g 1 -m 1 -n 2304 -k 768
Benchmarking solution: v2:16,64,8,16,16,8,1,1,1
0.0707917ms
Benchmarking solution: v2:16,64,4,16,16,8,1,1,1
0.0641358ms
Benchmarking solution: v2:16,32,4,16,16,8,1,1,1
0.0593597ms
Benchmarking solution: v2:16,32,4,16,16,4,1,1,1
0.0626238ms
Benchmarking solution: v2:16,16,4,16,16,8,1,1,1
0.0621037ms
Benchmarking solution: v2:32,128,4,32,32,8,1,1,1
0.0648798ms
Benchmarking solution: v2:32,64,4,32,32,8,1,1,1
0.0611038ms
Benchmarking solution: v2:32,32,8,16,16,8,1,1,1
0.0653998ms
Benchmarking solution: v2:32,32,4,16,16,4,1,1,1
0.0579837ms
Benchmarking solution: v2:64,256,2,64,64,8,1,1,1
0.0586798ms
Benchmarking solution: v2:64,128,2,64,64,8,1,1,1
0.0601438ms
Benchmarking solution: v2:64,64,8,32,32,8,1,1,1
0.0639998ms
Benchmarking solution: v2:64,64,8,32,32,4,1,1,1
0.0625597ms
Benchmarking solution: v2:64,64,4,32,32,8,1,1,1
0.0613358ms
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
0.0579517ms
Benchmarking solution: v2:128,128,8,64,64,1,1,1,1
0.0651438ms
Benchmarking solution: v2:128,128,8,64,64,4,1,1,1
0.0691197ms
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
0.114216ms
Benchmarking solution: v2:128,128,4,64,64,4,1,1,1
0.0686958ms
Benchmarking solution: v2:128,128,2,64,64,8,1,1,1
0.0682397ms
Fastest solution: v2:128,128,4,64,64,8,1,1,1
Benchmarking gpu::mlir_op: 20 configs
Problem: gfx908:sramecc+:xnack- 120     -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64
Benchmarking solution: v2:16,16,4,16,16,8,1,1,1
No binary
Benchmarking solution: v2:16,32,4,16,16,8,1,1,1
No binary
Benchmarking solution: v2:16,32,4,16,16,4,1,1,1
No binary
Benchmarking solution: v2:32,32,8,16,16,8,1,1,1
No binary
Benchmarking solution: v2:32,32,4,16,16,4,1,1,1
No binary
Benchmarking solution: v2:16,64,8,16,16,8,1,1,1
No binary
Benchmarking solution: v2:16,64,4,16,16,8,1,1,1
No binary
Benchmarking solution: v2:32,64,4,32,32,8,1,1,1
No binary
Benchmarking solution: v2:64,64,8,32,32,8,1,1,1
No binary
Benchmarking solution: v2:64,64,8,32,32,4,1,1,1
No binary
Benchmarking solution: v2:64,64,4,32,32,8,1,1,1
No binary
Benchmarking solution: v2:32,128,4,32,32,8,1,1,1
No binary
Benchmarking solution: v2:64,128,2,64,64,8,1,1,1
No binary
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
No binary
Benchmarking solution: v2:128,128,8,64,64,1,1,1,1
No binary
Benchmarking solution: v2:128,128,8,64,64,4,1,1,1
No binary
Benchmarking solution: v2:128,128,4,64,64,8,1,1,1
No binary
Benchmarking solution: v2:128,128,4,64,64,4,1,1,1
No binary
Benchmarking solution: v2:128,128,2,64,64,8,1,1,1
No binary
Benchmarking solution: v2:64,256,2,64,64,8,1,1,1
No binary
Fastest solution: v2:16,16,4,16,16,8,1,1,1
terminate called after throwing an instance of 'migraphx::version_2_10_0::exception'
  what():  /home/ravil/work/migraphx/migraphx/src/targets/gpu/compile_ops.cpp:222: benchmark: No valid tuned compilation for gpu::mlir_op with gfx908:sramecc+:xnack-   120     -t f16 -out_datatype f16 -transA false -transB false -g 12 -m 1 -n 1 -k 64
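The level-5 trace is long but highly repetitive. It contains 20 `Exception in gpu::mlir_op` blocks, which matches the 20 candidates that later report `No binary`. That suggests every candidate fails in the MLIR backend pipeline. The blocks differ only in SSA value numbers such as `%99` or `%851`. Replacing those numbers with a placeholder and counting the distinct diagnostics makes the pattern easier to see. The normalization regex is an assumption tailored to this log, not a general MLIR diagnostic parser. The sample lines are abbreviated excerpts with the attribute dictionaries removed.

```python
import re
from collections import Counter

SSA_RE = re.compile(r"%\d+")  # SSA value ids such as %99 or %1290


def summarize(log: str) -> tuple[int, Counter]:
    """Return (number of exceptions, counts of distinct normalized diagnostics)."""
    exceptions = 0
    diagnostics = Counter()
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("Exception in gpu::mlir_op"):
            exceptions += 1
        elif line.startswith(("Error:", "Note:")):
            diagnostics[SSA_RE.sub("%N", line)] += 1
    return exceptions, diagnostics


# Abbreviated excerpt of the level-5 trace (attributes removed).
sample = """\
Exception in gpu::mlir_op: mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed
Note: see current operation: %99 = "memref.alloc"() : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
Exception in gpu::mlir_op: mlir.cpp:752: run_backend_pipeline: MLIR backend compilation failed
Note: see current operation: %182 = "memref.alloc"() : () -> memref<12xf16>
Error: 'llvm.call' op 'malloc' does not reference a symbol in the current scope
"""
n_exc, diags = summarize(sample)
print(n_exc, len(diags))  # 2 2
```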

ravil-mobile commented 5 months ago

Hi @causten, can you try the following PR: https://github.com/ROCm/rocMLIR/pull/1512
