Support for legalization of <2 x s32>

newling commented 1 week ago

The error observed:

LLVM ERROR: unable to legalize instruction: %512:_(<2 x s32>) = G_MUL %378:_, %390:_ (in function: core_0_2)

Is this instruction multiplying 2 vectors containing 2 32-bit signed integers together?

Just checking, as my understanding is that this is not a supported operation on AIE.

Thanks

UPDATE: reproducer attached in comment below: https://github.com/Xilinx/llvm-aie/issues/102#issuecomment-2195908850

konstantinschwarz commented 1 week ago

Hi @newling, yes this instruction is multiplying two integer vectors <2 x s32>, and isn't supported natively. It can be implemented by extracting the scalar elements and executing the MUL instruction twice. That legalization is currently missing in our AIELegalizerInfo.cpp

newling commented 1 week ago

Thanks Konstantin.

newling commented 1 week ago

reproducers.tar.gz

The above contains files to reproduce.

To observe the error, run

./peano/install/bin/llc input_after.opt.ll -O2 --march=aie2 --function-sections --filetype=obj

where input_after.opt.ll is a file in the attached zip file (~300 line file). The error is:

LLVM ERROR: unable to legalize instruction: %512:_(<2 x s32>) = G_MUL %378:_, %390:_ (in function: core_0_2)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ../../../peano_github/install/bin/llc results_dir_tmp/module_matmul_8x32_16xi32__dispatch_0_amdaie_xclbin_fb/input.opt.ll -O2 --march=aie2 --function-sections --filetype=obj

The file input_before.opt.ll is a very similar file, added a reference, for an example which compiles.

Background: these files are from a int32 transposed matmul. The difference between 'before' and 'after' is a minor change in how loops are unrolled.

newling commented 1 week ago

@konstantinschwarz I'm reopening if that's ok, to track support for this. Please close if you have other ways of tracking such things..

newling commented 1 week ago

Note that input_after.opt.ll in the reproducer zip file is created from input_after.ll as:

install/bin/opt -O2  --inline-threshold=10 -S input_after.ll   --disable-builtin=memset -o input_after.opt.ll

If the optimization level is lowered to 1, i.e. if input_after.opt.ll is created as

install/bin/opt -O1  --inline-threshold=10 -S input_after.ll   --disable-builtin=memset -o input_after.opt.ll

Then the subsequent call to

install/bin/llc  input_after.opt.ll -O2 --march=aie2 --function-sections --filetype=obj

Generates the .o file without a problem. So this problem is specific to running opt with -On for n > 1.

Is there some kind of auto-vectorization that is happening in opt with O2 which should not be, especially for i32 types?

@konstantinschwarz @MaheshRavishankar @jsetoain

konstantinschwarz commented 6 days ago

I'd need to look at the pass pipeline that is run with opt, but we are explicitly disabling the autovectorizer from the clang driver here: https://github.com/Xilinx/llvm-aie/blob/aie-public/clang/lib/Driver/ToolChains/AIE.cpp#L96

I think @jsetoain was running into a similar issue with opt before

jsetoain commented 6 days ago

I'd need to look at the pass pipeline that is run with opt, but we are explicitly disabling the autovectorizer from the clang driver here: https://github.com/Xilinx/llvm-aie/blob/aie-public/clang/lib/Driver/ToolChains/AIE.cpp#L96

I think @jsetoain was running into a similar issue with opt before

--Notice that it's llc doing the autovectorization in this case. IREE also runs opt, but (correct me if I am wrong, @newling ) that one did not vectorize the IR.-- Never mind, I misread.

The only issue with opt that I recall was that it wasn't propagating alignment assumptions, and that was user error.

newling commented 6 days ago

It's opt which is doing the vectorization which we don't want. See screenshot of why I say this, below:

i.e. opt with -O2 introduces the vector mul op.

(where I did export OPT=path/to/my/install/bin/opt)

konstantinschwarz commented 6 days ago

It's opt which is doing the vectorization which we don't want. See screenshot of why I say this, below:

i.e. opt with -O2 introduces the vector mul op.

(where I did export OPT=path/to/my/install/bin/opt)

If you are calling opt manually, you have to pass --vectorize-loops=false --vectorize-slp=false to disable the vectorizers. We can extend our legalization in the backend to scalarize illegal vector types/operations, but ultimately we probably want to have a better cost model for those auto-vectorizers

Xilinx / llvm-aie

Support for legalization of <2 x s32> #102