apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] Unsupported CPU on SpacemiT K1 Octa-core X60 (RV64GCVB), RVA22 #17508

Closed JieGH closed 1 week ago

JieGH commented 2 weeks ago

Expected Behavior

After building TVM 0.18.0 with LLVM 19.1.3, I expect TVM to generate RISC-V compatible code that executes without errors related to unsupported CPU types. The build should allow the execution of a basic TVM Python example on a Banana Pi K1 board, with the riscv64-linux-gnu target specified in the configuration.

Actual Behavior

Upon running a simple TVM example with LLVM 19.1.3 and TVM 0.18.0 on the Banana Pi K1, I encounter the following error message:

Unsupported CPU type!
UNREACHABLE executed at /home/jlei/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1080!

In the TVM logs, there is also a warning that native vector bits are set to 128 for RISC-V, which could be relevant to the issue. The error persists despite multiple rebuilds of both LLVM and TVM, with adjusted configurations and target-specific flags to ensure compatibility with the RISC-V architecture on this board.

The error appears to stem from LLVM’s RuntimeDyldELF.cpp file, and recent threads, such as LLVM Issue #58652 and Halide Issue #7078, mention related problems that were resolved in newer LLVM releases, motivating my decision to upgrade from LLVM 15.0.7 to 19.1.3.

Environment

•   Operating System: Banana Pi K1 OS (version 1.X, latest)
•   LLVM Version: 19.1.3 (Default target: riscv64-linux-gnu; Host CPU: generic-rv64)
•   TVM Version: 0.18.0
•   Target Triple Configuration in TVM: "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64"
•   Architecture Flags: -march=rv64gc -mabi=lp64d
•   Other Configuration Flags:
    •   USE_LLVM set to "llvm-config --ignore-libllvm --link-static"
    •   GPU backends like CUDA, Vulkan, and OpenCL disabled.
    •   Set USE_TVM_RUNTIME ON, USE_PROFILER ON, USE_GRAPH_RUNTIME ON.
    •   Profiling, graph runtime, and relevant libraries enabled; unnecessary libraries like MKL and NNPACK disabled.
    •   Builds attempted with both RelWithDebInfo and Release build types.

Steps to Reproduce

1.  Compile LLVM 19.1.3 with the following configurations:
    •   Ensure the riscv64-linux-gnu target is specified explicitly during the build.
    •   Build LLVM with optimized settings, assertions enabled, and set default and target-specific flags for RISC-V compatibility.
2.  Configure and build TVM 0.18.0:
    •   Specify the target triple as riscv64-linux-gnu.
    •   Set architecture flags for -march=rv64gc -mabi=lp64d.
    •   Disable unnecessary backends and enable LLVM and RISC-V-specific configurations.
    •   Ensure no additional RISC-V flags are set in the LLVM configuration to isolate any unsupported flag issues.
3.  Run a simple TVM Python example (like matrix multiplication or a basic compute test) on the Banana Pi K1 with the above setup to trigger the CPU error.
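
Step 3 could look roughly like the sketch below. It is not the script from the report: the function name run_vector_add and the vector-add workload are illustrative assumptions, and the code assumes the TE schedule API as shipped with TVM 0.18 (te.create_schedule, tvm.build). The target string is the one listed under Environment.

```python
# Hypothetical minimal reproduction; not the reporter's actual script.
# Target string taken from the Environment section of this report.
TARGET = "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64"


def run_vector_add(target: str = TARGET, n: int = 1024) -> None:
    """Build and run B[i] = A[i] + 1.0 through TVM's LLVM backend."""
    # Imported lazily so this file can be loaded on machines without TVM/NumPy.
    import numpy as np
    import tvm
    from tvm import te

    # Declare the computation (assumes the TE API available in TVM 0.18).
    A = te.placeholder((n,), name="A", dtype="float32")
    B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
    sched = te.create_schedule(B.op)

    # Compile for the requested target.
    func = tvm.build(sched, [A, B], target=target, name="vector_add")

    # Allocate inputs and outputs on the local CPU device.
    dev = tvm.cpu(0)
    a = tvm.nd.array(np.arange(n, dtype="float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)

    # Execute the generated code; the report describes the failure as
    # happening when the generated code is executed.
    func(a, b)
    np.testing.assert_allclose(b.numpy(), a.numpy() + 1.0)
```

Calling run_vector_add() on the board exercises the build-and-execute path; passing a different target string makes it easy to compare configurations.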

Additional Notes and Troubleshooting

•   I have attempted multiple builds of LLVM and TVM with minimal changes each time to pinpoint the issue.
•   Cross-referencing with related issues, like [LLVM Issue #58652](https://github.com/llvm/llvm-project/issues/58652), suggests this might be linked to incomplete support for specific RISC-V targets or configurations.
•   Despite the “Unsupported CPU” error, Python finishes the TVM script execution, but the generated LLVM code fails to execute.
•   Notably, the error does not occur with an older LLVM version (15.0.7), although that version cannot properly produce LLVM code for the required RISC-V target.

cbalint13 commented 1 week ago

@JieGH,

I will look into this; let's fix it.

In the meantime, a recent LLVM is a must for RISC-V targets (18.x or 19.x is fine). Could you also please enable the orcjit executor in TVM (it is experimental, selected with the -jit=orcjit flag) by defining your target like the ones below:

JieGH commented 1 week ago

Hi @cbalint13, thanks for the miracle you bring. It works now. I have attached the target I used here, along with my TVM and LLVM versions. I will test TVM more extensively later. Also, I did realize the issue was coming from a flag used for LLVM, yet all I needed was a flag.

Previous error message:

Unsupported CPU type!
UNREACHABLE executed at /home/USER/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1080!
Aborted

Solution: enable LLVM’s Orc JIT (On-Request Compilation) engine

target = "llvm -jit=orcjit -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mattr=+a,+c,+d,+f,+m"
Target kind: llvm
Target options: {"mtriple": "riscv64-linux-gnu"}
LLVM config path: /usr/local/bin/llvm-config
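
As a small sketch, the working target string above can be assembled from its parts, which makes it easier to toggle the -jit option when comparing runs. The helper name make_llvm_target is hypothetical and not part of TVM; only the option names and values come from the target shown in this comment.

```python
def make_llvm_target(mtriple, mcpu, mattr=(), jit=None):
    """Assemble a TVM 'llvm' target string from its options.

    Hypothetical helper for illustration; it only formats a string.
    """
    parts = ["llvm"]
    if jit:
        # e.g. jit="orcjit" selects the ORC JIT executor, as used above
        parts.append(f"-jit={jit}")
    parts.append(f"-mtriple={mtriple}")
    parts.append(f"-mcpu={mcpu}")
    if mattr:
        # Each extension is enabled with a leading '+'
        parts.append("-mattr=" + ",".join(f"+{ext}" for ext in mattr))
    return " ".join(parts)


target_str = make_llvm_target(
    "riscv64-linux-gnu",
    "generic-rv64",
    mattr=("a", "c", "d", "f", "m"),
    jit="orcjit",
)
print(target_str)
```

The resulting string can then be passed wherever TVM accepts a target, for example tvm.target.Target(target_str), on a machine where TVM is installed.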

llc --version output:
 LLVM (http://llvm.org/):
  LLVM version 19.1.3
  Optimized build with assertions.
  Default target: riscv64-linux-gnu
  Host CPU: generic-rv64

  Registered Targets:
    riscv32 - 32-bit RISC-V
    riscv64 - 64-bit RISC-V
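
For anyone checking their own setup against this report, the following sketch parses llc --version output like the one above and checks two things mentioned in this thread: an LLVM major version of at least 18 and a registered riscv64 target. The function name parse_llc_version and the returned dictionary layout are assumptions made for illustration.

```python
import re

# Sample text copied from the llc --version output in this comment.
SAMPLE = """\
 LLVM (http://llvm.org/):
  LLVM version 19.1.3
  Optimized build with assertions.
  Default target: riscv64-linux-gnu
  Host CPU: generic-rv64

  Registered Targets:
    riscv32 - 32-bit RISC-V
    riscv64 - 64-bit RISC-V
"""


def parse_llc_version(text):
    """Extract version, default target, host CPU and registered targets."""
    info = {"version": None, "default_target": None, "host_cpu": None, "targets": []}
    in_targets = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"LLVM version (\d+)\.(\d+)\.(\d+)", line)
        if match:
            info["version"] = tuple(int(part) for part in match.groups())
        elif line.startswith("Default target:"):
            info["default_target"] = line.split(":", 1)[1].strip()
        elif line.startswith("Host CPU:"):
            info["host_cpu"] = line.split(":", 1)[1].strip()
        elif line.startswith("Registered Targets:"):
            in_targets = True
        elif in_targets:
            # Lines look like "riscv64 - 64-bit RISC-V"; keep the target name.
            info["targets"].append(line.split()[0])
    return info


info = parse_llc_version(SAMPLE)
# Threshold of 18 follows the advice in this thread (18.x or 19.x is fine).
llvm_recent_enough = info["version"] is not None and info["version"][0] >= 18
has_riscv64 = "riscv64" in info["targets"]
print(info, llvm_recent_enough, has_riscv64)
```

To check a live installation, the sample text can be replaced with the captured output of running llc --version, for instance via subprocess.run with capture_output=True and text=True.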

cbalint13 commented 1 week ago

Hi @JieGH ,

Thanks a lot for your time and the feedback!

It is not a miracle, but I will open a PR proposing to promote orcjit to TVM's default LLVM executor, replacing the current, deprecated mcjit.

Please let me know the results of any performance tests; you are welcome to report them here, as I am personally interested in RISC-V targets. My personal task list includes an RVV tensorization proposal for metaschedule/autoschedule; a preliminary integration with benchmarks for the v0.7.1 and v1.0 RVV variants is here: https://github.com/cbalint13/rvv-kernels