Quuxplusone / LLVMBugzillaTest

clang crashes on riscv64 #49715

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link: PR50746
Status: NEW
Importance: P normal
Reported by: Serge Vakulenko (serge.vakulenko@gmail.com)
Reported on: 2021-06-17 00:52:42 -0700
Last modified on: 2021-06-28 23:02:58 -0700
Version: 11.0
Hardware: Other Linux
CC: asb@lowrisc.org, craig.topper@gmail.com, dimitry@andric.com, efriedma@quicinc.com, fraser@codeplay.com, llvm-bugs@lists.llvm.org, luismarques@lowrisc.org, neeilans@live.com, richard-llvm@metafoo.co.uk
Fixed by commit(s):
Attachments:
Blocks:
Blocked by:
See also:

I use Debian 11 installed on a RISC-V platform: a Nezha board with an Allwinner D1
processor.

I installed clang as usual: "sudo apt install clang". The version is
1:11.0-51+nmu5. The source of the packages is
http://ftp.ports.debian.org/debian-ports/.

When I run clang from the command line without parameters, it crashes with this message:

$ clang
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: clang
1.  Compilation construction
/usr/lib/riscv64-linux-gnu/libLLVM-11.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x28)[0x3fec874c08]
Illegal instruction

Other compilers seem to work fine. I checked gcc, rustc, go.

Information about the system:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:    11
Codename:   bullseye

$ uname -a
Linux nezha 5.4.61 #68 PREEMPT Tue Jun 1 04:18:22 UTC 2021 riscv64 GNU/Linux

$ /usr/sbin/hwinfo --short
cpu:
                       rv64imafdcvu
keyboard:
  /dev/ttyS0           serial console
network:
  eth0                 ARM Ethernet controller
  wlan0                ARM Ethernet controller
                       Network controller
network interface:
  eth0                 Ethernet network interface
  lo                   Loopback network interface
  sit0                 Network Interface
  wlan0                WLAN network interface
disk:
  /dev/mmcblk0         Disk
partition:
  /dev/mmcblk0p1       Partition
  /dev/mmcblk0p2       Partition
  /dev/mmcblk0p3       Partition
  /dev/mmcblk0p4       Partition
  /dev/mmcblk0p5       Partition
  /dev/mmcblk0p6       Partition
  /dev/mmcblk0p7       Partition
  /dev/mmcblk0p8       Partition
hub:
                       Linux Foundation 2.0 root hub
                       Linux Foundation 1.1 root hub
memory:
                       Main Memory

Thanks,
--Serge

Quuxplusone commented 3 years ago

I think you should first report this to the Debian package maintainer(s). If you're able, can you build clang from source on this particular system, and see whether it then crashes in the same way?

Quuxplusone commented 3 years ago

Let's investigate with gdb.

$ gdb -q /usr/bin/clang-11
Reading symbols from /usr/bin/clang-11...
(No debugging symbols found in /usr/bin/clang-11)
(gdb) r
Starting program: /usr/bin/clang-11
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000003ff215b098 in ?? () from /usr/lib/riscv64-linux-gnu/libLLVM-11.so.1

Here is the instruction that caused the exception:

(gdb) x/i 0x0000003ff215b098
=> 0x3ff215b098:    fence.tso

From RISC-V Instruction Set Manual: "The optional FENCE.TSO instruction is
encoded as a FENCE instruction with fm=1000, predecessor=RW, and successor=RW.
FENCE.TSO orders all load operations in its predecessor set before all memory
operations in its successor set, and all store operations in its predecessor
set before all store operations in its successor set. This leaves non-AMO store
operations in the FENCE.TSO’s predecessor set unordered with non-AMO loads in
its successor set."

Note: "The _optional_ FENCE.TSO instruction".
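
To make the quoted encoding concrete, here is a minimal sketch that splits a FENCE instruction word into its fm, predecessor and successor fields. The function and constant names are made up for illustration; the bit positions assume the standard I-type layout of the base FENCE instruction (fm in bits 31:28, predecessor in 27:24, successor in 23:20, with rs1 and rd zero).

```python
# Illustrative sketch only: names are hypothetical, not from any real tool.

# fence.tso as a 32-bit word: fm=1000, pred=RW, succ=RW, rs1=rd=x0
FENCE_TSO = 0x8330000F


def decode_fence(word):
    """Split a 32-bit FENCE instruction word into its fm/pred/succ fields."""
    opcode = word & 0x7F         # bits 6:0, MISC-MEM opcode is 0b0001111
    funct3 = (word >> 12) & 0x7  # bits 14:12, 0b000 selects FENCE
    if opcode != 0x0F or funct3 != 0:
        raise ValueError("not a FENCE instruction")
    return {
        "fm": (word >> 28) & 0xF,    # fence mode
        "pred": (word >> 24) & 0xF,  # predecessor set, bit order I O R W
        "succ": (word >> 20) & 0xF,  # successor set, bit order I O R W
    }


def set_name(bits):
    """Render a 4-bit predecessor/successor set as a string such as 'rw'."""
    return "".join(c for c, b in zip("iorw", (8, 4, 2, 1)) if bits & b)


fields = decode_fence(FENCE_TSO)
print(fields, set_name(fields["pred"]), set_name(fields["succ"]))
```

The only difference from a plain "fence rw,rw" (0x0330000F) is the top bit of the fm field.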

The Allwinner D1 processor does not support this instruction, which is why it
raises an exception. So it's wrong for clang to generate it unconditionally. I'm
not sure how clang was built for Debian. Maybe this option was enabled somehow.
In that case it's the fault of the Debian maintainers.

Quuxplusone commented 3 years ago

Thanks for the bug report.

It is actually correct for this instruction to be generated unconditionally. As the explanatory note in the ISA manual states, compliant RISC-V implementations should ignore values in the 'fm' field that they do not recognise, so the instruction falls back to a full fence: "The FENCE.TSO encoding was added as an optional extension to the original base FENCE instruction encoding. The base definition requires that implementations ignore any set bits and treat the FENCE as global, and so this is a backwards-compatible extension."

As you'll see in table A.6 in the ISA manual, fence.tso is part of the standard lowerings to map the C/C++ memory model to RISC-V. In cores that don't implement fence.tso specifically, this should just be a stronger fence than necessary.
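
The fallback described above can be modelled roughly as follows. This is a toy model, not how any real core is implemented: the function names are hypothetical, and it assumes that an unrecognised fm value is treated as fm=0000 with the same predecessor and successor sets.

```python
# Toy model only: hypothetical names, R/W access kinds only (I/O bits ignored).

BIT = {"r": 0b0010, "w": 0b0001}  # R and W bits within a pred/succ set


def effective_fence(fm, pred, succ, known_fm=frozenset({0b0000})):
    """Return the (fm, pred, succ) a core would act on.

    Assumption: an fm value the core does not implement is treated as
    fm=0000 (a normal fence), never as an illegal instruction.
    """
    if fm not in known_fm:
        fm = 0b0000
    return fm, pred, succ


def ordered_pairs(fm, pred, succ):
    """Set of (earlier, later) access kinds that the fence keeps in order."""
    pairs = {(p, s) for p in "rw" for s in "rw"
             if pred & BIT[p] and succ & BIT[s]}
    if fm == 0b1000:  # FENCE.TSO leaves store -> load unordered
        pairs.discard(("w", "r"))
    return pairs


tso = (0b1000, 0b0011, 0b0011)     # fence.tso
fallback = effective_fence(*tso)   # what a core without fence.tso would do
print(sorted(ordered_pairs(*tso)))
print(sorted(ordered_pairs(*fallback)))
```

In this model the fallback orders everything fence.tso orders, plus store-to-load, which is the "stronger fence than necessary" case. A core that traps instead falls outside the model entirely.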

We could add a flag to enable a different lowering, but because of this hardware bug, that core is going to have problems with any code not compiled specifically for it.

Quuxplusone commented 3 years ago

Thank you for the explanation. I understand that it's probably a mistake by the Allwinner D1 chip designers to treat FENCE.TSO as an undefined opcode instead of a regular FENCE. Treating it as a regular FENCE would give stricter semantics than needed, but it would work. This is the first version of the chip. Hopefully this issue will be fixed, or at least documented in the errata sheet.

Anyway, I managed to build clang-13 from source (https://github.com/llvm/llvm-project.git) and it works pretty well. It seems the FENCE.TSO issue has been resolved somehow. I built clang directly on the RISC-V board itself, under Debian. It took a few days to finish, as a single-core 1 GHz processor with 1 GB of RAM is clearly not enough for such a huge code base, but it worked. I don't see any 'Illegal instruction' exceptions, either from binaries I compile with clang or from other Linux software.

So we can probably consider the FENCE.TSO issue resolved, I guess, as the latest clang works fine.
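
One rough way to check whether a piece of machine code still contains fence.tso is to scan it for the 32-bit word 0x8330000F. The sketch below is only a heuristic with a made-up helper name: it works on a raw little-endian code buffer, assumes 2-byte instruction alignment (compressed instructions), and can report false positives when data or the tail of another instruction happens to match. A proper disassembler is the reliable tool.

```python
import struct

FENCE_TSO = 0x8330000F  # fm=1000, pred=RW, succ=RW


def find_fence_tso(code, base=0):
    """Return addresses of fence.tso words in a raw little-endian code buffer.

    Hypothetical helper: `code` is the raw bytes of an executable section
    and `base` is the address of its first byte.
    """
    pattern = struct.pack("<I", FENCE_TSO)
    hits = []
    # With the C extension, instructions are only guaranteed 2-byte alignment.
    for off in range(0, len(code) - 3, 2):
        if code[off:off + 4] == pattern:
            hits.append(base + off)
    return hits


# Synthetic buffer: nop, fence.tso, c.nop, fence.tso
sample = (struct.pack("<I", 0x00000013) + struct.pack("<I", FENCE_TSO)
          + struct.pack("<H", 0x0001) + struct.pack("<I", FENCE_TSO))
print([hex(a) for a in find_fence_tso(sample, base=0x1000)])
```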