support --fix-v4bx

Quuxplusone commented 3 years ago


Bugzilla Link	PR51422
Status	NEW
Importance	P enhancement
Reported by	Nick Desaulniers (ndesaulniers@google.com)
Reported on	2021-08-09 13:29:34 -0700
Last modified on	2021-08-11 08:21:39 -0700
Version	unspecified
Hardware	PC All
CC	arnd@linaro.org, efriedma@quicinc.com, i@maskray.me, llvm-bugs@lists.llvm.org, natechancellor@gmail.com, smithp352@googlemail.com
Fixed by commit(s)
Attachments
Blocks	PR4068
Blocked by
See also

In order to link armv4 Linux kernel images with LLD, it's required that LLD support --fix-v4bx, otherwise we get the warning:

ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected

IIUC, there's even a relocation type for this: R_ARM_V4BX.

https://github.com/ClangBuiltLinux/linux/issues/964

Quuxplusone commented 3 years ago

There are two separate issues here:

1. Supporting armv4t.
2. Supporting armv4 without thumb.

Supporting armv4t requires some modifications to the relocation handling for
calls; that's what the warning is about.

Supporting armv4 without thumb means we need to use a different sequence for
function returns.  Historically, this is a compiler thing, not a linker thing,
but R_ARM_V4BX is a trick to allow writing code that works for both armv4t and
armv4-without-thumb targets. --fix-v4bx essentially converts arm-mode armv4t
code to armv4.

If you're not using --fix-v4bx, the linker doesn't really care about the
difference between armv4t and armv4; it doesn't generate function returns,
anyway.

Quuxplusone commented 3 years ago

(The documentation says the support is for >=v6.)

Isn't armv4 so obscure that it is quite irrelevant now?
Even if it isn't so obscure, aren't these users happy with their current
toolchain and not considering a switch?

Quuxplusone commented 3 years ago

armv5t has both the register and immediate forms of blx, I think.

I think the issue here is there are some legacy hardware enthusiasts who build Linux for very old hardware, like https://en.wikipedia.org/wiki/Risc_PC , so legacy configs targeting armv4 persist in the kernel source tree, and randconfig stumbles over them.

Quuxplusone commented 3 years ago

There are three classes of ARMv4 (not T) users that I can see from a kernel
perspective:

- RiscPC/sa110 has only one actual user (rmk) I'm aware of, and it has the
added problem that the hardware does not support 16-bit (halfword) memory
accesses, so kernels are actually built for ARMv3 despite this being an ARMv4
processor. gcc-9 removed ARMv3 support a few years ago, and I see no reason to
add this to clang. As soon as gcc-9 or higher is required for building kernels,
this platform will be removed.

- There are a handful of hobbyist users of legacy StrongARM machines that run
ARMv4 kernels on SA1100 handhelds or Netwinder. We will probably discuss
removing those as well when time is up for RiscPC, but they could remain for a
few years longer.

- There are two remaining embedded platforms based on Faraday FA526 cores that
are deployed commercially in settings that require kernel updates: Cortina
Gemini and MOXA Art. The hardware is old, but the kernel ports are well written
and maintained, and they will probably outlive a number of ARMv4T and ARMv5E
platforms we currently support. It would be nice to be able to build kernels
for these using clang.

For reference, we are still adding support for ARMv5E (ARM926E) SoC platforms,
including SoCs designed as recently as 2020 (Microchip SAM9X60).
For ARMv4T, we currently support AT91RM9200, S3C24xx, i.MX1, EP72xx, EP93xx,
OMAP1, and ARM Integrator based on ARM920T and ARM710T. Debian Buster (released
in 2019) dropped support for these, but there is still some work going into the
kernel for at least ep93xx, omap1 and s3c24xx, similar to the ARMv4/FA526
platforms.

Quuxplusone commented 3 years ago

On the LLD side I originally implemented v7-A and above, then later added v5,
as there was at least one person who had some need for it, I think on the BSD
side. It probably wouldn't be a huge amount of extra code and tests to support
v4t, but up until now there hasn't been anyone needing it in LLD. The
assumption was that older hardware would already have toolchains they could
use, and we could avoid extra complexity in the code.

For v4t support in lld, there are a few things we'd need to do:
* We'd need new v4t thunks, as ldr pc, [address] cannot change state in v4t.
The sequence becomes:
ldr ip, [address]
bx ip
* As there is no BLX there is no Thunk sharing between Arm and Thumb state. The
v4t Thunks isCompatibleWith would need to include only relocations from the
same state.
* ARM::needsThunk() would need a case for v4t that accounts for the lack of a
BLX. I think it would be something like (pseudo code):
return isV4t || !inBranchRange(type, branchAddr, dst + a);

If the Thunks code has done its job properly then the relocation handling code
should never be presented with an opportunity to use a BLX.
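
One possible reading of the needsThunk() pseudo code, for a call relocation, is sketched below in Python. It is only an illustration: needs_thunk and its boolean parameters are hypothetical names rather than LLD's API, and it assumes the v4t case only matters when the call has to switch between Arm and Thumb state.

```python
def needs_thunk(has_blx: bool, src_is_thumb: bool, dst_is_thumb: bool,
                in_branch_range: bool) -> bool:
    """Decide whether a call (BL) relocation must go through a thunk.

    Hypothetical helper, not LLD code. Assumption: without BLX (Armv4T) the
    linker cannot rewrite BL into a state-changing call, so every call that
    changes state needs a thunk, as does every call that is out of range.
    """
    changes_state = src_is_thumb != dst_is_thumb
    if changes_state and not has_blx:
        return True
    return not in_branch_range
```

With this shape, the relocation code only ever sees same-state, in-range calls on v4t, which matches the expectation that it never gets a chance to use a BLX.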

On the linker side R_ARM_V4BX is trivial to implement (just write Arm state mov
pc, lr encoding to the location), but as Eli points out the compiler has to
generate the relocation when targeting Arm v4.
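
To give a feel for how small the linker-side rewrite is, here is a Python sketch of the instruction patch. The name fix_v4bx is made up for this example. The masks follow the Arm encodings of BX Rm and MOV pc, Rm: the condition field (top four bits) and Rm (bottom four bits) are kept, and everything in between is replaced.

```python
BX_FIXED_BITS_MASK = 0x0FFFFFF0  # all bits except condition and Rm
BX_PATTERN = 0x012FFF10          # BX Rm with condition and Rm cleared
MOV_PC_PATTERN = 0x01A0F000      # MOV pc, Rm with condition and Rm cleared


def fix_v4bx(insn: int) -> int:
    """Rewrite an Arm-state 'bx rM' word into 'mov pc, rM'.

    Illustrative sketch only: 'insn' is the 32-bit instruction found at the
    place an R_ARM_V4BX relocation points to.
    """
    if (insn & BX_FIXED_BITS_MASK) != BX_PATTERN:
        raise ValueError(f"not a BX instruction: {insn:#010x}")
    # Keep the condition code and Rm, swap the opcode bits.
    return (insn & 0xF000000F) | MOV_PC_PATTERN
```

For example, fix_v4bx(0xE12FFF1E), which is bx lr, yields 0xE1A0F00E, which is mov pc, lr.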

If there is a need for support then it may be worth approaching Linaro TCWG. I
could do it, but I've not got a lot of spare time so it might take a few weeks
rather than a few days.

Quuxplusone commented 3 years ago

(In reply to Peter Smith from comment #5)
> * As there is no BLX there is no Thunk sharing between Arm and Thumb state.
> The v4t Thunks isCompatibleWith would need to include only relocations from
> the same state.

The first thing any v4t Thumb thunk will do is switch to ARM mode. The
remaining code is the same, so it could be shared.  Not sure that's worth
doing, though.

Quuxplusone commented 3 years ago

Also, the kernel doesn't actually have any thumb code, so it doesn't actually need v4t thunks, I think. It would be enough to just suppress the warning if there aren't any Thumb symbols.

Quuxplusone commented 3 years ago

> The first thing any v4t Thumb thunk will do is switch to ARM mode. The
> remaining code is the same, so it could be shared.  Not sure that's worth
> doing, though.

It is possible, although it would mean two entry points for the thunk that
we'd need to select between per relocation type. My instinct would be to keep
it simple unless there was a real need to minimise the code size.

> Also, the kernel doesn't actually have any thumb code, so it doesn't
> actually need v4t thunks, I think. It would be enough to just suppress the
> warning if there aren't any Thumb symbols.

That would be possible, although at the moment it would require LLD to scan
all the local symbols of every input object looking for the absence of $t.
Only programs that could trigger the warning would need to run the check,
though. We can't use the BuildAttributes, as objects built with -marm still
have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used).

Would a suppression of the warning be acceptable to the kernel folk? Or are
there v4t kernels that use Thumb code that need supporting?
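
The symbol scan described above could look roughly like the following Python sketch. It assumes the local symbol names of each input object have already been read from its symbol table; has_thumb_code and is_thumb_mapping_symbol are hypothetical helpers, and the optional ".suffix" form follows the Arm ELF mapping-symbol naming.

```python
from typing import Iterable


def is_thumb_mapping_symbol(name: str) -> bool:
    """True for the Thumb mapping symbol '$t' or its '$t.<suffix>' form."""
    return name == "$t" or name.startswith("$t.")


def has_thumb_code(objects: Iterable[Iterable[str]]) -> bool:
    """Return True if any input object carries a Thumb mapping symbol.

    'objects' is one iterable of local symbol names per input object.
    The warning could be suppressed when this returns False.
    """
    return any(
        is_thumb_mapping_symbol(name)
        for symbol_names in objects
        for name in symbol_names
    )
```

The cost is one pass over every local symbol, which is why it would only be worth running for links that could trigger the warning in the first place.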

Quuxplusone commented 3 years ago

(In reply to Peter Smith from comment #8)
>
> That would be possible, although at the moment that would require LLD to
> scan all the local symbols of every input object looking for the absence of
> $t. Although only programs that could trigger the warning would need to run
> the check. We can't use the BuildAttributes as objects built with -marm
> still have Tag_THUMB_ISA_use=1 (Thumb instructions were permitted to be used)
>
> Would a suppression of the warning be acceptable to the kernel folk? Or are
> there v4t kernels that use Thumb code that need supporting?

I'm sure there is no thumb code in pre-v7 kernels that we need to link against.
Suppressing the warning would seem to do the trick for armv4t kernels, but if I
build for armv4, I still see 'bx' instructions in the thunks.

I checked the compiler output from a recent clang and I see that building for
armv4 does not generate any 'bx' instructions, so I don't think we actually
need the R_ARM_V4BX/--fix-v4bx trick any more to work around compiler-generated
instructions, but we still need to work around the linker adding them.

Could we just have a linker flag to ask for armv4 style thunks, and use them
for both v4 and v4t kernels?

Quuxplusone commented 3 years ago

It does look like (ARMInstrInfo.td MOVPCLR
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMInstrInfo.td#L2443)
the return sequence is predicated, so clang/llvm will use mov pc, lr for
-march=armv4, so in theory no compiler work is required.

We can tell if all objects are ArmV4 from the build attributes. Wouldn't need a
new command line option.
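
As a rough sketch of that check in Python: the Tag_CPU_arch build attribute uses 0 for pre-v4, 1 for Armv4 and 2 for Armv4T. The function name and the idea of passing in one tag value per input object are assumptions made for this example, as is treating pre-v4 objects the same as v4.

```python
from typing import Iterable

TAG_CPU_ARCH_PRE_V4 = 0
TAG_CPU_ARCH_V4 = 1
TAG_CPU_ARCH_V4T = 2


def all_inputs_are_armv4(cpu_arch_tags: Iterable[int]) -> bool:
    """True if no input object claims Armv4T or anything newer.

    'cpu_arch_tags' holds the Tag_CPU_arch value of each input object
    (hypothetical input; reading the attributes section is not shown).
    """
    tags = list(cpu_arch_tags)
    return bool(tags) and all(tag <= TAG_CPU_ARCH_V4 for tag in tags)
```

A link where this returns True could pick thunks without bx automatically, with no extra flag.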

Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel
would be long range Arm to Arm. If --pic-veneer is set then these will have bx
lr in them. However if --pic-veneer isn't set I'd expect to see all the thunks
to be of the form:
ldr pc, [pc, #-4] ; l1
l1: .word <destination>

In theory these should be OK for v4. It may be worth trying a build without
--pic-veneer (the kernel wouldn't be position independent, though).
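
For illustration, the absolute thunk above can be laid out as bytes with a few lines of Python. This is a sketch, not LLD's code: the helper names are invented, and a little-endian target is assumed. 0xE51FF004 is the Arm encoding of ldr pc, [pc, #-4].

```python
import struct

LDR_PC_PC_MINUS_4 = 0xE51FF004  # ldr pc, [pc, #-4]


def absolute_thunk(destination: int) -> bytes:
    """Return the 8 thunk bytes: the load followed by the literal word."""
    return struct.pack("<II", LDR_PC_PC_MINUS_4, destination & 0xFFFFFFFF)


def literal_address(thunk_address: int) -> int:
    """Address the load reads from.

    In Arm state pc reads as the instruction address plus 8, so
    [pc, #-4] is the word directly after the ldr, that is, l1.
    """
    return (thunk_address + 8) - 4
```

Since the load writes straight to pc and there is no bx, nothing in this sequence depends on Thumb support.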

I think a new PI Thunk for v4 would look something like:
ldr ip, [pc] ; l1 (pc reads as this instruction + 8, which is l1)
add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb
l1: .word destination -4 - l1 ; offset to destination

Although if we're adding thunks for armv4, it may just be worth adding v4t
thunks at the same time.
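
The offset arithmetic for such a position-independent thunk can be checked with a small Python model. It is a sketch under stated assumptions: the function names are invented, each instruction is 4 bytes, pc reads as the instruction address plus 8 in Arm state, and the load is assumed to reach l1 with an offset of 0 from that pc value.

```python
MASK32 = 0xFFFFFFFF


def pi_thunk_literal(thunk_address: int, destination: int) -> int:
    """Value stored at l1: destination - 4 - l1, as a 32-bit word."""
    l1 = thunk_address + 8  # literal follows the two instructions
    return (destination - 4 - l1) & MASK32


def simulate_pi_thunk(thunk_address: int, destination: int) -> int:
    """Model 'ldr ip, <l1>; add pc, ip, pc' and return the resulting pc."""
    ip = pi_thunk_literal(thunk_address, destination)  # the ldr
    pc_seen_by_add = (thunk_address + 4) + 8           # equals l1 + 4
    return (ip + pc_seen_by_add) & MASK32
```

For any thunk address, simulate_pi_thunk(thunk_address, destination) returns destination, forwards or backwards, because the stored word is relative to the thunk's own location.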

Quuxplusone commented 3 years ago

(In reply to Peter Smith from comment #10)
> It does look like (ARMInstrInfo.td MOVPCLR
> https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMInstrInfo.td#L2443)
> the return sequence is predicated so clang/llvm will
> use mov pc, lr for -march=armv4 so in theory no compiler work required.

Maybe it was the 'blx mcount' when building with '-pg' that caused the
problems? I remember debugging that in the past, but I don't remember the
exact problem, and it seems fine now: the clang version I have always uses
'bl mcount' here.

> We can tell if all objects are ArmV4 from the build attributes. Wouldn't
> need a new command line option.
>
> Looking at LLD I'd expect that the only thunks needed for an Arm v4 kernel
> would be long range Arm to Arm.

Correct

> If --pic-veneer is set then these will have
> bx lr in them. However if --pic-veneer isn't set I'd expect to see all the
> thunks to be of the form:
> ldr pc, [pc, #-4] ; l1
> l1: .word <destination>
>
> In theory these should be OK for v4. May be worth trying a build without
> --pic-veneer (The kernel wouldn't be position independent though).

Yes, that works. I see the --pic-veneer flag was added by Ard in
https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8323/1

to work around veneers that are inserted into the early-boot code that is run
before the MMU is enabled.

We can probably get away with this on actual ARMv4/v4t hardware since their
memory is too constrained to actually run a kernel that is large enough to
require veneers, but it seems a little wrong. In theory one can run a large
(allyesconfig style) kernel with ARMv4 enabled on an ARMv5 machine with a lot
of memory.

> I think a new PI Thunk for v4 would look something like:
> ldr ip, [pc] ; l1 (pc reads as this instruction + 8, which is l1)
> add pc, ip, pc ; not interworking but OK for v4 as there is no Thumb
> l1: .word destination -4 - l1 ; offset to destination
>
> Although if we're adding thunks for armv4, it may just be worth adding v4t
> thunks at the same time.

That would be nice for user space, but in the kernel we don't really care about
the distinction. I suggested a command line flag since we could then just
unconditionally pick the v4 veneers for linking a v4/v4t kernel and avoid both
the bx instructions and the warning.

Quuxplusone / LLVMBugzillaTest

support --fix-v4bx #50389