clang arm assembler does not support pseudo-instruction 'adrl' - Githubissues

Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

clang arm assembler does not support pseudo-instruction 'adrl' #24349

Open Quuxplusone opened 9 years ago

Quuxplusone commented 9 years ago


Bugzilla Link	PR24350
Status	REOPENED
Importance	P normal
Reported by	Han Shen (shenhan@google.com)
Reported on	2015-08-04 01:14:45 -0700
Last modified on	2020-09-02 11:55:55 -0700
Version	trunk
Hardware	PC All
CC	arnaud.degrandmaison@arm.com, bero@linaro.org, caij2003@gmail.com, james@jamesmolloy.co.uk, kristof.beyls@gmail.com, llozano@chromium.org, llvm-bugs@lists.llvm.org, ndesaulniers@google.com, rengolin@gmail.com, shenhan@google.com, smithp352@googlemail.com, srhines@google.com
Fixed by commit(s)
Attachments	`adr-invalid.s` (155 bytes, application/octet-stream) `adrl.s` (447 bytes, application/octet-stream) `arm-insn-vector.patch` (3529 bytes, text/plain) `sha256-armv4.pl` (17879 bytes, application/x-perl) `mdctARM.s` (33089 bytes, application/octet-stream)
Blocks	PR4068, PR24345
Blocked by
See also

"adrl rd, a_label" calculates a_label's address using pc-relative addressing, which means the section containing the insn and a_label can be put to any address. Without the support of this pseudo insn, it's more cumbersome to do this. "MOV32" or "LDR rd,=a_label" doesn't work in this scenario because both of these use absolute addressing.

Quuxplusone commented 9 years ago

FYI - we had a patch to disable adrl in chromeos openssl.

The CL is here -
https://chromium-review.googlesource.com/#/c/290457/

Quuxplusone commented 8 years ago

Can you provide an example assembly file and a command line that you use to reproduce?

Here's the documentation about ADRL:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489i/Cacecdga.html

Quuxplusone commented 8 years ago

Attached adr-invalid.s (155 bytes, application/octet-stream): first test case

Quuxplusone commented 8 years ago

Attached adrl.s (447 bytes, application/octet-stream): second test case

Quuxplusone commented 8 years ago

I just attached two test case .s files. Use the command:

$ llvm-mc -triple=armv7-linux-gnu <filename>

$ ./bin/llvm-mc -triple=armv7-linux-gnu ~/adr-invalid.s
    .text

start:
    adr r0, var
    adr r0, undefinedvar
/usr/local/google/home/cmtice/adr-invalid.s:5:6: error: invalid operand for
instruction
        adrl    r1, var
            ^
/usr/local/google/home/cmtice/adr-invalid.s:6:6: error: invalid operand for
instruction
        adrl    r1, undefinedvar
            ^

    .data

    .globl  var
var:
    .long   0
$
$ ./bin/llvm-mc -triple=armv7-linux-gnu ~/adrl.s
    .text

    .globl  foo
foo:
    .align  0
.Ltmp0:
    .zero   8192
.Ltmp1:
/usr/local/google/home/cmtice/adrl.s:9:14: error: invalid operand for
instruction
                adrl    r0, 1b
                    ^
/usr/local/google/home/cmtice/adrl.s:10:14: error: invalid operand for
instruction
                adrl    r0, 1f
                    ^
/usr/local/google/home/cmtice/adrl.s:11:14: error: invalid operand for
instruction
                adrl    r0, 2b
                    ^
/usr/local/google/home/cmtice/adrl.s:12:14: error: invalid operand for
instruction
                adrl    r0, 2f
                    ^
/usr/local/google/home/cmtice/adrl.s:13:10: error: invalid instruction
                adrEQl  r0, 2f
                ^
.Ltmp3:
/usr/local/google/home/cmtice/adrl.s:15:14: error: invalid operand for
instruction
                adrl    r0, foo
                    ^
/usr/local/google/home/cmtice/adrl.s:16:14: error: invalid operand for
instruction
                adrl    r0, X
                    ^
    .zero   8184
.Ltmp2:
    adr lr, X
    .zero   260

    .globl  X
/usr/local/google/home/cmtice/adrl.s:22:58: error: unexpected token at start of
statement
                .globl X        ;                                      \
                                                                       ^

X:
    .align  5

Quuxplusone commented 8 years ago

(In reply to comment #5)
> I just attached two test case .s files. Use the command:

Thanks Carolina!

The examples were great, but after a while investigating how that would be
actually implemented, it turns out it's not that simple. The assembler will
need a lot of convincing to expand two instructions from an asm input stream.

There are ways to do it, of course, but all of them are too big for such a
small and isolated feature. For that reason, I want to understand more why
can't you just use ADR+ADD/SUB or even with macros. Can you give me an example
where it would be far worse?

cheers,
--renato

PS: I see you're using "adrEQl". This is pre-unified syntax and is not
supported by LLVM (nor will be). Can you make sure you have ".syntax unified"
on all assembly files, and move the predicate to the end on all instructions
(ex. "adrlEQ")? This will make it easier for the future, when we sort out the
ADRL problems.

Quuxplusone commented 8 years ago

I am trying to compile code from someone else (OpenSSL, in the crypto part) which contains inline assembly, to optimize performance.

The inline assembly contains the adrl instruction.

The code compiles just fine with GCC (and the GNU as assembler). We would like to be able to use Clang/LLVM, but we need to be able to compile/link this third-party code.

I am working on trying to add this capability to LLVM, but since I am learning the compiler code as I go, it's taking me a bit of time.

The test cases I gave you came straight from the GNU assembler regression test suite.

Quuxplusone commented 8 years ago

(In reply to comment #7)
> I am trying to compile code from someone else (OpenSSL, in the crypto part)
> which contains inline assembly, to optimize performance.

I know. Most IAS bug reports we get are from third-party optimised code.

> The code compiles just fine with GCC (and the GNU as assembler).  We would
> like to be able to use Clang/LLVM, but we need to be able to compile/link
> this third-party code.

I understand, but "compiling like GAS" is not our goal. We aim to produce a
working assembler with modern and *documented* assembly constructs.

The ADRL case is borderline, because it's documented in ARMASM, but not in the
ARM manual, and GAS doesn't seem to support ADRL in Thumb2, either.

> I am working on trying to add this capability to LLVM, but since I am
> learning the compiler code as I go, it's taking me a bit of time.

No worries, it does take time. :)

I'm still working on the implementation, but it is not going to be pretty, and
that's why we're trying to avoid it.

If you could contact the original author and point out this bug report, I'm
hoping he/she could have a look at that particular sequence and try and use
something else.

At the very least, pre-UAL code will not be accepted by LLVM's ARM assembler,
so that'll block you further, even if we do add ADRL. In the end, contacting
the original author will be necessary to progress with IAL for OpenSSL.

> The test cases I gave you came straight from the GNU assembler regression
> test suite.

Right, then we may have a problem. GAS and GCC are GPLv3 and LLVM is not, so we
can't reuse GNU code as-is due to licensing problems. Can you send the snippet
in the OpenSSL source that you're having problems with, so we can deal with
that first?

I'll keep in mind the examples you gave already when creating a test for that,
but I can't re-use it.

Quuxplusone commented 8 years ago

After some investigation, I found a few problematic issues with implementing
this in the LLVM assembler:

1. The assembler parser has only limited access to the stream of instructions,
and the vast majority of the transformations are for canonicalisation (changing
encodings) rather than creating new instructions. For that reason, we'll have
to change how the process is done to allow "expandInstruction" to run before
"processInstruction". I've done that, doesn't disrupt much.

2. The immediate comes as an MCExpr with the label. We need to transform it
into a PC-relative symbol by creating a temporary symbol and adding it to the
immediate. I've done that, but the end result is a non-constant expression that
we can't know the real value until it's finally emitted by the MCStreamer
(because distances can change, especially in our case, that we're adding new
instructions, forward ranges can change for previously emitted ADRLs).

This second point will require that we add a multi-instruction fixup, because
the immediate can be bigger than any possible encoding (the whole point of
ADRL), so the relative result of the final expression will have to be split in
two instructions, and encoded differently (depending on which opcode we use for
each).

That calculation cannot be done in the assembler parser (it's definitely not
the place), but it's not yet supported in the streamer because of the type of
non-constant multi-instruction fixups.

I'm not aware of any such fixup type, so unless there is precedent and the
streamer can cope with it, implementing ADRL would require us to ultimately
change how we stream instructions into asm and obj.

One way to do this is to keep the ADRL instruction valid with virtually any
immediate value plus a special fixup, and *always* add a NOP after it early on
in the assembler. This can be done with step 1 above. Later, the MCStreamer
would recognise the fixup and re-use the NOP as an extra instruction if needed.
Adding a NOP simplifies range calculations and is what's expected in case the
ADRL can turn into a single ADD/SUB.

Arnaud, James, do you guys know of anything like this being done elsewhere?

Quuxplusone commented 8 years ago

Attached arm-insn-vector.patch (3529 bytes, text/plain): patch for processInstruction to return multiple instructions

Quuxplusone commented 8 years ago

Attached sha256-armv4.pl (17879 bytes, application/x-perl): openssl file with 'adrl' instruction

Quuxplusone commented 8 years ago

Hi Caroline,

Thanks for the patch, it's similar to what I've done locally. But that's not enough to translate ADRL into ads and subs for the other reasons I outlined. This patch (or my local version), as it is, doesn't help anything.

Did you check with the original developer about re-writing some of their ARM assembly?

Quuxplusone commented 8 years ago

openssl isn't the only user of adrl...
But probably the most relevant one.

The only other use of adrl in an AOSP 6.0.0_r26 checkout is in Tremolo (vorbis
decoder), attaching the relevant file for reference.

Quuxplusone commented 8 years ago

Attached mdctARM.s (33089 bytes, application/octet-stream): Tremolo file that makes use of adrl

Quuxplusone commented 8 years ago

Hi Caroline,

Sorry I moved this to me, I was told to look at it and thought no one was
working on it. Just to wrap up my comments...

I did some investigation and the changes in the assembler would be much bigger
than the benefits would bring us, especially since there are only two uses we
know of and one of them is easy to replace. Usually, when the instruction is
not in the ISA documents, we tend to need a stronger reason to implement it.

For instance, in ADRL, GNU tools only implement it for ARM, while ARMCC does
also for Thumb2, but not 1. On ARM, the complications are in the MCStreamer (as
I described earlier), but in Thumb2 you'll additionally have to worry about IT
blocks (I'm guessing this is why GAS didn't do for Thumb2).

Our rule of thumb for implementing things not documented in the ARM ARM is:
 * If there are no alternatives in UAL / ARM ARM.
 * If the mapping is simple and clear, documented elsewhere
 * Implementation won't break or make code harder to understand/maintain
 * If the usage in the wild is extremely extensive and well known

In the ADRL case, there is an alternative (used for Thumb2 in your attachment),
the mapping is not clear ("whatever range two adds can cope with"), it will
make code harder to understand, and is not extensively used in the wild.

So, there's a pretty strong argument against implementing it. :)

cheers,
--renato

Quuxplusone commented 8 years ago

Bero,

In the same asm file you uploaded there is a sequence of ADR+ADD on the same
symbol that could be used on all examples, and all of them reading the symbol
ahead of the instruction, so there's no doubt it'd be an ADD, not a SUB.

cheers,
--renato

Quuxplusone commented 8 years ago

We have patched the version of openssl that we use, to remove the 'adrl' instruction, and I have reached out to the original openssl developer to ask him if he could remove it from openssl (no response so far).

You have convinced me that this bug is not worth trying to fix. Should I close it as won't fix, or just leave it open and remove myself from the 'assigned' field?

Quuxplusone commented 8 years ago

(In reply to comment #18)
> We have patched the version of openssl that we use, to remove the 'adrl'
> instruction, and I have reached out to the original openssl developer to ask
> him if he could remove it from openssl (no response so far).

Thank you! Having a working patch that is as good as the original code is 99%
of the work done to convince people to update their sources. :)

> You have convinced me that this bug is not worth trying to fix.  Should I
> close it as won't fix, or just leave it open and remove myself from the
> 'assigned' field?

It's up to you, really.

There is a similar issue in Android (which is why I was looking at it, too), so
feel free to assign the bug to me, and I'll close once we patch the Tremolo
sources, or keep it for yourself until both accept the work upstream.

Whatever works. :)

Quuxplusone commented 8 years ago

Ok, it's all yours now! ;-)

Quuxplusone commented 8 years ago

Bero,

Can you confirm the changes on Tremolo's side? Should be as simple as in
openssl.

cheers,
--renato

Quuxplusone commented 8 years ago

For the sake of completeness, the Tremolo case exhibits problems when the ADR
range goes beyond 4096, so ADRL was used. There are two ways to overcome that
without ADRL:

  1. Temp label

Change:

  ADRL r7, .Label
  ...
  ADRL r7, .Label
  ...
.Label
  .word 0xf00

To:

  ADR r7, .Temp1
  ADD r7, r7, #(.Label-.Temp1)
.Temp1
  ...
  ADR r7, .Temp2
  ADD r7, r7, #(.Label-.Temp2)
.Temp2
  ...
.Label
  .word 0xf00

This would only break if .Label move up past the ADR instructions, in which
case you'd have to use a SUB instead, or if .Label-.TempX becomes too large for
the ADD. Since ADD's range is a lot bigger than ADR, you could cope with a lot
of new cases.

  2. Repeat pattern

If the above doesn't help, repeating the maths will:

  ADR r7, .Label1
  ...
.Label1
  .word (.Global-.Label1)
  ...
  ADR r7, .Label2
  ...
.Label2
  .word (.Global-.Label2)
  ...
  etc

as many times as necessary. This is uglier, but works every time as long as you
keep them close enough and often enough.

Quuxplusone commented 8 years ago

Alternatives proposed, worked in the wild, so closing this bug. Please, refer to these solutions for future cases and let me know if they don't work on your particular case.

Quuxplusone commented 8 years ago

I've done some more investigation to see what would be needed to make ADRL work
in the assembler. In summary I agree with Renato, making ADRL work to the same
specification as the GNU assembler and armasm would not be easy, and would
likely be rejected upstream as not worth the cost. In general it is quite
difficult to get difficult to implement pseudo instructions accepted unless
they are officially required by the architecture or ABI.

I've put some thoughts on implementation at the end [*] just in case anyone
wants to build on them. I'm new to LLVM so it is possible that I've missed
something.

It isn't possible to replicate the full generality of ADRL in a macro but it is
possible to come close. In effect we fix the rotations that are used for each
of the instructions, whereas the assembler can use different rotations that
could potentially reach further addresses if those addresses were suitably
aligned. We also need to choose ahead of time whether to use add (label is
defined later) or sub (label has already been defined).

I won't claim that these are the optimal or clearest expressions of the macro,
but it should be possible to build on them. For example if you know your labels
are 4-byte aligned it should be possible to extend the range by altering the
masks.

        .macro MYADRL reg:req, label:req
        add \reg, pc, #((\label - .) & 0xff00)
        add \reg, \reg, #((\label - .) - ((\label - .) & 0xff00)) - 4
        .endm

        .macro MYADRLSUB reg:req, label:req
        sub \reg, pc, #((. - \label) & 0xff00)
        sub \reg, \reg, #((. - \label) - ((. - \label) & 0xff00)) + 4
        .endm

[*]
It has already been observed that the ARM assembler backend does not support
expansion of pseudo instructions into multiple instructions. This can be fixed
using a similar solution to the Mips backend. In ARM there is the additional
complication of IT blocks. If ADRL is implemented like it is defined in armasm
which supports ADRL in Thumb2 then we have to account for ADRL being the last
member of an IT block. It is true that we could restrict ADRL to ARM state like
in the GNU assembler so that this never happens, but we would have to be clear
about the restrictions should anyone attempt to make further modifications.

A more difficult problem is that while the first instruction is in effect an
ADR which can be transformed by a fixup into either an ADD or a SUB depending
on the final value of the label post layout, the second instruction is either
an ADD or SUB which cannot currently be transformed by a fixup post layout. The
reason is that for ADD and SUB the opcode bits are defined by tablegen and a
fixup can only OR extra bits and not clear existing ones. I think it could be
possible to either:
- reimplement ARM modified immediate data-processing instructions to take the
opcode from a fixup. I don't think that this would be a trivial piece of work
and could have implications for disassembly.
- Add a pseudo-like ADRL_pt2 instruction similar to ADR that can be transformed
into either an ADD or SUB via a fixup, but with customer parsing and decoding
such that it couldn't be written in the assembler or disassembled.

Neither of the options fit well within the existing framework.

I did consider (ab)using relaxation to change an ADDri into a SUBri and vice-
versa, depending on the value of the immediate at layout time. Unfortunately
because ADDri and SUBri are inverses the layout algorithm won't terminate. For
implementing ADRL it could be possible to fix this by just doing an ADDri to
SUBri and always emitting an ADDri, however this would also affect vanilla
ADDri instructions in an asymmetric way (When expr isn't known till layout
time, you could change an "ADD r0, r0, expr" to "SUB r0, r0, -expr" if expr
later turned out to be negative, but you couldn't change "SUB r0, r0, expr"
back to "ADD r0, r0, -expr").

It could also be possible to heuristically guess early whether an ADD/SUB is
needed based on the expression. However we can't guarantee we would guess
correctly in all cases.