DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

AArch64 "Undefined HINT instruction found" running simple Java app #5331

Open derekbruening opened 2 years ago

derekbruening commented 2 years ago

I ran ReadWrite.java from https://github.com/DynamoRIO/dynamorio/issues/5309#issue-1119299130 on our Jenkins machine, and the run printed several of these warnings:

<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>

I didn't look up what this hint does: does DR need to take any action there, or is this just a missing innocuous opcode?

$ disasm_a64 d503245f
llvm-mc:   d503245f hint #34
capstone:  d503245f hint #0x22
bfd:       d503245f bti c
DynamoRIO: d503245f xx $0xd503245f %sp %x2 %x9 %x3 -> %sp %x2 %x9 %x3

AssadHashmi commented 2 years ago

I didn't look up what this hint does: does DR need to take any action there, or is this just a missing innocuous opcode?

It looks like this HINT encoding is BTI (Branch Target Identification), part of FEAT_BTI, which was introduced as a mandatory feature in Armv8.5: https://developer.arm.com/documentation/ddi0596/2021-09/Base-Instructions/BTI--Branch-Target-Identification-

The 3 possible <targets> types are encoded in op2<2:1> of the CRm:op2 field:

 CRm:op2 = 0100:xx0

 xx   Mnemonic   Valid landing pad (in guarded pages) for
 00   BTI        no indirect branch (<targets> omitted)
 01   BTI C      function calls: BLR, and BR using X16 or X17
 10   BTI J      jumps: BR only
 11   BTI JC     both of the above

So in "Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)", CRm:op2 = 0x22 = 0b0100:010, i.e. xx = 01, which is BTI C. This matches the bfd disassembly above.
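
The decoding above can be checked with a minimal Python sketch. This is not DynamoRIO code: the names `HINT_MASK`, `HINT_BASE`, `BTI_TARGETS`, and `decode_hint` are made up for illustration, and the only assumption about the encoding is that HINT keeps its 7-bit CRm:op2 immediate in bits [11:5] of the base word 0xd503201f.

```python
# Hypothetical helper names; only the bit layout comes from the HINT encoding.
HINT_MASK = 0xFFFFF01F  # every bit except the 7-bit CRm:op2 field at [11:5]
HINT_BASE = 0xD503201F  # HINT #0, i.e. NOP

# op2<2:1> selects the BTI target type when CRm == 0b0100 and op2<0> == 0.
BTI_TARGETS = {0b00: "bti", 0b01: "bti c", 0b10: "bti j", 0b11: "bti jc"}


def decode_hint(encoding: int) -> str:
    """Return a mnemonic for a HINT-space encoding, or raise ValueError."""
    if (encoding & HINT_MASK) != HINT_BASE:
        raise ValueError(f"{encoding:#010x} is not in the HINT space")
    imm7 = (encoding >> 5) & 0x7F  # CRm:op2, as printed in the DR warning
    crm, op2 = imm7 >> 3, imm7 & 0b111
    if crm == 0b0100 and (op2 & 1) == 0:
        return BTI_TARGETS[op2 >> 1]
    return f"hint #{imm7:#x}"  # anything else: fall back to the generic form


print(decode_hint(0xD503245F))  # bti c
```

Other HINT encodings, such as NOP, fall through to the generic `hint #imm` form, in the same style as the capstone line above.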

The AArch64 codec supports only a small set of HINT ops; see https://github.com/DynamoRIO/dynamorio/blob/c99bcafc7e9057e0adb83fc42784a2fb1220e27e/core/ir/aarch64/codec.txt#L68. I'll raise a PR for BTI.

According to the spec, CPUs with FEAT_BTI enabled will trap any indirect branch into a guarded page that lands on anything other than a compatible BTI.
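
The landing-pad rule can be modelled roughly as follows. This is a simplified sketch, not the architectural pseudocode: the branch-kind labels and the names `ACCEPTS` and `branch_target_ok` are invented for illustration, and the model deliberately ignores other instructions the architecture also accepts as landing pads (for example PACIASP).

```python
# Illustrative labels for the kind of indirect branch that was just taken.
# "br_x16_x17" is BR via X16 or X17; "br_other" is BR via any other register.
ACCEPTS = {
    "bti c": {"blr", "br_x16_x17"},
    "bti j": {"br_x16_x17", "br_other"},
    "bti jc": {"blr", "br_x16_x17", "br_other"},
}


def branch_target_ok(branch_kind: str, landing_insn: str,
                     guarded_page: bool = True) -> bool:
    """Return True if the indirect branch may land on landing_insn.

    Simplified model: outside guarded pages nothing is checked; inside
    them only a BTI whose target type covers the branch kind is accepted.
    """
    if not guarded_page:
        return True
    return branch_kind in ACCEPTS.get(landing_insn, set())


print(branch_target_ok("blr", "bti c"))                    # True
print(branch_target_ok("blr", "add"))                      # False
print(branch_target_ok("blr", "add", guarded_page=False))  # True
```

A `False` result corresponds to the case where the hardware would raise the trap described above.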

I don't know the details of DR's code cache and block/fragment linking, but could this be causing the performance hit you're seeing, i.e. link optimisations simply fail and fall back to context switches into DR's control loop?

derekbruening commented 2 years ago

According to the spec, CPUs with FEAT_BTI enabled will trap any indirect branch into a guarded page that lands on anything other than a compatible BTI.

I don't know the details of DR's code cache and block/fragment linking, but could this be causing the performance hit you're seeing, i.e. link optimisations simply fail and fall back to context switches into DR's control loop?

Would this trap show up as a signal to user mode that we would see?

AssadHashmi commented 2 years ago

Would this trap show up as a signal to user mode that we would see?

Yes. According to the AArch64 spec, h/w raises a Branch Target exception when an indirect branch into a guarded memory region lands on an instruction that is not a compatible BTI. This comment in the Linux source says that it will show up as a SIGILL in user space: https://github.com/torvalds/linux/blob/555f3d7be91a873114c9656069f1a9fa476ec41a/arch/arm64/kernel/signal.c#L746

        /*
         * Signal delivery to a location in a PROT_BTI guarded page
         * that is not a function entry point will now trigger a
         * SIGILL in userspace.
         *
         * If the signal handler entry point is not in a PROT_BTI
         * guarded page, this is harmless.
         */

Addition of BTI support in Linux: https://patchwork.kernel.org/project/linux-arm-kernel/patch/1571419545-20401-6-git-send-email-Dave.Martin@arm.com/

Is it possible for the user to run on h/w that doesn't support FEAT_BTI, or with a build of the Java app that doesn't generate HINT 0x22? Running without BTI would be a quick way to check whether it is the cause, or one of the causes.

The C++ binary doesn't show the same performance hit, so could the JVM's use of BTI be causing the problem?

derekbruening commented 2 years ago

Some clarifications/questions:

AssadHashmi commented 2 years ago

Some clarifications/questions:

  • These HINT warnings are what I observed running java on the Jenkins machine. I don't think that has ARMv8.5?

It doesn't support FEAT_BTI. So the HINT appears in the instruction stream and is ignored, i.e. treated as NOP.
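
One way to confirm this on a Linux AArch64 machine is to look for the `bti` flag on the `Features` line of /proc/cpuinfo. The sketch below assumes that format; the function name `cpu_reports_bti` and the sample text are made up for illustration, and this does not apply to macOS, which has no /proc/cpuinfo.

```python
def cpu_reports_bti(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line in the given text lists 'bti'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole words so that a longer flag name cannot match.
        if key.strip() == "Features" and "bti" in value.split():
            return True
    return False


# Made-up sample in the assumed /proc/cpuinfo layout.
sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes bti\n"
print(cpu_reports_bti(sample))  # True

# On a real Linux machine, pass the file contents instead:
#   with open("/proc/cpuinfo") as f:
#       print(cpu_reports_bti(f.read()))
```

If the flag is absent, the BTI encodings execute as NOPs, as described above.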

  • If this is being hit and it is raising a SIGILL over and over, your theory is that the JVM is handling the SIGILL and continuing, rather than aborting, and the signal raise and handling is disrupting every hot path/loop?

Yes, that was my theory, but since the machine it's running on doesn't support FEAT_BTI, it's irrelevant now.

kuhanov commented 2 years ago

  • @kuhanov -- do you also see these HINT warnings on your machine?

No, I didn't see such warnings on my machine.

derekbruening commented 1 year ago

These are also seen on Mac M1 machines running hello,world:

<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>
<Undefined HINT instruction found: encoding 0xd503245f (CRm:op2 0x22)
>