[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned.

grahamwoodward commented 6 months ago

Running an AWS Ubuntu aarch64 VM Linux ip-10-252-39-126 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:56:39 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Questions	Answers
OS/arch/bits	Ubuntu AArch64.
Architecture	armv8
Source of Capstone	`git clone`
Version/git commit	24d99a907b2b7676a6e522b262ba5759481e6ae5

I'm using Capstone in a performance tool I'm trying to write (relating to retired instructions). It's only for aarch64 and I have a simple test benchmark that I'm compiling like so

aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a

and

aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve

When compiling with armv8.6-a and running the test benchmark I'm then running my tool and asking it to show HasNEON and HasNEONorSME and it appears to give me the details indicating that the retired/decoded instruction is HasNEON but not HasNEONorSME. If it's in the group/feature HasNEON, then why not HasNEONorSME? Both are those are true?

Similarly, when compiler with 8.5-a+sve, then I see a zero count for HasSVE however I see a >0 count for HasSVEorSME...again that doesn't make sense...if the retired instruction is in the group/feature HasSVEorSME, then surely I should see a count for HasSME or a count for HasSVE?

Rot127 commented 6 months ago

Likely another faulty definition in the LLVM tablegen files. I am currently updating to LLVM 18 (https://github.com/capstone-engine/capstone/pull/2298) and there are plenty of changes to AArch64 and SME/SVE. So it might be fixed already there.

grahamwoodward commented 6 months ago

@Rot127 Is that a commit I can pull down to my clone?

Rot127 commented 6 months ago

Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.

grahamwoodward commented 6 months ago

Sure....my test case is simply

#define LEN 1024
unsigned long long a[LEN], b[LEN], c[LEN];

int main()
{
    while(1) {
        int i = 0;
        for (; i < LEN; i++)
            c[i] = a[i] + b[i];
    }
    return 0;
}

and then compile like aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a and aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve

An objdump shows instructions such as 628: 4ee18400 add v0.2d, v0.2d, v1.2d and 638: 04e10000 add z0.d, z0.d, z1.d respectively for the two compilations.

My perf tool is configuring Capstone like so

  122   /* Initialise Capstone, which we use to disassemble the instruction */
  123   if (cs_open(CS_ARCH_AARCH64, CS_MODE_ARM, &handle) != CS_ERR_OK) {
  124       fprintf(stderr, "Failed to initialise Capstone! Aborting.\n");
  125       exit(1);
  126   }
  127   cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);
  128   cs_option(handle, CS_OPT_SKIPDATA, CS_OPT_ON);

My tool is using perf events to sample retired instructions and I use Capstone to disassemble the instructions and tell me about the groups/features.

In the first case for instruction 4ee18400 add v0.2d, v0.2d, v1.2d

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

In the second case for instruction 04e10000 add z0.d, z0.d, z1.d

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Thanks, hope that helps

grahamwoodward commented 6 months ago

Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.

what version is being used at present? I have llvm-14 installed and also a build of llvm-18 and the -14 version struggles with this

echo 0x00, 0x0c, 0xe2, 0x25, | llvm-mc -triple=aarch64 -disassemble -mattr=+all
'+all' is not a recognized feature for this target (ignoring feature)
        .text
<stdin>:1:1: warning: invalid instruction encoding
0x00, 0x0c, 0xe2, 0x25,

(note even without -mattr=+all it fails with the encoding) but my local build of 18 is OK

echo 0x00, 0x0c, 0xe2, 0x25, | ./llvm-mc -triple=aarch64 -disassemble -mattr=+all
        .text
        whilelo p0.d, w0, w2

Rot127 commented 6 months ago

The v5 branch is based on LLVM-7. Current next branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The add instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON group assigned. Not the HasNEONorSVE. Because we generate the Capstone code from these definitions in the td files we also inherit all the flaws which come with hand-written things.

The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON assigned. It is common to find these kind of issues in hand-written td files, unfortunately.

I'll fix it with https://github.com/capstone-engine/capstone/pull/2298 in our LLVM fork by checking during generation.

grahamwoodward commented 6 months ago

This could be true for other aarch64 instructions?

Rot127 commented 6 months ago

Yes, there is a good chance that every instruction which should be of one of the following groups, isn't assigned to them: HasSVEorSME, HasSVE2orSME, HasSVE2orSME2, HasSVE2p1_or_HasSME, HasSVE2p1_or_HasSME2, HasSVE2p1_or_HasSME2p1, HasNEONorSME

Rot127 commented 6 months ago

But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE, we can just emit HasSVEorSME as well.

grahamwoodward commented 6 months ago

But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE, we can just emit HasSVEorSME as well.

do you have an ETA for v6?

Rot127 commented 6 months ago

https://github.com/capstone-engine/capstone/pull/2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.

An ETA for v6 not really. If you want to use Capstone for AArch64 definitely use the next branch. The todo list for v6 release is still quite long, I doubt it can be released within 6 months.

grahamwoodward commented 6 months ago

2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.

An ETA for v6 not really. If you want to use Capstone for AArch64 definitely use the next branch. The todo list for v6 release is still quite long, I doubt it can be released within 6 months.

"If you want to use Capstone for AArch64 definitely use the next branch." pretty sure I'm using the next branch

Rot127 commented 6 months ago

Yes, I just wanted to point it out again, the difference between current next and the v5 release is huge. Also, the work towards v6 is mostly sponsored by RizinOrg, so I won't "just stop".

grahamwoodward commented 6 months ago

Yes, I just wanted to point it out again, the difference between current next and the v5 release is huge. Also, the work towards v6 is mostly sponsored by RizinOrg, so I won't "just stop".

right so even though I'm using next, I don't have the above commit? Can I pull/check that commit out or in my tool link to it as the git submodule?

Rot127 commented 6 months ago

https://github.com/capstone-engine/capstone/pull/2298 is the PR which updates the AArch64 module to the state of LLVM-18. When it is done it will be merged into next.

But it isn't done yet. Some instructions don't get disassembled correctly and need to be fixed. So it won't be useful to you in the current state. I can let you know, when it is in a usable state. Than you can use the branch of the PR.

grahamwoodward commented 6 months ago

@Rot127 excellent thanks very much

grahamwoodward commented 6 months ago

The v5 branch is based on LLVM-7. Current next branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The add instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON group assigned. Not the HasNEONorSVE. Because we generate the Capstone code from these definitions in the td files we also inherit all the flaws which come with hand-written things.

How did you work out that that's the add?

The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON assigned. It is common to find these kind of issues in hand-written td files, unfortunately.

I'll fix it with https://github.com/capstone-engine/capstone/pull/2298 in our LLVM fork by checking during generation.

Can you explain more about the generation and how it works? Is that a capstone thing?

Rot127 commented 6 months ago

How did you work out that that's the add?

In AArch64InstrFormats.td the basic instructions formats are defined. In this case all vector instructions with a common encoding which have three vector registers (BaseSIMDThreeSameVector).

With a debugger I checked in Capstone what the LLVM internal ID is which it gave (one of the) the instructions from above: add v0.2d, v0.2d, v1.2d = ADDv2i64.

Guessing that it inherits it, I checked where this class is inherited and contains a portion of the LLVM name:

> grep -rnI "BaseSIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep v2i64
llvm/lib/Target/AArch64/AArch64InstrFormats.td:5878:  def v2i64 : BaseSIMDThreeSameVector<1, U, 0b111, opc, V128,

Checking AArch64InstrFormats.td:5878 it shows that it defines a multiclass (SIMDThreeSameVector) there, which defines multiple instructions at once (one of them the v2i64 variant). Grepping for this aggin shows us that indeed an add instruction of this type wass defined.

> grep -rnI " SIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep ADD
llvm/lib/Target/AArch64/AArch64InstrInfo.td:1452:defm FCADD : SIMDThreeSameVectorComplexHSD<1, 0b111, complexrotateopodd,
llvm/lib/Target/AArch64/AArch64InstrInfo.td:5170:defm ADD     : SIMDThreeSameVector<0, 0b10000, "add", add>;
...

In short, you have to manually the inheritance paths and have a feeling how these definitions are commonly written. SO you can make good guesses where to start looking.

Can you explain more about the generation and how it works? Is that a capstone thing?

This is documented here and in the README.md of our LLVM fork.

grahamwoodward commented 6 months ago

@Rot127 this might be a daft question but why isn't LLVM included as a git submodule that's then cloned/checked out when you initially clone Capstone? Wouldn't that mean Capstone always uses the latest LLVM?

Rot127 commented 6 months ago

The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.

grahamwoodward commented 6 months ago

The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.

would it not be better if LLVM themselves maintained the TG to omit C code? presumably they don't want to?

Rot127 commented 6 months ago

We proposed this idea, but there was no real feedback or interest. At least none which translated to actions. Also because it would require to redesign and rewrite a lot of the backends (see the discussion here: https://reviews.llvm.org/D138323). And because for LLVM internal stuff they work just fine, the priority is understandably very low. I will push the topic again when I find time to build an argument. Because the problem is more general than just "emit C code".

Rot127 commented 6 months ago

@grahamwoodward So it turned out I was partially wrong. The groups are not missing due to faulty definitions, but because they mean something else.

According to AArch64InstrInfo.td (where the groups are defined), these HasXorY features are there because:

A subset of SVE(2) instructions are legal in Streaming SVE execution mode, they should be enabled if either has been specified.

So an instruction gets this group only if it can be legally executed with SVE and SME.

Now, to answer your two questions:

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The naming here is a little unlucky for our use case. In LLVM the Has... naming makes sense, because an assembler checks if a feature is present (target "has" a feature).

So it does Feature_A ==assemble==> instruction_X. But what we did here was Feature_A ==implies==> Feature_A_or_B. Which is not the original meaning of the flags.

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Although this is logically valid, I'd prefer to not add this. An instruction can be defined for SME or SVE and work with the other feature enabled. But adding both HasSME and HasSVE as flags does:

Duplicate information
Implies that the instruction is defined in both ISA extensions (which doesn't need to be the case). And LLVM doesn't tell, in which extension they are defined. Doing this just adds unnecessary manual work (which we try to avoid).

In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.

I hope this resolves the issue. Going to close this now. But will open another issue about documenting the groups we have. And how they should be interpreted.

grahamwoodward commented 6 months ago

@grahamwoodward So it turned out I was partially wrong. The groups are not missing due to faulty definitions, but because they mean something else.

According to AArch64InstrInfo.td (where the groups are defined), these HasXorY features are there because:

A subset of SVE(2) instructions are legal in Streaming SVE execution mode, they should be enabled if either has been specified.

So an instruction gets this group only if it can be legally executed with SVE and SME.

Now, to answer your two questions:

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The naming here is a little unlucky for our use case. In LLVM the Has... naming makes sense, because an assembler checks if a feature is present (target "has" a feature).

So it does Feature_A ==assemble==> instruction_X. But what we did here was Feature_A ==implies==> Feature_A_or_B. Which is not the original meaning of the flags.

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Although this is logically valid, I'd prefer to not add this. An instruction can be defined for SME or SVE and work with the other feature enabled. But adding both HasSME and HasSVE as flags does:

Duplicate information

Implies that the instruction is defined in both ISA extensions (which doesn't need to be the case). And LLVM doesn't tell, in which extension they are defined. Doing this just adds unnecessary manual work (which we try to avoid).

In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.

I hope this resolves the issue. Going to close this now. But will open another issue about documenting the groups we have. And how they should be interpreted.

Right so basically I misunderstood what those "HasXorY" groups meant?

Just because an instruction is NEON or is SVE or is SME...it doesn't mean it'll always be in HasNEON and any HasNEONor<insert other group>

What did you mean by

In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end. ?

Rot127 commented 6 months ago

Just because an instruction is NEON or is SVE or is SME...it doesn't mean it'll always be in HasNEON and any HasNEONor

Yes, an instruction which is HasNEON is not necessarily in HasNEONorSME.

What did you mean by

In your use case you described, you can also just check for HasNEONorSME to get the information, whether an instruction is executable with an enabled NEON extension.

capstone-engine / capstone

[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned. #2343

2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.