Closed grahamwoodward closed 6 months ago
Likely another faulty definition in the LLVM tablegen files. I am currently updating to LLVM 18 (https://github.com/capstone-engine/capstone/pull/2298) and there are plenty of changes to AArch64 and SME/SVE. So it might be fixed already there.
@Rot127 Is that a commit I can pull down to my clone?
Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.
Sure....my test case is simply
#define LEN 1024
unsigned long long a[LEN], b[LEN], c[LEN];
int main()
{
while(1) {
int i = 0;
for (; i < LEN; i++)
c[i] = a[i] + b[i];
}
return 0;
}
and then compile like
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a
and
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve
An objdump shows instructions such as
628: 4ee18400 add v0.2d, v0.2d, v1.2d
and
638: 04e10000 add z0.d, z0.d, z1.d
respectively for the two compilations.
My perf tool is configuring Capstone like so
122 /* Initialise Capstone, which we use to disassemble the instruction */
123 if (cs_open(CS_ARCH_AARCH64, CS_MODE_ARM, &handle) != CS_ERR_OK) {
124 fprintf(stderr, "Failed to initialise Capstone! Aborting.\n");
125 exit(1);
126 }
127 cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);
128 cs_option(handle, CS_OPT_SKIPDATA, CS_OPT_ON);
My tool is using perf events to sample retired instructions and I use Capstone to disassemble the instructions and tell me about the groups/features.
In the first case for instruction
4ee18400 add v0.2d, v0.2d, v1.2d
it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?
In the second case for instruction
04e10000 add z0.d, z0.d, z1.d
It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?
Thanks, hope that helps
Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.
what version is being used at present? I have llvm-14 installed and also a build of llvm-18 and the -14 version struggles with this
echo 0x00, 0x0c, 0xe2, 0x25, | llvm-mc -triple=aarch64 -disassemble -mattr=+all
'+all' is not a recognized feature for this target (ignoring feature)
.text
<stdin>:1:1: warning: invalid instruction encoding
0x00, 0x0c, 0xe2, 0x25,
(note even without -mattr=+all it fails with the encoding) but my local build of 18 is OK
echo 0x00, 0x0c, 0xe2, 0x25, | ./llvm-mc -triple=aarch64 -disassemble -mattr=+all
.text
whilelo p0.d, w0, w2
The v5
branch is based on LLVM-7. Current next
branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.
it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?
The add
instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON
group assigned. Not the HasNEONorSVE
. Because we generate the Capstone code from these definitions in the td
files we also inherit all the flaws which come with hand-written things.
The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME
manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON
assigned. It is common to find these kind of issues in hand-written td
files, unfortunately.
I'll fix it with https://github.com/capstone-engine/capstone/pull/2298 in our LLVM fork by checking during generation.
This could be true for other aarch64 instructions?
Yes, there is a good chance that every instruction which should be of one of the following groups, isn't assigned to them:
HasSVEorSME
, HasSVE2orSME
, HasSVE2orSME2
, HasSVE2p1_or_HasSME
, HasSVE2p1_or_HasSME2
, HasSVE2p1_or_HasSME2p1
, HasNEONorSME
But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE
, we can just emit HasSVEorSME
as well.
But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of
HasSVE
, we can just emitHasSVEorSME
as well.
do you have an ETA for v6?
https://github.com/capstone-engine/capstone/pull/2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.
An ETA for v6
not really. If you want to use Capstone for AArch64 definitely use the next
branch.
The todo list for v6
release is still quite long, I doubt it can be released within 6 months.
2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.
An ETA for
v6
not really. If you want to use Capstone for AArch64 definitely use thenext
branch. The todo list forv6
release is still quite long, I doubt it can be released within 6 months.
"If you want to use Capstone for AArch64 definitely use the next branch." pretty sure I'm using the next branch
Yes, I just wanted to point it out again, the difference between current next
and the v5
release is huge.
Also, the work towards v6
is mostly sponsored by RizinOrg, so I won't "just stop".
Yes, I just wanted to point it out again, the difference between current
next
and thev5
release is huge. Also, the work towardsv6
is mostly sponsored by RizinOrg, so I won't "just stop".
right so even though I'm using next, I don't have the above commit? Can I pull/check that commit out or in my tool link to it as the git submodule?
https://github.com/capstone-engine/capstone/pull/2298 is the PR which updates the AArch64 module to the state of LLVM-18. When it is done it will be merged into next
.
But it isn't done yet. Some instructions don't get disassembled correctly and need to be fixed. So it won't be useful to you in the current state. I can let you know, when it is in a usable state. Than you can use the branch of the PR.
@Rot127 excellent thanks very much
The
v5
branch is based on LLVM-7. Currentnext
branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?
The
add
instructions are defined here if I am not mistaken. And as you can see, it has only theHasNEON
group assigned. Not theHasNEONorSVE
. Because we generate the Capstone code from these definitions in thetd
files we also inherit all the flaws which come with hand-written things.
How did you work out that that's the add?
The underlying problem is, that the current AArch64 definitions need to assign groups like
HasNEONorSME
manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already hasHasNEON
assigned. It is common to find these kind of issues in hand-writtentd
files, unfortunately.I'll fix it with https://github.com/capstone-engine/capstone/pull/2298 in our LLVM fork by checking during generation.
Can you explain more about the generation and how it works? Is that a capstone thing?
How did you work out that that's the add?
In AArch64InstrFormats.td
the basic instructions formats are defined. In this case all vector instructions with a common encoding which have three vector registers (BaseSIMDThreeSameVector
).
With a debugger I checked in Capstone what the LLVM internal ID is which it gave (one of the) the instructions from above: add v0.2d, v0.2d, v1.2d
= ADDv2i64
.
Guessing that it inherits it, I checked where this class is inherited and contains a portion of the LLVM name:
> grep -rnI "BaseSIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep v2i64
llvm/lib/Target/AArch64/AArch64InstrFormats.td:5878: def v2i64 : BaseSIMDThreeSameVector<1, U, 0b111, opc, V128,
Checking AArch64InstrFormats.td:5878
it shows that it defines a multiclass (SIMDThreeSameVector
) there, which defines multiple instructions at once (one of them the v2i64
variant). Grepping for this aggin shows us that indeed an add
instruction of this type wass defined.
> grep -rnI " SIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep ADD
llvm/lib/Target/AArch64/AArch64InstrInfo.td:1452:defm FCADD : SIMDThreeSameVectorComplexHSD<1, 0b111, complexrotateopodd,
llvm/lib/Target/AArch64/AArch64InstrInfo.td:5170:defm ADD : SIMDThreeSameVector<0, 0b10000, "add", add>;
...
In short, you have to manually the inheritance paths and have a feeling how these definitions are commonly written. SO you can make good guesses where to start looking.
Can you explain more about the generation and how it works? Is that a capstone thing?
This is documented here and in the README.md of our LLVM fork.
@Rot127 this might be a daft question but why isn't LLVM included as a git submodule that's then cloned/checked out when you initially clone Capstone? Wouldn't that mean Capstone always uses the latest LLVM?
The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.
The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.
would it not be better if LLVM themselves maintained the TG to omit C code? presumably they don't want to?
We proposed this idea, but there was no real feedback or interest. At least none which translated to actions. Also because it would require to redesign and rewrite a lot of the backends (see the discussion here: https://reviews.llvm.org/D138323). And because for LLVM internal stuff they work just fine, the priority is understandably very low. I will push the topic again when I find time to build an argument. Because the problem is more general than just "emit C code".
@grahamwoodward So it turned out I was partially wrong. The groups are not missing due to faulty definitions, but because they mean something else.
According to AArch64InstrInfo.td
(where the groups are defined), these HasXorY
features are there because:
A subset of SVE(2) instructions are legal in Streaming SVE execution mode, they should be enabled if either has been specified.
So an instruction gets this group only if it can be legally executed with SVE and SME.
Now, to answer your two questions:
it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?
The naming here is a little unlucky for our use case. In LLVM the Has...
naming makes sense, because an assembler checks if a feature is present (target "has" a feature).
So it does Feature_A ==assemble==> instruction_X
. But what we did here was Feature_A ==implies==> Feature_A_or_B
.
Which is not the original meaning of the flags.
It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?
Although this is logically valid, I'd prefer to not add this. An instruction can be defined for SME
or SVE
and work with the other feature enabled.
But adding both HasSME
and HasSVE
as flags does:
In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.
I hope this resolves the issue. Going to close this now. But will open another issue about documenting the groups we have. And how they should be interpreted.
@grahamwoodward So it turned out I was partially wrong. The groups are not missing due to faulty definitions, but because they mean something else.
According to
AArch64InstrInfo.td
(where the groups are defined), theseHasXorY
features are there because:A subset of SVE(2) instructions are legal in Streaming SVE execution mode, they should be enabled if either has been specified.
So an instruction gets this group only if it can be legally executed with SVE and SME.
Now, to answer your two questions:
it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?
The naming here is a little unlucky for our use case. In LLVM the
Has...
naming makes sense, because an assembler checks if a feature is present (target "has" a feature).So it does
Feature_A ==assemble==> instruction_X
. But what we did here wasFeature_A ==implies==> Feature_A_or_B
. Which is not the original meaning of the flags.It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?
Although this is logically valid, I'd prefer to not add this. An instruction can be defined for
SME
orSVE
and work with the other feature enabled. But adding bothHasSME
andHasSVE
as flags does:
- Duplicate information
- Implies that the instruction is defined in both ISA extensions (which doesn't need to be the case). And LLVM doesn't tell, in which extension they are defined. Doing this just adds unnecessary manual work (which we try to avoid).
In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end.
I hope this resolves the issue. Going to close this now. But will open another issue about documenting the groups we have. And how they should be interpreted.
Right so basically I misunderstood what those "HasXorY" groups meant?
Just because an instruction is NEON or is SVE or is SME...it doesn't mean it'll always be in HasNEON
and any HasNEONor<insert other group>
What did you mean by
In general I think it is trivial for the user to just add a few switch/match cases to check those features. They are not that many in the end. ?
Just because an instruction is NEON or is SVE or is SME...it doesn't mean it'll always be in HasNEON and any HasNEONor
Yes, an instruction which is HasNEON
is not necessarily in HasNEONorSME
.
What did you mean by
In your use case you described, you can also just check for HasNEONorSME
to get the information, whether an instruction is executable with an enabled NEON extension.
Running an AWS Ubuntu aarch64 VM
Linux ip-10-252-39-126 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:56:39 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
git clone
I'm using Capstone in a performance tool I'm trying to write (relating to retired instructions). It's only for aarch64 and I have a simple test benchmark that I'm compiling like so
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a
and
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve
When compiling with armv8.6-a and running the test benchmark I'm then running my tool and asking it to show HasNEON and HasNEONorSME and it appears to give me the details indicating that the retired/decoded instruction is HasNEON but not HasNEONorSME. If it's in the group/feature HasNEON, then why not HasNEONorSME? Both are those are true?
Similarly, when compiler with 8.5-a+sve, then I see a zero count for HasSVE however I see a >0 count for HasSVEorSME...again that doesn't make sense...if the retired instruction is in the group/feature HasSVEorSME, then surely I should see a count for HasSME or a count for HasSVE?