buildsi / build-abi-tests

0 stars 0 forks source link

DWARF representation of vector extension types #8

Open hainest opened 3 years ago

hainest commented 3 years ago

test.c

#include <immintrin.h>

void foo(__m256 a, float b[8]){}

void bar() {
  __m256 a;
  float b[8];
}

Compiled with gcc-11 -std=c11 -mavx -O3 -g -gdwarf-5 -fPIC -shared -o libtest.so test.c. Dumping with readelf gives1

// float
<1><66>: Abbrev Number: 1 (DW_TAG_base_type)
   <67>   DW_AT_byte_size   : 4
   <68>   DW_AT_encoding    : 4  (float)
   <69>   DW_AT_name        : (indirect string, offset: 0x125): float

// __m256 -> typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__));
<1><90>: Abbrev Number: 6 (DW_TAG_typedef)
   <91>   DW_AT_name        : (indirect string, offset: 0xea): __m256
   <98>   DW_AT_type        : <0x9c>
<1><9c>: Abbrev Number: 7 (DW_TAG_array_type)
   <9d>   DW_AT_GNU_vector  : 1
   <9d>   DW_AT_type        : <0x66>

// float[8]
<1><d9>: Abbrev Number: 10 (DW_TAG_array_type)
   <da>   DW_AT_type        : <0x66>
   <de>   DW_AT_sibling     : <0xe9>
<2><e2>: Abbrev Number: 11 (DW_TAG_subrange_type)
   <e3>   DW_AT_type        : <0x35>
   <e7>   DW_AT_upper_bound : 7

// void foo(__m256 a, float b[8]){}
<1><e9>: Abbrev Number: 12 (DW_TAG_subprogram)
   <ea>   DW_AT_name        : foo
<2><107>: Abbrev Number: 3 (DW_TAG_formal_parameter)
   <108>   DW_AT_name        : a
   <10b>   DW_AT_type        : <0x90>
   <10f>   DW_AT_location    : 1 byte block: 61  (DW_OP_reg17 (xmm0))
<2><111>: Abbrev Number: 3 (DW_TAG_formal_parameter)
   <112>   DW_AT_name        : b
   <115>   DW_AT_type        : <0x11c>
   <119>   DW_AT_location    : 1 byte block: 55  (DW_OP_reg5 (rdi))
<1><11c>: Abbrev Number: 13 (DW_TAG_pointer_type)
   <11d>   DW_AT_byte_size   : 8
   <11e>   DW_AT_type        : <0x66>

// void bar() {
//   __m256 a;
//   float b[8];
// }
<1><a8>: Abbrev Number: 9 (DW_TAG_subprogram)
   <a9>   DW_AT_name        : bar
<2><c6>: Abbrev Number: 2 (DW_TAG_variable)
   <c7>   DW_AT_name        : a
   <cb>   DW_AT_type        : <0x90>
<2><cf>: Abbrev Number: 2 (DW_TAG_variable)
   <d0>   DW_AT_name        : b
   <d4>   DW_AT_type        : <0xd9>

The key problem is that DWARF treats vector types as arrays. The only indication you get that you are dealing with a vector type is 1) the name string is __m256 and 2) the non-standard DW_AT_GNU_vector. This is problematic for high-level parsers like Dyninst where that information isn't available until after the determination of array-ness has been made. Also, vectors aren't logically arrays, either; the AMD64 ABI treats them differently (not sure about the others). From a language perspective, they aren't the same because vectors don't possess the awful array-to-pointer decay. You can index them when using gcc, but I'm not sure if that's portable.

I see no reason why we can't just have a DW_TAG_vector_type that has a DW_AT_name, DW_AT_type, and a DW_AT_byte_size.


[1] readelf.log

woodard commented 3 years ago

On Aug 27, 2021, at 7:43 PM, Tim Haines @.***> wrote: The key problem is that DWARF treats vector types as arrays. The only indication you get that you are dealing with a vector type is 1) the name string is __m256 and 2) the non-standard DW_AT_GNU_vector.

I wouldn’t rely on the name string. There are 201007.1 http://dwarfstd.org/ShowIssue.php?issue=201007.1 and http://dwarfstd.org/ShowIssue.php?issue=200720.1 neither of which deal with this problem.

One of the challenges that they are working to solve is say you drop a breakpoint in the middle of an loop that uses vector instructions. You need to be able to print a particular element for that may be part of the vectorized loop. I’m not going to take the time to write this in DWARF expression syntax but the concept is that the location list can have DWARF expressions for each element being acted on.

So assume C is the loop counter and the calculation is happening in a SIMD register. 201007.1 allows you to write DWARF expressions which pick apart the SIMD register and extract the individual elements. That way in the debugger you can print vec[c+3] and get the piece of data.

The two pieces being updated still do not introduce a vector DWARF base type. The one which tries to introduce the SIMD width idea is also flawed. I was discussing that with our DWARF commitee members this morning. I showed them how we need some sort of extension to handle things that do not have fixed vector lengths like AMD SVE and RISC-V’s Vector extension. I had to go around 3 or 4 times but I think that I proved my point that we need something else or else you don’t know how big the vector size the hardware supports and so you don’t know how many loop interations you are executing at once and therefore how much data you pull out of the vector regs vs. what is still in memory. So my argument is that the 200720.1 will need to be revised before being accepted.

This is problematic for high-level parsers like Dyninst where that information isn't available until after the determination of array-ness has been made.

I would argue that this is a bit of a bug in dyninst. You really need to handle the vector bit of it before you handle the arrayness of it.

As for it being non-standard, I would agree that needs to change. I’m surprised that is not currently on the docket. It needs to be. I will mention that to our DWARF committee members and see if we can add that. The fact that DW_AT_GNU_Vector has been in use since 2002 gives it a pretty good track record. Therefore, convincing people that it needs to be assimilated in the standard shouldn’t be that hard.

Also, vectors aren't logically arrays, either; the AMD64 ABI treats them differently (not sure about the others). From a language perspective, they aren't the same because vectors don't possess the awful array-to-pointer decay. You can index them when using gcc, but I'm not sure if that's portable.

From all I know about vectors indexing works with some level of difficulty on pretty much everything. I see no reason why we can't just have a DW_TAG_vector_type that has a DW_AT_name, DW_AT_type, and a DW_AT_byte_size.

Write it up and sort of in the format of one of the other DWARF issues. I’ll see if I can sell that to any of the DWARF committee members to sponsor it. I don’t really disagree with you. However, I am not really convinced either.

I don’t think that the decay to pointer behavior vs. value passing in the psABI is going to be convincing to the DWARF committee. Or maybe I just haven’t heard a good enough argument yet. The other alternative is we know that Markus Metzger is going to have to revise 201007.1 to handle SVE. His proposal is very tight right now. Maybe we can get him to expand it and adopt the vector as a base type idea. Last time I checked he was still at intel.

-ben

[1] readelf.log https://github.com/buildsi/build-abi-tests/files/7070017/readelf.log — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/buildsi/build-abi-tests/issues/8, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD4BGLALOHME5542OSBQJ3T7BELNANCNFSM5C6T5AUQ. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

woodard commented 3 years ago

Heh, I think we may have sparked the discovery of this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102027

-ben

On Aug 27, 2021, at 9:50 PM, Ben Woodard @.***> wrote: 

On Aug 27, 2021, at 7:43 PM, Tim Haines @.***> wrote: The key problem is that DWARF treats vector types as arrays. The only indication you get that you are dealing with a vector type is 1) the name string is __m256 and 2) the non-standard DW_AT_GNU_vector.

I wouldn’t rely on the name string. There are 201007.1 http://dwarfstd.org/ShowIssue.php?issue=201007.1 and http://dwarfstd.org/ShowIssue.php?issue=200720.1 neither of which deal with this problem.

One of the challenges that they are working to solve is say you drop a breakpoint in the middle of an loop that uses vector instructions. You need to be able to print a particular element for that may be part of the vectorized loop. I’m not going to take the time to write this in DWARF expression syntax but the concept is that the location list can have DWARF expressions for each element being acted on.

So assume C is the loop counter and the calculation is happening in a SIMD register. 201007.1 allows you to write DWARF expressions which pick apart the SIMD register and extract the individual elements. That way in the debugger you can print vec[c+3] and get the piece of data.

The two pieces being updated still do not introduce a vector DWARF base type. The one which tries to introduce the SIMD width idea is also flawed. I was discussing that with our DWARF commitee members this morning. I showed them how we need some sort of extension to handle things that do not have fixed vector lengths like AMD SVE and RISC-V’s Vector extension. I had to go around 3 or 4 times but I think that I proved my point that we need something else or else you don’t know how big the vector size the hardware supports and so you don’t know how many loop interations you are executing at once and therefore how much data you pull out of the vector regs vs. what is still in memory. So my argument is that the 200720.1 will need to be revised before being accepted.

This is problematic for high-level parsers like Dyninst where that information isn't available until after the determination of array-ness has been made.

I would argue that this is a bit of a bug in dyninst. You really need to handle the vector bit of it before you handle the arrayness of it.

As for it being non-standard, I would agree that needs to change. I’m surprised that is not currently on the docket. It needs to be. I will mention that to our DWARF committee members and see if we can add that. The fact that DW_AT_GNU_Vector has been in use since 2002 gives it a pretty good track record. Therefore, convincing people that it needs to be assimilated in the standard shouldn’t be that hard.

Also, vectors aren't logically arrays, either; the AMD64 ABI treats them differently (not sure about the others). From a language perspective, they aren't the same because vectors don't possess the awful array-to-pointer decay. You can index them when using gcc, but I'm not sure if that's portable.

From all I know about vectors indexing works with some level of difficulty on pretty much everything. I see no reason why we can't just have a DW_TAG_vector_type that has a DW_AT_name, DW_AT_type, and a DW_AT_byte_size.

Write it up and sort of in the format of one of the other DWARF issues. I’ll see if I can sell that to any of the DWARF committee members to sponsor it. I don’t really disagree with you. However, I am not really convinced either.

I don’t think that the decay to pointer behavior vs. value passing in the psABI is going to be convincing to the DWARF committee. Or maybe I just haven’t heard a good enough argument yet. The other alternative is we know that Markus Metzger is going to have to revise 201007.1 to handle SVE. His proposal is very tight right now. Maybe we can get him to expand it and adopt the vector as a base type idea. Last time I checked he was still at intel.

-ben

[1] readelf.log

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.

hainest commented 3 years ago

@woodard Ha! I'm not too surprised. gcc hasn't had the greatest history of doing floating-pointer vector transfers correctly. I'm glad to see that gcc-11 is now matching clang-12, if even by accident.

woodard commented 3 years ago

I talked to our DWARF representatives. The best advice that they have given me is to file issues regarding these things.

I will take that as my action item but I would like to collaborate with all of you to make sure that I end up making something that works for all of you.

1) The first issue I believe we should take up is to standardize DW_AT_GNU_VECTOR. Even if it is defined as having exactly the same number but with a new human readable string associated with it while it retains its meaning, making it official is something that we need to take up.

2) After that, I believe that pointing out the deficiencies in http://dwarfstd.org/ShowIssue.php?issue=200720.1 http://dwarfstd.org/ShowIssue.php?issue=200720.1 and http://dwarfstd.org/ShowIssue.php?issue=201007.1 http://dwarfstd.org/ShowIssue.php?issue=201007.1 for SVE and RISC-V Vector Extensions and explaining how debuggers should work with those would be important.

3) I have yet to come up with any good arguments for defining a new base type for vector registers. We should hash those out if you have any. However, since one of the goals is to be able to handle vector types and vectorized loops better within a debugger, we would need to work through the process of setting a breakpoint in a vectorized loop and then being able to extract the individual operands as if it were an unvectorized loop.

On Aug 30, 2021, at 2:51 PM, Tim Haines @.***> wrote:

@woodard https://github.com/woodard Ha! I'm not too surprised. gcc hasn't had the greatest history of doing floating-pointer vector transfers correctly. I'm glad to see that gcc-11 is now matching clang-12, if even by accident.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/buildsi/build-abi-tests/issues/8#issuecomment-908726181, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD4BGOF6PPOHAY7OP2NTH3T7P4PFANCNFSM5C6T5AUQ. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.