Open aravindh-krishnamoorthy opened 10 months ago
I'm taking a look at this, but this has a C reproducer https://godbolt.org/z/bGPT4WxPe and GCC does a lot better
Thank you @gbaraldi for taking this up.
For the C code, I'd make just the single change below. With the change, GCC still does well, but not as well as with the original UB. It seems to only pack the results into XMM registers instead of spilling and reloading registers... and this might also not scale?
struct wtf {
- int a[20];
+ int a[30];
};
Perhaps there's some hidden dependency that I'm not seeing? :(
Yeah, that was a typo, could you try increasing it to 50 or 100? Though I'm not surprised if at that size it gets so bad. I opened https://github.com/llvm/llvm-project/issues/78506 upstream.
Thank you for raising the upstream issue, @gbaraldi. Actually, it seems like it takes a lot to break GCC! This is the general process to check the assembly for any N:
$ awk -v N=50 -f rs.awk rs.awk > rs50.c
For other sizes, please just change -v N=50.
$ gcc -S -masm=intel -O3 -fverbose-asm rs50.c
I get the following error:
awk: rs.awk: line 2: regular expression compile failed (missing operand)
?
awk: rs.awk: line 2: regular expression compile failed (missing operand)
?
Sorry, I think you have a POSIX-compatible awk, but this one uses extensions from GNU awk. My version is as follows:
$ awk --version
GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2022 Free Software Foundation.
And the POSIX version fails here too:
$ awk -P -v N=10 -f rs.awk rs.awk
awk: rs.awk:2: error: Invalid preceding regular expression: /?/
My awk doesn't even have a --version option :laughing:. But gawk worked.
Sorry, seems like I got the for loop wrong. The correct version is `for (i=0; i<N; i++)`, which starts from $i=0$.
$0 ~ /^##.*/ {gsub("^## ",""); print}
- $0 ~ /^#~.*/ {gsub("^#~ ",""); gsub("N",N); if($0 ~ /?/) {for (i=1; i<N; i++) {a = $0; gsub("?",i,a); print a}} else {print}}
+ $0 ~ /^#~.*/ {gsub("^#~ ",""); gsub("N",N); if($0 ~ /?/) {for (i=0; i<N; i++) {a = $0; gsub("?",i,a); print a}} else {print}}
## #include <stddef.h>
## struct wtf {
#~ int a[N];
## };
## struct wtf __attribute__ ((noinline)) foo(struct wtf *b, int i)
## {
## struct wtf new;
#~ int idx? = (? + i) % N ;
#~ int val? = b->a[idx?] ;
#~ new.a[?] = val? ;
## return new;
## }
##
I've also corrected it above.
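For reference, here is what the corrected template should expand to for N=3. This is worked out by hand from the script above, not captured from a gawk run; the indentation and comments are added for readability, and it assumes `i` is non-negative (a negative `i` would make `%` yield a negative index). Note that `__attribute__ ((noinline))` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Hand expansion of the rs.awk template for N=3. */
struct wtf {
    int a[3];
};

struct wtf __attribute__ ((noinline)) foo(struct wtf *b, int i)
{
    struct wtf new;
    /* All index computations come first... */
    int idx0 = (0 + i) % 3 ;
    int idx1 = (1 + i) % 3 ;
    int idx2 = (2 + i) % 3 ;
    /* ...then all loads... */
    int val0 = b->a[idx0] ;
    int val1 = b->a[idx1] ;
    int val2 = b->a[idx2] ;
    /* ...then all stores. */
    new.a[0] = val0 ;
    new.a[1] = val1 ;
    new.a[2] = val2 ;
    return new;
}
```

Because each `#~` line containing `?` is repeated N times before the next template line is processed, the generated C has the same "bunched" layout (indices, then loads, then stores) that the issue describes. Calling `foo(&b, 1)` returns the array rotated left by one position.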
This issue stems from PR #52438 and concerns the following function on my PC with Windows 11/WSL on Intel hardware (see below). But I suspect that the problem may also apply to other hardware types.
Consider the LLVM code generated for the following case where the parameter `shift` is a variable, i.e., `shift` without constant propagation:

The generated LLVM code has the following structure:
Note that in the `ifelse`, `load`, and `new`/`store` portions, the instructions for all indices are bunched together. Next, consider the generated native code:

The same structure is also carried forward into assembly (not shown here) despite there being no memory aliases! This significantly increases the register pressure and leads to a large number of register spills and reloads (8 bytes each). The spills and reloads degrade the performance of the function. Furthermore, the effects get worse as the size of the tuple increases.
[...] `ifelse`, `load`, and `store` portions index by index.

This topic was brought up on Julia Discourse where it was suggested that looking into new aliasing annotations for LLVM might be useful, see link for details.
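To make the "index by index" ordering concrete, here is a minimal C sketch in the style of the C reproducer from the discussion, for N=3. The names `foo_interleaved` and `out` are hypothetical, and `i` is assumed to be non-negative. The sketch only illustrates the source-level grouping: a compiler is free to reorder these operations, so it does not by itself guarantee fewer spills.

```c
/* Same data layout as the reproducer, with N=3. */
struct wtf {
    int a[3];
};

/* Hypothetical variant: index computation, load, and store are
 * grouped per element, so each temporary is dead before the next
 * element is handled. */
struct wtf foo_interleaved(const struct wtf *b, int i)
{
    struct wtf out;
    /* element 0: compute index, load, store */
    { int idx = (0 + i) % 3; int val = b->a[idx]; out.a[0] = val; }
    /* element 1: compute index, load, store */
    { int idx = (1 + i) % 3; int val = b->a[idx]; out.a[1] = val; }
    /* element 2: compute index, load, store */
    { int idx = (2 + i) % 3; int val = b->a[idx]; out.a[2] = val; }
    return out;
}
```

In this form at most one index and one loaded value are live at a time in the source, in contrast to the bunched form where all N loaded values are live before the first store.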
Note 1: The case **with** constant propagation does not have this issue. The resulting code is excellent.
The LLVM and native code for this case can be obtained as follows:
```julia
julia> g(x) = f(x, 1)
g (generic function with 1 method)

julia> @code_llvm g(x)
[...]

julia> @code_native g(x)
[...]
```

Note 2 ¹: Special solutions are sometimes found for some cases (integer tuples) but not for others.
The LLVM and native code for these cases can be obtained as follows:
```julia
julia> x = Tuple(collect(1:50))
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)

julia> g(x,i) = f(x,i)
g (generic function with 1 method)

# LLVM code is Ok but deterministic.
julia> @code_llvm g(x,1)
[...]

julia> x = Tuple(repeat(['a', -1, 1.0], 20))
('a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0, 'a', -1, 1.0)

# LLVM code again has the issue mentioned above.
julia> @code_llvm g(x,1)
[...]
```

Note: unicode superscripts denote edits.