Open kiranchandramohan opened 5 years ago
LLVM files generated by flang2 with and without the target attribute.
The problem is actually in flang1. Attached are 3 diffs for outputs from flang1:
Try the attached patch -- flang1-semfin.c-target-vector.patch.txt.
Thanks @flang-cavium for working on this. I will look into the patch that you provided today.
@flang-cavium , I can confirm that with this patch this loop is vectorized. This patch also helped to vectorize one of the application benchmarks which had this issue. I will now try to get this patch through some internal tests and also test with other applications which had similar vectorisation issues. Thanks once again for working on this.
@flang-cavium , I can confirm that with this patch this loop is vectorized. This patch also helped to vectorize one of the application benchmarks which had this issue.
That is a Good Thing[tm].
- TBAA information is not generated for the address load and the store to the array when compiled with optimizations (O3).
- When the target attribute is removed this TBAA information is generated.
These two details are related. The TARGET attribute, as we know, should only be put on variables that can be/will be pointed to by variables with a POINTER attribute. TARGET explicitly tells the compiler to be pessimistic as the variable may be aliased in some way. (Fortran has the opposite sense of C here; Fortran compilers can assume variables do not alias unless the programmer says otherwise.)
- The address load is possibly loop-invariant and can be hoisted out of the loop, but LLVM does not have enough information to do this.
Yes, this is why LLVM's vectorizer is having trouble. arr
is also ALLOCATABLE as well as a MODULE variable, so Flang is placing the object in a block of memory. The ALLOCATE statement will construct the various parts of arr
inside that block of memory, but the layout is not exposed to the LLVM optimizer. The block just looks like an array of bytes.
- This loop is vectorized by gfortran, ifort.
It is also vectorized by pgfortran.
Try the attached patch -- flang1-semfin.c-target-vector.patch.txt.
This patch gets around the aforementioned issues by throwing away the TARGET attribute on arr
(and somewhat by happenstance). It's generally incorrect to assume any variable with TARGET is not going to be aliased somehow, but the front-end will force this assumption if the variable with the TARGET attribute can only be referenced in the current scope and the variable is not pointer assigned in the current scope. But in this case, arr
is a global, so this local analysis is insufficient to prove there is no pointer aliasing in the general case.
Two approaches can be entertained to solve this problem correctly.
Do a global analysis that determines that all USE
s of arr
in the whole program do not introduce aliasing within a scope. (In general, this is a hard problem. Consider derived types that have polymorphic pointers, allocations that escape a scope, and separate compilation, for example.)
Decompose the "block of bytes" into an LLVM strongly typed product type and clean up all the references to the subcomponents to also be strongly typed. The goal would be for flang to do its own SROA-like decomposition and to not present memory accesses as bare i8* arithmetic. If I do these transformations by hand, this loop is fully unrolled and vectorized.
movaps .LCPI1_2(%rip), %xmm0 # xmm0 = [1,2,3,4]
movups %xmm0, _m1_0_-68(,%rax,4)
movaps .LCPI1_3(%rip), %xmm0 # xmm0 = [5,6,7,8]
movups %xmm0, _m1_0_-52(,%rax,4)
movaps .LCPI1_4(%rip), %xmm0 # xmm0 = [9,10,11,12]
movups %xmm0, _m1_0_-36(,%rax,4)
movaps .LCPI1_5(%rip), %xmm0 # xmm0 = [13,14,15,16]
movups %xmm0, _m1_0_-20(,%rax,4)
movaps .LCPI1_6(%rip), %xmm0 # xmm0 = [17,18,19,20]
movups %xmm0, _m1_0_-4(,%rax,4)
movaps .LCPI1_7(%rip), %xmm0 # xmm0 = [21,22,23,24]
movups %xmm0, _m1_0_+12(,%rax,4)
movaps .LCPI1_8(%rip), %xmm0 # xmm0 = [25,26,27,28]
movups %xmm0, _m1_0_+28(,%rax,4)
movaps .LCPI1_9(%rip), %xmm0 # xmm0 = [29,30,31,32]
movups %xmm0, _m1_0_+44(,%rax,4)
movaps .LCPI1_10(%rip), %xmm0 # xmm0 = [33,34,35,36]
movups %xmm0, _m1_0_+60(,%rax,4)
movaps .LCPI1_11(%rip), %xmm0 # xmm0 = [37,38,39,40]
movups %xmm0, _m1_0_+76(,%rax,4)
movaps .LCPI1_12(%rip), %xmm0 # xmm0 = [41,42,43,44]
movups %xmm0, _m1_0_+92(,%rax,4)
movaps .LCPI1_13(%rip), %xmm0 # xmm0 = [45,46,47,48]
movups %xmm0, _m1_0_+108(,%rax,4)
movaps .LCPI1_14(%rip), %xmm0 # xmm0 = [49,50,51,52]
movups %xmm0, _m1_0_+124(,%rax,4)
movaps .LCPI1_15(%rip), %xmm0 # xmm0 = [53,54,55,56]
movups %xmm0, _m1_0_+140(,%rax,4)
movaps .LCPI1_16(%rip), %xmm0 # xmm0 = [57,58,59,60]
movups %xmm0, _m1_0_+156(,%rax,4)
movaps .LCPI1_17(%rip), %xmm0 # xmm0 = [61,62,63,64]
movups %xmm0, _m1_0_+172(,%rax,4)
movaps .LCPI1_18(%rip), %xmm0 # xmm0 = [65,66,67,68]
movups %xmm0, _m1_0_+188(,%rax,4)
movaps .LCPI1_19(%rip), %xmm0 # xmm0 = [69,70,71,72]
movups %xmm0, _m1_0_+204(,%rax,4)
movaps .LCPI1_20(%rip), %xmm0 # xmm0 = [73,74,75,76]
movups %xmm0, _m1_0_+220(,%rax,4)
movaps .LCPI1_21(%rip), %xmm0 # xmm0 = [77,78,79,80]
movups %xmm0, _m1_0_+236(,%rax,4)
movaps .LCPI1_22(%rip), %xmm0 # xmm0 = [81,82,83,84]
movups %xmm0, _m1_0_+252(,%rax,4)
movaps .LCPI1_23(%rip), %xmm0 # xmm0 = [85,86,87,88]
movups %xmm0, _m1_0_+268(,%rax,4)
movaps .LCPI1_24(%rip), %xmm0 # xmm0 = [89,90,91,92]
movups %xmm0, _m1_0_+284(,%rax,4)
movaps .LCPI1_25(%rip), %xmm0 # xmm0 = [93,94,95,96]
movups %xmm0, _m1_0_+300(,%rax,4)
movaps .LCPI1_26(%rip), %xmm0 # xmm0 = [97,98,99,100]
movups %xmm0, _m1_0_+316(,%rax,4)
But in this case, arr is a global, so this local analysis is insufficient to prove there is no pointer aliasing in the general case.
Why is this correct for static arrays, which is precisely what the existing flang1 code does?
Why is pointer aliasing an inherent problem in the first place? The only case where pointer aliasing would be a problem would be if it was done through incompatible pointer types.
Does Fortran even allow pointer aliasing through incompatible pointer types in the first place?
Thanks @schweitzpgi for having a look and commenting in detail. Thanks @flang-cavium for the follow-up question.
I was trying to create a program which would show an issue with @flang-cavium's patch but I was not able to come up with one. My plan was to create an inter-iteration dependency due to aliasing and have the array with the target attribute and the pointer pointing it to be in separate modules like in the following code.
module m1
integer, Allocatable, Target, Dimension(:) :: arr
integer, parameter :: sz = 1000
end module
module m2
use m1
integer, pointer, Dimension(:) :: ptr
contains
subroutine setup()
allocate(arr(sz))
ptr = arr
end subroutine
end module
program mn
use m2
do i=2,sz
arr(i) = ptr(i-1) + i
end do
end program
@flang-cavium , I can confirm that with this patch this loop is vectorized. This patch also helped to vectorize one of the application benchmarks which had this issue. I will now try to get this patch through some internal tests and also test with other applications which had similar vectorisation issues. Thanks once again for working on this.
TBAA metadata doesn't get generated whenever there is a pointer assignment statement for the same array in the same routine. For example,
module m1
integer, Allocatable, Target, Dimension(:) :: arr
end module
program pgm
use m1
integer, pointer, Dimension(:) :: ptr
integer :: i = 2
ptr => arr ! <<--- Pointer assignment.
i = arr(i)
end program
In the above example, Flang doesn't produce TBAA info for arr(i) access! But once the assignment is commented out, the patch seems to work fine.
Hi Kiran,
The following code shows aliasing on array arr
.
module mod1
integer, allocatable, target, dimension(:) :: arr
integer, parameter :: size = 1000
contains
subroutine init_arr
allocate (arr(size))
do i = 1,size
arr(i) = i
end do
end subroutine init_arr
subroutine access_at(ptr, off)
integer off
integer, pointer :: ptr(:)
ptr => arr(off:)
end subroutine access_at
subroutine dump(size)
integer size
print *, "*** dump arr ***"
do i = 1, size
print *, i, arr(i)
end do
print *, ""
end subroutine dump
end module mod1
module mod2
contains
subroutine test1
use mod1
integer, pointer :: p(:)
integer, pointer :: q(:)
call init_arr
call access_at(p, 1)
call access_at(q, 2)
do i = 1, 200
q(i) = p(i) + 100
end do
call dump(50)
end subroutine test1
end module mod2
program p
use mod2
call test1
end program p
If we look at subroutine test1
and comment out that p
and q
have pointer attributes, the compiler could, for example, vectorize the loop in test1
. However, p
and q
do have the pointer attribute, so the compiler must be more conservative. It cannot vectorize the loop.
Now note that there is no difference between p
and arr
. Each is a name for the same array in storage. So, if we instead wrote the body of the loop in test1
as
q(i) = arr(i) + 100
the compiler should generate code that produces the same results. In this case, arr
has the target attribute (not a pointer), but it can and does alias with p
and q
. Therefore, it is clearly not safe for the compiler to remove the target attribute on the global/common symbol arr
.
Therefore, it is clearly not safe for the compiler to remove the target attribute on the global/common symbol arr.
And vectorization does not happen in either case:
q(i) = p(i) + 100
or
q(i) = arr(i) + 100
--- !Analysis
Pass: loop-vectorize
Name: CantIdentifyArrayBounds
DebugLoc: { File: t4.f90, Line: 39, Column: 1 }
Function: mod2_test1_
Args:
- String: 'loop not vectorized: '
- String: cannot identify array bounds
...
--- !Missed
Pass: loop-vectorize
Name: MissedDetails
DebugLoc: { File: t4.f90, Line: 39, Column: 1 }
Function: mod2_test1_
Args:
- String: loop not vectorized
...
Thanks @schweitzpgi for your comments and the example. After going through the example that you provided and I agree with your argument that q and arr should not alias and hence the target attribute on arr should not be removed.
@flang-cavium I also do not see any vectorisation. But this is probably due to other issues. Isn't @schweitzpgi 's point that telling LLVM that "q" and "arr" do not alias in "q(i) = arr(i) + 100" incorrect?
Thanks @mvinay for your comment. Yes, I did observe that having "arr" on the RHS of a pointer assignment prevents the target attribute from being dropped. I wanted to try the case (as in the example i posted in a comment above) where the pointer assignment happens in a different scope.
But this is probably due to other issues.
Which other issues?
Isn't @schweitzpgi 's point that telling LLVM that "q" and "arr" do not alias in "q(i) = arr(i) + 100" incorrect?
LLVM figured out on its own that "q" and "arr" should not alias in "q(i) = arr(i) + 100", so I don't see how @schweitzpgi's point would be relevant. It may be correct in theory, but not relevant.
Pointer aliasing rules. A construct like the one in @schweitzpgi's example won't alias in C either.
The code given below contains a module with an allocatable variable having the target attribute set. This module variable is used inside the loop and the loop is not vectorized by Flang/LLVM.
Observations 1) TBAA information is not generated for the address load and the store to the array when compiled with optimizations (O3). 2) When the target attribute is removed this TBAA information is generated. 3) The address load is possibly loop-invariant and can be hoisted out of the loop, but LLVM does not have enough information to do this. 4) This loop is vectorized by gfortran, ifort.