RRZE-HPC / stempel

Stencil TEMPlate Engineering Library
GNU Affero General Public License v3.0

Fused coefficient in symmetric-isotropic case #6

Status: Closed (by christiealappatt, 7 years ago)

christiealappatt commented 7 years ago

Right now, for symmetric, isotropic stencils, the coefficients are not fused. For example, the command

python stempel.py -D 2 -r 1 -s -k star -C variable

produces this output:

for(int j=2; j < M-2; j++){
    for(int i=2; i < N-2; i++){
        b[j][i] = W[j][i][0] * a[j][i]
                + W[j][i][1] * (a[j][i-1] + a[j][i+1])
                + W[j][i][1] * (a[j-1][i] + a[j+1][i])
                + W[j][i][2] * (a[j][i-2] + a[j][i+2])
                + W[j][i][2] * (a[j-2][i] + a[j+2][i])
                ;
    }
}

I think it would be better if the output is:

for(int j=2; j < M-2; j++){
    for(int i=2; i < N-2; i++){
        b[j][i] = W[j][i][0] * a[j][i]
                + W[j][i][1] * (a[j][i-1] + a[j][i+1]
                              + a[j-1][i] + a[j+1][i])
                + W[j][i][2] * (a[j][i-2] + a[j][i+2]
                              + a[j-2][i] + a[j+2][i])
                ;
    }
}

This matters because the compiler may sometimes generate additional multiplications for the current, unfused form.
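To make the difference concrete, here is a minimal Python sketch that evaluates both forms at a single grid point. The names `unfused` and `fused`, the per-point tuple `w` (standing in for `W[j][i][k]`), and the test grid are assumptions made for illustration only; they are not part of stempel. The unfused form uses five multiplications per point, the fused form three. In general the two forms agree only up to floating-point rounding, which is probably why a compiler without relaxed floating-point flags will not do this factoring itself.

```python
import math


def unfused(a, w, j, i):
    # Five multiplications per point, as in the currently generated code.
    return (w[0] * a[j][i]
            + w[1] * (a[j][i-1] + a[j][i+1])
            + w[1] * (a[j-1][i] + a[j+1][i])
            + w[2] * (a[j][i-2] + a[j][i+2])
            + w[2] * (a[j-2][i] + a[j+2][i]))


def fused(a, w, j, i):
    # Three multiplications per point: one per distinct coefficient.
    return (w[0] * a[j][i]
            + w[1] * (a[j][i-1] + a[j][i+1] + a[j-1][i] + a[j+1][i])
            + w[2] * (a[j][i-2] + a[j][i+2] + a[j-2][i] + a[j+2][i]))


# Hypothetical 5x5 test grid and coefficients (small integers and powers
# of two, so both forms happen to be exact here).
a = [[float(5 * r + c) for c in range(5)] for r in range(5)]
w = (0.5, 0.25, 0.125)

# For arbitrary inputs, compare with a tolerance rather than with ==.
print(math.isclose(unfused(a, w, 2, 2), fused(a, w, 2, 2)))
```

With arbitrary coefficients and grid values the last digits of the two results may differ, so any check of the generated kernels should compare with a tolerance.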

sguera commented 7 years ago

I see what you mean. Maybe @cod3monk can tell us how kerncraft would deal with it.

cod3monk commented 7 years ago

Kerncraft does not deal with it. It does not do any code transformations and leaves that part to the compiler. I have seen issues with this particular problem in a 3D boxed stencil. For kerncraft the solution is to fuse the coefficients manually, but since you are generating the code for kerncraft it might be useful to do that automatically.

sguera commented 7 years ago

OK, thanks for clarifying. Then I will fuse them while generating the code.
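One way to do the fusing at generation time is to group the array accesses by coefficient index before emitting the expression. The sketch below is only an illustration of that idea, not the code from the actual fix: the helper names (`star_offsets`, `access`, `fused_expression`) are hypothetical, and the coefficient layout (index 0 for the centre, index d for all four points at distance d) is an assumption inferred from the output shown in this issue.

```python
def star_offsets(radius):
    """Yield (coeff_index, (dj, di)) for a 2D star stencil.

    Assumed layout: index 0 is the centre point, index d is shared by
    the four points at distance d (symmetric, isotropic case).
    """
    yield 0, (0, 0)
    for d in range(1, radius + 1):
        for dj, di in ((0, -d), (0, d), (-d, 0), (d, 0)):
            yield d, (dj, di)


def access(name, dj, di):
    """Format one array access, e.g. a[j-1][i] or a[j][i+2]."""
    def idx(var, off):
        return var if off == 0 else f"{var}{off:+d}"
    return f"{name}[{idx('j', dj)}][{idx('i', di)}]"


def fused_expression(radius):
    """Build the C statement with one multiplication per coefficient."""
    groups = {}  # coeff_index -> accesses sharing it (insertion-ordered)
    for c, (dj, di) in star_offsets(radius):
        groups.setdefault(c, []).append(access("a", dj, di))
    terms = []
    for c, accesses in groups.items():
        if len(accesses) == 1:
            operand = accesses[0]
        else:
            operand = "(" + " + ".join(accesses) + ")"
        terms.append(f"W[j][i][{c}] * {operand}")
    return "b[j][i] = " + "\n        + ".join(terms) + ";"


print(fused_expression(2))
```

Grouping on the coefficient index rather than on the axis keeps the approach independent of the stencil shape, so the same dictionary could in principle be filled from a box stencil's offsets as well.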

sguera commented 7 years ago

Fixed in 3e57c7dcdae6422891c4054eb8ef6560155c4343 and 856049e55ca8e2e4a7857aee27a74215bd8879f0.