Open johnomotani opened 4 years ago
Providing optimised version seems like a good idea as %
is rather slow.
Are you really meaning nz
? Why would we assert nz>1
?
I'd be supportive of this in general. The branching in these routines mean that loops calling zp/zm won't be vectorised at the moment, one of the goals of guard cell based periodicity is to avoid some of this logic and enable vectorisation -- the zp/zm
routines would then look just like the xp
etc. versions.
I'm not sure how much this first line dealing with setting dz
will impact on performance though. I don't think there should be much cost in the typical case where dz <= nz
as we end up just saying dz = dz
and won't evaluate the dz % nz
branch. We can't remove the conditional in the return statement so I think removing the earlier check will not be sufficient to allow vectorisation. I think the main place we might hit the extra modulus is in the use of zpp/zmm
with Ind2D
where nz == 1
.
I was thinking we could check nz
for zpp
in case there is some special case if nz==1
or nz==2
. I think those cases both work OK though, because for nz==1
the result is always 0
and for nz==2
the result will be equal to ind%2
without changing dz
from 2 to 0.
There's a check on the size of the argument to
zp(int dz)
andzm(int dz)
https://github.com/boutproject/BOUT-dev/blob/67553247efd2bb5f7d171fcf0d241c95dc71407e/include/bout/region.hxx#L288-L292It's not used very much, but the commonest use is with no argument. Instead of having the default argument of
dz = 1
, would it be worth having a special no-argument case that does not need the check ondz
(as ifnz == 1
the result is always0
anyway)? And possibly another forzpp()
andzmm()
where we could have something likeASSERT3(nz >= 2);
instead of always checkingdz
?