boutproject / BOUT-dev

BOUT++: Plasma fluid finite-difference simulation code in curvilinear coordinate systems
http://boutproject.github.io/
GNU Lesser General Public License v3.0
183 stars 95 forks source link

Optimizing SpecificInd::zp(), SpecificInd::zm()? #1857

Open johnomotani opened 4 years ago

johnomotani commented 4 years ago

There's a check on the size of the argument to zp(int dz) and zm(int dz) https://github.com/boutproject/BOUT-dev/blob/67553247efd2bb5f7d171fcf0d241c95dc71407e/include/bout/region.hxx#L288-L292

It's not used very much, but the commonest use is with no argument. Instead of having the default argument of dz = 1, would it be worth having a special no-argument case that does not need the check on dz (as if nz == 1 the result is always 0 anyway)? And possibly another for zpp() and zmm() where we could have something like ASSERT3(nz >= 2); instead of always checking dz?

dschwoerer commented 4 years ago

Providing optimised version seems like a good idea as % is rather slow.

Are you really meaning nz? Why would we assert nz>1?

d7919 commented 4 years ago

I'd be supportive of this in general. The branching in these routines mean that loops calling zp/zm won't be vectorised at the moment, one of the goals of guard cell based periodicity is to avoid some of this logic and enable vectorisation -- the zp/zm routines would then look just like the xp etc. versions.

I'm not sure how much this first line dealing with setting dz will impact on performance though. I don't think there should be much cost in the typical case where dz <= nz as we end up just saying dz = dz and won't evaluate the dz % nz branch. We can't remove the conditional in the return statement so I think removing the earlier check will not be sufficient to allow vectorisation. I think the main place we might hit the extra modulus is in the use of zpp/zmm with Ind2D where nz == 1.

johnomotani commented 4 years ago

I was thinking we could check nz for zpp in case there is some special case if nz==1 or nz==2. I think those cases both work OK though, because for nz==1 the result is always 0 and for nz==2 the result will be equal to ind%2 without changing dz from 2 to 0.