QuazarTech / Bolt

A Fast Solver Framework For Kinetic Theories
GNU General Public License v3.0

Implementation of shearing box boundary conditions. #33

Open. shyams2 opened this issue 7 years ago

shyams2 commented 7 years ago

Shearing box boundary conditions need to be implemented to be able to run the case of the magnetorotational instability.

This paper will be used as a reference. A possible route for implementation will be taken up in the next post.

shyams2 commented 7 years ago

If shearing box boundary conditions are to be applied to a system with a shear flow along the y-direction, the condition that needs to be imposed is:

f(x, y) = f(x + L_x, y - const * omega * L_x * time)

In the x-direction, the condition is just the standard periodic wrap x -> x + L_x; the complication is the time-dependent shift along y.

For this y-shift we have the issue of including time, since the code is not aware of it: the time-stepping loop is currently assembled at the user level. Any suggestions to tackle this?
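Purely to make the issue concrete, here is a hypothetical user-level loop in which the current time would have to be threaded through to the boundary routine; `timestep` and `apply_shearing_bcs` are illustrative placeholders, not the existing API:

```python
import numpy as np

# Hypothetical placeholders, not Bolt's API:
def timestep(f, dt):
    return f                       # stand-in for one user-level time step

def apply_shearing_bcs(f, t):
    # the y-shift const * omega * L_x * t would be computed here from t
    return f

f  = np.zeros((32, 32))            # stand-in distribution function
dt = 1e-3
t  = 0.0

for step in range(100):
    f = timestep(f, dt)
    t += dt
    # The boundary routine needs the current time, so t has to be passed
    # down from the user-level loop into the library:
    f = apply_shearing_bcs(f, t)
```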

Once that is fixed, the plan is to obtain the value at (y - const * omega * L_x * time) using approx1. However, there is one issue: if the interpolation point lies beyond the local scope of the sub-domain (in parallel runs) containing the boundary, it will be treated as an off-grid point and assigned the default off-grid value (0). I haven't managed to think of a workaround for this yet.
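As a rough sketch of how this interpolation might look in a serial run, using ArrayFire's `approx1` on a 1D slice of f along y at the x-boundary (all names and values are illustrative; the modulo wrap keeps every query point on the grid only when the full y-extent is available locally, which is exactly what breaks down for a decomposed sub-domain):

```python
import numpy as np
import arrayfire as af

# Illustrative parameters; 'const' stands for the shear parameter appearing in
# const * omega * L_x * time in the boundary condition above.
N_y, L_y          = 256, 1.0
const, omega, L_x = 1.5, 1.0, 1.0
t                 = 0.37

dy = L_y / N_y
y  = np.arange(N_y) * dy

# Stand-in for the slice of f along y at the x-boundary:
f_boundary = af.to_array(np.sin(2 * np.pi * y / L_y))

# Query points shifted by the shear displacement, wrapped back into [0, L_y):
y_query = (y - const * omega * L_x * t) % L_y

# approx1 expects positions in units of the grid index; points falling off the
# grid are assigned the default off_grid value (0), which is the problem when
# a sub-domain only holds part of the y-extent:
f_sheared = af.approx1(f_boundary, af.to_array(y_query / dy),
                       af.INTERP.LINEAR, off_grid=0.0)
```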

shyams2 commented 6 years ago

An implementation has been carried out in 825ed8a. The current idea for the parallel implementation is to allow domain decomposition only along one particular axis (for example, assigning nproc_in_q1 = 1 when shearing boundaries are applied along q2). This way there is no need for communication in the interpolation operations; a sketch of this constraint is given after the tables below. However, the consequences of this sort of domain decomposition on performance need to be evaluated. This is the weak scaling test for the same:

Machine: GPU node on Savio @ Berkeley

Setup:

Weak-Scaling:

Preferentially increasing along q2:

| Resolution | nGPUs | Time/Iteration |
|------------|-------|----------------|
| 32 X 32 X 24^3 | 1 | 0.66705s |
| 32 X 64 X 24^3 | 2 | 0.77814s |
| 48 X 64 X 24^3 | 3 | 0.94574s |
| 64 X 64 X 24^3 | 4 | 1.02417s |
| 64 X 80 X 24^3 | 5 | 1.044385s |

Preferentially increasing along q1:

| Resolution | nGPUs | Time/Iteration |
|------------|-------|----------------|
| 32 X 32 X 24^3 | 1 | 0.71458s |
| 64 X 32 X 24^3 | 2 | 0.82067s |
| 64 X 48 X 24^3 | 3 | 0.95697s |
| 64 X 64 X 24^3 | 4 | 1.02201s |
| 80 X 64 X 24^3 | 5 | 1.10472s |
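As referenced above, a minimal sketch of the single-axis decomposition constraint, assuming the structured-grid decomposition is handled through a PETSc DMDA (illustrative only; the actual solver setup may differ):

```python
from petsc4py import PETSc

N_q1, N_q2 = 64, 64
nproc      = PETSc.COMM_WORLD.getSize()

# proc_sizes=(1, nproc) corresponds to nproc_in_q1 = 1: the decomposition is
# allowed only along q2, so the shearing-boundary interpolation needs no
# inter-rank communication.
da = PETSc.DMDA().create([N_q1, N_q2],
                         stencil_width=1,
                         boundary_type=(PETSc.DM.BoundaryType.PERIODIC,
                                        PETSc.DM.BoundaryType.PERIODIC),
                         proc_sizes=(1, nproc))
```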
shyams2 commented 6 years ago

Machine: GPU nodes on Savio @ Berkeley

Setup:

Weak-Scaling:

| Resolution | nGPUs | Time/Iteration |
|------------|-------|----------------|
| 6 X 256 X 24^3 | 1 | 1.73608s |
| 12 X 256 X 24^3 | 2 | 2.12663s |
| 18 X 256 X 24^3 | 3 | 1.914614s |
| 24 X 256 X 24^3 | 4 | 2.00052s |
| 30 X 256 X 24^3 | 5 | 2.01056s |
| 36 X 256 X 24^3 | 6 | 1.95032s |
| 42 X 256 X 24^3 | 7 | 1.93497s |
| 48 X 256 X 24^3 | 8 | 1.96161s |
| 54 X 256 X 24^3 | 9 | 2.33627s |
| 60 X 256 X 24^3 | 10 | 2.28044s |
| 66 X 256 X 24^3 | 11 | 3.11998s |
| 72 X 256 X 24^3 | 12 | 2.98348s |
| 78 X 256 X 24^3 | 13 | 2.98249s |
| 84 X 256 X 24^3 | 14 | 2.96902s |
| 90 X 256 X 24^3 | 15 | 2.97558s |
| 96 X 256 X 24^3 | 16 | 2.98609s |
| 102 X 256 X 24^3 | 17 | 3.01202s |
| 108 X 256 X 24^3 | 18 | 2.99003s |
| 114 X 256 X 24^3 | 19 | 3.01098s |
| 120 X 256 X 24^3 | 20 | 3.00246s |
| 126 X 256 X 24^3 | 21 | 3.07445s |
| 132 X 256 X 24^3 | 22 | 3.03415s |
| 138 X 256 X 24^3 | 23 | 3.03426s |
| 144 X 256 X 24^3 | 24 | 3.03600s |
| 150 X 256 X 24^3 | 25 | 3.08382s |
| 156 X 256 X 24^3 | 26 | 3.06304s |
| 162 X 256 X 24^3 | 27 | 3.07691s |
| 168 X 256 X 24^3 | 28 | 3.08605s |
| 174 X 256 X 24^3 | 29 | 3.10119s |
| 180 X 256 X 24^3 | 30 | 3.09036s |
| 186 X 256 X 24^3 | 31 | 3.11168s |
| 192 X 256 X 24^3 | 32 | 3.11164s |
| 198 X 256 X 24^3 | 33 | 3.19038s |
| 204 X 256 X 24^3 | 34 | 3.24442s |
| 210 X 256 X 24^3 | 35 | 3.15294s |
| 216 X 256 X 24^3 | 36 | 3.15491s |
| 222 X 256 X 24^3 | 37 | 3.17405s |
| 228 X 256 X 24^3 | 38 | 3.14879s |
| 234 X 256 X 24^3 | 39 | 3.15983s |
| 240 X 256 X 24^3 | 40 | 3.17015s |
| 246 X 256 X 24^3 | 41 | 3.18736s |
| 252 X 256 X 24^3 | 42 | 3.17365s |
| 258 X 256 X 24^3 | 43 | 3.18573s |
| 264 X 256 X 24^3 | 44 | 3.20118s |

[plot of the weak-scaling results above]