shyams2 opened this issue 7 years ago
If shearing box boundary conditions are to be applied to a system with a shear flow along the y-direction, then the boundary condition that needs to be applied is:
f(x, y, t) = f(x + L_x, y - const * omega * L_x * t, t)
Standard periodic boundary conditions hold in the x-direction.
Now, for the y-direction, we have the issue of including time, since the code is not aware of it. The time-stepping loop is currently assembled at the user level. Any suggestions on how to tackle this?
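One possible way to handle this (a minimal sketch; the routine and parameter names below are hypothetical, not Bolt's actual API) is to have the user-level loop pass the current time down to the routine that fills the shearing-box ghost zones, so that the shift const * omega * L_x * t can be evaluated there:

```python
# Illustrative sketch only: apply_shearing_bcs, const, omega and L_x are
# placeholder names. The user-level loop hands the current time t to the
# boundary routine, which uses it to compute the y-shift of the
# shearing-box condition.

def apply_shearing_bcs(f_ghost, t, const, omega, L_x):
    """Hypothetical boundary routine: needs t to evaluate the shift."""
    y_shift = const * omega * L_x * t
    # ... interpolate the ghost-zone data by y_shift along y ...
    return f_ghost

t, dt, t_final    = 0.0, 1e-3, 1.0
const, omega, L_x = 1.5, 1.0, 1.0    # placeholder parameter values
f_ghost           = [0.0] * 8        # stand-in for the ghost-zone data

while t < t_final:
    f_ghost = apply_shearing_bcs(f_ghost, t, const, omega, L_x)
    # ... advance the distribution function by dt here ...
    t += dt
```

Equivalently, the loop could store the elapsed time as an attribute on the solver object and let the boundary routine read it from there.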
Once that is fixed, the plan is to obtain the value at (y - const * omega * L_x * time) using approx1. However, there is one issue: if the interpolation point lies beyond the local scope of the sub-domain containing the boundary (in parallel runs), it will be treated as an off-grid point and assigned the default off-grid value (0). I haven't managed to think of a workaround for this yet.
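For reference, here is a minimal sketch of the interpolation step with approx1, under the assumption that the full shear axis is local to each rank so the query indices can simply be wrapped and never fall off-grid. The names are illustrative; this is not the implementation in 825ed8a:

```python
import arrayfire as af

def shift_along_shear_axis(f_ghost, shift_in_cells):
    """
    Interpolate f_ghost (shear axis along the first dimension) at
    positions displaced by a fractional number of cells. The axis is
    periodically extended by one cell and the query indices are wrapped
    into [0, N), so approx1 never produces an off-grid point. Valid only
    when the full shear axis is local to the rank.
    """
    N = f_ghost.dims()[0]

    # periodic extension: append the first cell so that linear
    # interpolation across the wrap-around point is well defined
    f_ext = af.join(0, f_ghost, f_ghost[0])

    # query positions in index space, shifted by the shear offset
    indices = af.range(N, dtype=af.Dtype.f64) - shift_in_cells
    indices = indices - N * af.floor(indices / N)   # wrap into [0, N)

    return af.approx1(f_ext, indices, method=af.INTERP.LINEAR)

# Example: shift a sine profile by 2.5 cells
N       = 32
profile = af.sin(2 * 3.14159 * af.range(N, dtype=af.Dtype.f64) / N)
shifted = shift_along_shear_axis(profile, 2.5)
```

The single-cell periodic extension keeps linear interpolation across the wrap-around point well defined; in parallel runs this wrapping only works if the shear axis is not decomposed across ranks, which is exactly the off-grid problem described above.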
An implementation has been carried out in 825ed8a. The current idea for the parallel implementation is to allow domain decomposition only along one particular axis (for example, setting nproc_in_q1 = 1 when shearing boundaries are applied along q2). This way, no communication is needed for the interpolation operations. However, the consequences of this restricted domain decomposition on performance need to be evaluated; the weak-scaling tests for this configuration are reported after the sketch below.
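As an illustration of the restricted decomposition, here is a minimal petsc4py sketch, assuming the decomposition is set up through a DMDA; the sizes, stencil width and boundary types are placeholders, not the exact setup used in the runs below:

```python
# Sketch: force the process grid to be 1 along q1 so that every rank
# holds the full q1 extent and the ranks are stacked only along q2.
# With shearing boundaries applied along q2, the shear-shifted
# interpolation is then entirely local to each rank.

import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

N_q1, N_q2 = 64, 64   # placeholder resolution

da = PETSc.DMDA().create(
    dim=2,
    dof=1,
    sizes=(N_q1, N_q2),
    proc_sizes=(1, PETSc.DECIDE),   # nproc_in_q1 = 1
    boundary_type=(PETSc.DM.BoundaryType.PERIODIC,
                   PETSc.DM.BoundaryType.PERIODIC),
    stencil_width=3,
)

((i1_start, i1_end), (i2_start, i2_end)) = da.getRanges()
# every rank sees the full q1 range [0, N_q1)
print('local q1 range:', i1_start, i1_end)
print('local q2 range:', i2_start, i2_end)
```

The trade-off is that the decomposition can no longer be balanced across both directions, which is what the weak-scaling numbers below are meant to quantify.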
Machine: GPU node on Savio @ Berkeley
Setup: the example_problems/nonrelativistic_boltzmann/test_shear setup
Weak-Scaling:
Preferentially increasing along q2:
Resolution | nGPUs | Time/Iteration |
---|---|---|
32 X 32 X 24^3 | 1 | 0.66705s |
32 X 64 X 24^3 | 2 | 0.77814s |
48 X 64 X 24^3 | 3 | 0.94574s |
64 X 64 X 24^3 | 4 | 1.02417s |
64 X 80 X 24^3 | 5 | 1.044385s |
Preferentially increasing along q1:
Resolution | nGPUs | Time/Iteration |
---|---|---|
32 X 32 X 24^3 | 1 | 0.71458s |
64 X 32 X 24^3 | 2 | 0.82067s |
64 X 48 X 24^3 | 3 | 0.95697s |
64 X 64 X 24^3 | 4 | 1.02201s |
80 X 64 X 24^3 | 5 | 1.10472s |
Machine: GPU nodes on Savio @ Berkeley
Setup: the example_problems/nonrelativistic_boltzmann/test_shear setup
Weak-Scaling:
Resolution | nGPUs | Time/Iteration |
---|---|---|
6 X 256 X 24^3 | 1 | 1.73608s |
12 X 256 X 24^3 | 2 | 2.12663s |
18 X 256 X 24^3 | 3 | 1.914614s |
24 X 256 X 24^3 | 4 | 2.00052s |
30 X 256 X 24^3 | 5 | 2.01056s |
36 X 256 X 24^3 | 6 | 1.95032s |
42 X 256 X 24^3 | 7 | 1.93497s |
48 X 256 X 24^3 | 8 | 1.96161s |
54 X 256 X 24^3 | 9 | 2.33627s |
60 X 256 X 24^3 | 10 | 2.28044s |
66 X 256 X 24^3 | 11 | 3.11998s |
72 X 256 X 24^3 | 12 | 2.98348s |
78 X 256 X 24^3 | 13 | 2.98249s |
84 X 256 X 24^3 | 14 | 2.96902s |
90 X 256 X 24^3 | 15 | 2.97558s |
96 X 256 X 24^3 | 16 | 2.98609s |
102 X 256 X 24^3 | 17 | 3.01202s |
108 X 256 X 24^3 | 18 | 2.99003s |
114 X 256 X 24^3 | 19 | 3.01098s |
120 X 256 X 24^3 | 20 | 3.00246s |
126 X 256 X 24^3 | 21 | 3.07445s |
132 X 256 X 24^3 | 22 | 3.03415s |
138 X 256 X 24^3 | 23 | 3.03426s |
144 X 256 X 24^3 | 24 | 3.03600s |
150 X 256 X 24^3 | 25 | 3.08382s |
156 X 256 X 24^3 | 26 | 3.06304s |
162 X 256 X 24^3 | 27 | 3.07691s |
168 X 256 X 24^3 | 28 | 3.08605s |
174 X 256 X 24^3 | 29 | 3.10119s |
180 X 256 X 24^3 | 30 | 3.09036s |
186 X 256 X 24^3 | 31 | 3.11168s |
192 X 256 X 24^3 | 32 | 3.11164s |
198 X 256 X 24^3 | 33 | 3.19038s |
204 X 256 X 24^3 | 34 | 3.24442s |
210 X 256 X 24^3 | 35 | 3.15294s |
216 X 256 X 24^3 | 36 | 3.15491s |
222 X 256 X 24^3 | 37 | 3.17405s |
228 X 256 X 24^3 | 38 | 3.14879s |
234 X 256 X 24^3 | 39 | 3.15983s |
240 X 256 X 24^3 | 40 | 3.17015s |
246 X 256 X 24^3 | 41 | 3.18736s |
252 X 256 X 24^3 | 42 | 3.17365s |
258 X 256 X 24^3 | 43 | 3.18573s |
264 X 256 X 24^3 | 44 | 3.20118s |
Shearing box boundary conditions need to be implemented in order to run the case of the magnetorotational instability.
This paper will be used as a reference. The possible route for implementation will be taken up in the next post.