ammarhakim / gkylzero

Lowest, compiled layer of Gkeyll infrastructure.
MIT License

[DR] Extensions to Vlasov and PKPM in the dg_10m branch. #386

Closed: JunoRavin closed this issue 4 months ago

JunoRavin commented 5 months ago

A number of extensions to the Vlasov and PKPM systems have been explored in the https://github.com/ammarhakim/gkylzero/tree/dg_10m branch. They have yielded new functionality and improved robustness, and are ready for design review.

The new features are:

  1. Both the DG fluids functionality in the Vlasov app and the PKPM app now support a first-order operator split + semi-implicit source solve as described in Wang et al. 2020 JCP (https://ammar-hakim.org/sj/_static/files/JCP-2020-Wang.pdf); a minimal scalar sketch of this coupling is given after this list. The algorithm has been extended to our modal DG formalism: the matrix for how each modal DG coefficient evolves is constructed following the equations for the linear system (analogous to weak division or the primitive moment solve for the conservative LBO), and the matrix is then inverted with our batched linear solve routines. The standard plasma-beach problem (https://ammar-hakim.org/sj/je/je8/je8-plasmabeach.html) runs with a time step set by the speed of light instead of the plasma frequency in the high-density region. The PKPM version has a slightly different cutoff due to the presence of a background Bx (which defines the vparallel direction), but the launched plasma wave nevertheless becomes evanescent as expected and the stable time step restriction is relaxed. The new Alfven soliton regression test based on Kakutani & Ono 1969 also runs stably with dx ~ 32 di (where di is the ion inertial length) and dt ~ omega_ci^{-1}/4 (where omega_ci is the ion cyclotron frequency).
  2. The canonical Poisson bracket infrastructure has been extended so that the Hamiltonian is constructed using the form H = 1/2 g^ij w_i w_j, where g^ij are the contravariant components of the metric tensor (see the second sketch after this list). This formalism allows simulations of motion on the surface of a sphere, and the vlasov_lte_proj_on_basis routine has been extended to handle this functionality (including the necessary metric components in the exponential). The vlasov_lte_moments routine has also been extended to obtain the correct pressure given the input pressure tensor.
  3. The same first-order operator split that is utilized for the implicit fluid-EM coupling can also be utilized for an implicit BGK collision operator (see the third sketch after this list). Simulations that take time steps 1,000-10,000 times larger than the collisional time have been performed.
  4. The PKPM fluid system now utilizes the wv_eqn ten_moment object to perform the Roe solve at the interface (at quadrature points in higher dimensions). This switch has significantly reduced oscillations in a number of production problems, and simulations which were previously unstable, such as a PKPM firehose instability simulation, now run stably.
  5. The characteristic limiter utilized by dg_euler can now also be applied to dg_maxwell and dg_euler_pkpm (the latter utilizes the wv_eqn ten_moment object to obtain the characteristics being limited).
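
To make the semi-implicit source solve in item 1 concrete, here is a minimal, self-contained scalar analogue (the names below are hypothetical, not the gkylzero implementation): the time-centered update of the cold-plasma current/field coupling reduces to a small linear solve, just as the full fluid-EM coupling reduces to a per-coefficient matrix inverted with the batched linear solvers.

```c
#include <stdio.h>

/* Hypothetical stand-alone sketch of a time-centered implicit update for the
 * cold-plasma coupling dJ/dt = eps0*wp^2*E, dE/dt = -J/eps0. The 2x2
 * time-centered system is solved in closed form here; in the app the
 * analogous per-coefficient matrices are inverted with the batched solvers. */
static void implicit_em_coupling(double dt, double wp, double eps0,
  double *E, double *J)
{
  double a = 0.5*dt*eps0*wp*wp; /* weight of E in the J update */
  double b = 0.5*dt/eps0;       /* weight of J in the E update */
  double E0 = *E, J0 = *J;
  /* J1 = J0 + a*(E0 + E1), E1 = E0 - b*(J0 + J1): solve for E1, then J1 */
  double E1 = ((1.0 - a*b)*E0 - 2.0*b*J0)/(1.0 + a*b);
  double J1 = J0 + a*(E0 + E1);
  *E = E1; *J = J1;
}

int main(void)
{
  double E = 1.0, J = 0.0, wp = 1.0e3, eps0 = 8.854e-12;
  for (int n=0; n<10; ++n)
    implicit_em_coupling(1.0, wp, eps0, &E, &J); /* dt >> 1/wp, stays bounded */
  printf("E = %g, J = %g\n", E, J);
  return 0;
}
```

This time-centered step conserves the quadratic energy eps0*E^2/2 + J^2/(2*eps0*wp^2) of the 2x2 system exactly, which is consistent with the relaxed time step restriction reported above.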
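For item 2, a sketch of what evaluating H = 1/2 g^ij w_i w_j looks like on the surface of a sphere (illustrative only; the function name and signature are hypothetical, and the same inverse-metric components enter the exponential in vlasov_lte_proj_on_basis as described above):

```c
#include <math.h>

/* Hypothetical sketch (not gkylzero API): evaluate the canonical Hamiltonian
 * H = 1/2 g^{ij} w_i w_j on a sphere of radius R, using coordinates
 * (theta, phi) with inverse metric
 *   g^{theta theta} = 1/R^2,  g^{phi phi} = 1/(R^2 sin^2(theta)). */
double hamiltonian_sphere(double R, double theta, double w_theta, double w_phi)
{
  double gtt = 1.0/(R*R);
  double gpp = 1.0/(R*R*sin(theta)*sin(theta));
  return 0.5*(gtt*w_theta*w_theta + gpp*w_phi*w_phi);
}
```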
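For item 3, the backward-Euler BGK update with a Maxwellian held fixed over the step is a pointwise (diagonal) implicit solve, which is why steps far exceeding the collision time remain stable. A scalar sketch (hypothetical helper, not the gkylzero kernel):

```c
/* Backward-Euler BGK relaxation toward a fixed Maxwellian f_M:
 *   (f_new - f_old)/dt = -nu*(f_new - f_M).
 * Solving for f_new gives an unconditionally stable pointwise update;
 * as dt*nu grows, f_new simply relaxes toward f_M. */
static inline double implicit_bgk_update(double f_old, double f_M,
  double nu, double dt)
{
  return (f_old + dt*nu*f_M)/(1.0 + dt*nu);
}
```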

To support these new features, the following refactorings have been performed:

  1. Both the Vlasov and PKPM apps now set a function pointer to an update_func which takes the input app and dt and returns a status object indicating whether the time step was successful. This function pointer assignment allows us to switch between a pure explicit SSP RK3 (for explicit collisions or fluid sources, if we know we resolve those scales and desire a higher-order time integrator) and the first-order operator split that treats the fluid-EM coupling or BGK collisions implicitly (see the sketch after this list).
  2. wv_maxwell has been extended to take an input use_gpu flag in case we want to limit Maxwell's equations on device (note that wv_ten_moment has not yet been extended this way).
  3. Since the implicit fluid-EM coupling includes the contributions from external applied accelerations, fields, and currents, these components of the app have been refactored so that they are always allocated (the implicit solve then just adds zeros when constructing the matrix, in exact analogy with how moment_em_coupling handled these terms). I also standardized the names (accel -> app_accel, to emphasize it is an applied acceleration) and eliminated some helper functions which are only necessary if you use the gkyl_array_accumulate methods (instead of gkyl_array_accumulate_range).
  4. I apply the limiters after the forward Euler call but before the RK combine steps, so that the limiter is a genuinely post-hoc characteristic limiter (diffusing the slopes after a time step if the slopes are too steep). This order of operations also avoids applying the limiters after the implicit fluid-EM coupling, which I believe would be undesirable because the implicit solve is constructed to conserve E^2 + rhou^2 (and, being a linear solve, it should not generate spuriously large slopes).
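
As a rough illustration of the update_func pattern in item 1, here is a minimal sketch; the struct and function names are placeholders, not the actual gkylzero types.

```c
#include <stdbool.h>

/* Placeholder types (not the gkylzero structs) illustrating the pattern. */
struct sim_status { bool success; double dt_actual, dt_suggested; };
struct sim_app;

/* an update_func takes the app and requested dt and returns the step status */
typedef struct sim_status (*update_func_t)(struct sim_app *app, double dt);

struct sim_app {
  update_func_t update_func; /* chosen once, when the app is constructed */
  /* ... species, field, solvers, diagnostics ... */
};

/* candidate steppers; bodies omitted in this sketch */
struct sim_status update_ssp_rk3(struct sim_app *app, double dt);
struct sim_status update_op_split_implicit(struct sim_app *app, double dt);

static struct sim_status take_step(struct sim_app *app, double dt)
{
  /* the outer time loop never needs to know which scheme was selected */
  return app->update_func(app, dt);
}
```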

As part of this design exploration a number of approaches were tried; it is worth noting which worked better or worse:

  1. The flux Jacobian for the ten moment system is most easily constructed in primitive variables (https://gkyl.readthedocs.io/en/latest/dev/tenmom-eigensystem.html). As such, the very first thing done in the Roe solve is to compute the primitive moments of the ten moment system [rho, ux, uy, uz, Pxx, Pxy, Pxz, Pyy, Pyz, Pzz]. I found that it worked better to use the genuine primitive moments at the quadrature points for this evaluation (by better I mean that dispersive errors were further minimized and there were fewer oscillations in the resulting solution). So instead of evaluating rhou or Pxx + rhou^2 at the quadrature point, I do:

```c
q_lr[0] = ser_2x_p1_surfx1_eval_quad_node_0_r(rho_l);
q_lr[1] = q_lr[0]*(0.7071067811865475*ux_surf_lr[0] - 0.7071067811865475*ux_surf_lr[1]);
q_lr[2] = q_lr[0]*(0.7071067811865475*uy_surf_lr[0] - 0.7071067811865475*uy_surf_lr[1]);
q_lr[3] = q_lr[0]*(0.7071067811865475*uz_surf_lr[0] - 0.7071067811865475*uz_surf_lr[1]);
q_lr[4] = ser_2x_p1_surfx1_eval_quad_node_0_r(Pxx_l) + q_lr[1]*q_lr[1]/q_lr[0];
q_lr[5] = ser_2x_p1_surfx1_eval_quad_node_0_r(Pxy_l) + q_lr[1]*q_lr[2]/q_lr[0];
q_lr[6] = ser_2x_p1_surfx1_eval_quad_node_0_r(Pxz_l) + q_lr[1]*q_lr[3]/q_lr[0];
q_lr[7] = ser_2x_p1_surfx1_eval_quad_node_0_r(Pyy_l) + q_lr[2]*q_lr[2]/q_lr[0];
q_lr[8] = ser_2x_p1_surfx1_eval_quad_node_0_r(Pyz_l) + q_lr[2]*q_lr[3]/q_lr[0];
q_lr[9] = ser_2x_p1_surfx1_eval_quad_node_0_r(Pzz_l) + q_lr[3]*q_lr[3]/q_lr[0];
```

so that when the ten moment primitive variable solve is called and divides by rho or subtracts the Reynolds stress, what is returned to me is exactly the evaluation of the primitive variables at that quadrature point.

  2. Initially, I attempted a 2nd order Strang split exp(dt/2 L_I) exp(dt L_H) exp(dt/2 L_I), where L_I is the implicit update and L_H is the hyperbolic update. However, this construction proved troublesome for two reasons. First, our SSP RK3 method attempts to take the largest possible time step, computing the CFL from the characteristics before the forward Euler step is complete. As such, when dt changes dynamically (especially when it is increasing), the RK3 step would have to be completely redone to avoid the semi-implicit method taking a smaller time step and the two operators becoming desynchronized in time. Further, even when I ran with a fixed time step (such as a time step set by the speed of light), I was only getting first-order convergence in quantities such as total energy conservation. I judged this to be something subtle in how a multi-step method interacts with an operator split (compared to our multi-fluid one-step method) and decided that a first-order operator split, with the implicit step taking a time step of size dt_actual (after the SSP RK3 method), is preferable to attempting to rewrite things (see the sketch below). If one needs higher accuracy in the sources, they can do highly resolved calculations with use_explicit_sources = true.
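
A minimal sketch of the resulting first-order split update (illustrative names; the real update_func in the app differs in detail):

```c
#include <stdbool.h>

/* Placeholder types; the point is only the ordering of the two operators. */
struct sim_status { bool success; double dt_actual, dt_suggested; };
struct sim_app;

struct sim_status explicit_ssp_rk3(struct sim_app *app, double dt);  /* L_H */
void implicit_source_update(struct sim_app *app, double dt);         /* L_I */

static struct sim_status update_op_split(struct sim_app *app, double dt)
{
  /* explicit hyperbolic update; may reduce the step to dt_actual */
  struct sim_status st = explicit_ssp_rk3(app, dt);
  if (st.success)
    /* first-order split: the implicit step uses exactly the dt the RK3 took,
       so the two operators never desynchronize in time */
    implicit_source_update(app, st.dt_actual);
  return st;
}
```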
JonathanGorard commented 5 months ago

Generally I'm very happy with these modifications. My principal concern regards the changes to the handling of sources, and how these changes interact with (or, at the very least, inform) our plans for sources in moments and other places, especially in light of plans for the multi-block infrastructure. I very much like the use of an update_func function pointer as part of the input app for selecting which source solver to use, and I believe that, unless we have a good reason not to, something very much like this is how we should handle source solver selection across the rest of the code.

But it seems to me, unless I'm missing something, that our choice of implicit, semi-implicit, or fully explicit (SSP RK3 or similar) source solver should essentially be independent of our choice of splitting (1st order, 2nd order Strang, etc.). It's arguable that we should really have two function pointers (e.g. update_func, which contains a pointer to the basic integrator(s), and split_func, which assembles those integrators together in some arbitrary way on each time step); a rough sketch of this idea follows below. Looking ahead to the multi-block infrastructure, we probably want individual child blocks to have pointers to individual integrators (including the hyperbolic time-stepper, etc.). The parent block can then decide, based on a function pointer of its own, how to assemble these into a single forward time step across the entire domain, so as to facilitate situations like "each child block takes a hyperbolic time step separately, then apply a forward-Euler step across the entire block domain", which would not be possible within the current design.
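
A rough, purely illustrative sketch of the two-pointer idea (none of these names exist in the code; this is only to pin down the proposal):

```c
#include <stdbool.h>

/* Illustrative placeholder types for the proposed design. */
struct sim_status { bool success; double dt_actual; };
struct sim_app;

/* a basic integrator: one operator advanced over dt */
typedef struct sim_status (*update_func_t)(struct sim_app *app, double dt);

/* a splitting: decides how the basic integrators are composed each step */
typedef struct sim_status (*split_func_t)(struct sim_app *app,
  update_func_t hyperbolic, update_func_t sources, double dt);

/* example splitting: 2nd order Strang, exp(dt/2 L_I) exp(dt L_H) exp(dt/2 L_I) */
static struct sim_status strang_split(struct sim_app *app,
  update_func_t hyperbolic, update_func_t sources, double dt)
{
  struct sim_status s1 = sources(app, 0.5*dt);
  struct sim_status s2 = hyperbolic(app, dt);
  struct sim_status s3 = sources(app, 0.5*dt);
  return (struct sim_status) {
    .success = s1.success && s2.success && s3.success,
    .dt_actual = s2.dt_actual,
  };
}
```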

I don't think that any immediate modification to the code in the dg_10m branch is required, but this new (provisional) design should probably be borne in mind when making any further changes to, or developing any further infrastructure upon, the current source solver(s).

ammarhakim commented 5 months ago

Comments about Strang splitting:

One confusion we are creating is that wave-prop in general does not work on GPUs. Perhaps having only specific equations work on GPUs is fine, though, as long as all functions on an eqn that takes use_gpu really run on the GPU, and not only one of them.

Going forward it is best to do the DR first, especially for such changes.

@JunoRavin please submit the PR for the branch.

JunoRavin commented 4 months ago

Design review merged with https://github.com/ammarhakim/gkylzero/pull/429