another idea to improve adjoint run speed is to merge the GPU kernels in the compute_kernels routine

From Etienne @EtienneBachmann :

Another idea to improve adjoint run speed is to merge the GPU kernels in the compute_kernels routine, where the rho kernels and the other kernels are currently computed separately. This should not affect the readability of the code. It would also be wise to introduce a flag, COMPUTE_RHO_KERNELS, to skip their calculation when they are not needed. The associated computational cost is significant in the acoustic case (here, the outer core) because of the calls to the compute_gradient routines. To give an idea, I obtain a 25% speedup on a purely acoustic adjoint simulation just by commenting out the calculation of the acoustic rho kernel. I'm not a specialist in large cluster runs, but I suspect that even with a perfectly balanced mesh, calibrated to run with more acoustic elements than elastic ones because of the compute forces routine, the acoustic rho kernel calculation becomes an important bottleneck during the kernel computation and slows down the whole simulation.
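To make the proposal concrete, here is a minimal CPU sketch in C of what a merged, flag-guarded kernel accumulation could look like. It is not the SPECFEM3D implementation: the function names, the argument list, the 1-D finite-difference gradient, and the simplified kernel formulas are all assumptions made for illustration. Only the flag name COMPUTE_RHO_KERNELS (here the `compute_rho_kernels` argument) comes from the suggestion above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the expensive gradient evaluation:
   a 1-D centered difference, one-sided at the two end points. */
static double gradient_1d(const double *f, size_t i, size_t n, double h)
{
    if (n < 2) return 0.0;
    if (i == 0) return (f[1] - f[0]) / h;
    if (i == n - 1) return (f[n - 1] - f[n - 2]) / h;
    return (f[i + 1] - f[i - 1]) / (2.0 * h);
}

/* One fused pass over all points instead of two separate passes.
   The kappa kernel is always accumulated; the rho kernel, which needs
   the gradients, is only accumulated when compute_rho_kernels is true.
   The formulas are simplified placeholders, not the SPECFEM3D ones. */
void accumulate_acoustic_kernels(size_t n, double h, double deltat,
                                 const double *rho, const double *kappa,
                                 const double *fwd, const double *adj,
                                 const double *fwd_dd, const double *adj_dd,
                                 double *rho_kl, double *kappa_kl,
                                 bool compute_rho_kernels)
{
    for (size_t i = 0; i < n; i++) {
        /* Cheap part: pointwise product of the second time derivatives. */
        kappa_kl[i] += deltat / kappa[i] * fwd_dd[i] * adj_dd[i];

        /* Expensive part: skipped entirely when the flag is off. */
        if (compute_rho_kernels) {
            double grad_fwd = gradient_1d(fwd, i, n, h);
            double grad_adj = gradient_1d(adj, i, n, h);
            rho_kl[i] += deltat * rho[i] * grad_fwd * grad_adj;
        }
    }
}
```

In a GPU version the loop body would presumably become the body of a single kernel, with the flag passed as a kernel argument or fixed at compile time, so that the gradient work is never launched when the rho kernels are not requested.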

Best regards,

Etienne

SPECFEM / specfem3d

another idea to improve adjoint run speed is to merge the GPU kernels in the compute_kernels routine #1146