SPECFEM3D_Cartesian simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra (structured & unstructured).
Hi all,
If you do that, please first talk to Vadim @vmont, because he has also worked on the GPU kernels recently and he uses them in their current format.
Thanks,
Dimitri.
From Etienne @EtienneBachmann :
Another idea to improve the speed of adjoint runs is to merge the GPU kernels in the compute_kernels routine, where the rho kernels and the other kernels are currently computed separately. It should not affect the readability of the code. It would also be wise to introduce a flag, COMPUTE_RHO_KERNELS, to skip their calculation when they are not needed. The associated computational cost is significant in the acoustic case (here the outer core) because of the calls to the compute_gradient routines. To give an idea, I obtain a 25% speedup on my purely acoustic adjoint simulation just by commenting out the calculation of the acoustic rho kernel. I am not a specialist in large cluster runs, but I suspect that even with a perfectly balanced mesh, calibrated to run with more acoustic elements than elastic ones because of the compute forces routine, the acoustic rho kernel calculation becomes a significant bottleneck during kernel computation and slows down the whole simulation.
Best regards,
Etienne
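
To make the two suggestions in Etienne's message concrete, here is a minimal CPU-side sketch in C++. It is not SPECFEM3D code. The struct names, the `update_kernels` function, the 1-D finite-difference `gradient_at` (a cheap stand-in for the compute_gradient routines) and the simplified kernel expressions are all assumptions made for illustration. It shows a single fused loop that updates both kernels, with the gradient-based rho update guarded by a flag named after the proposed COMPUTE_RHO_KERNELS.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical run-time switch, named after the proposed COMPUTE_RHO_KERNELS flag.
struct KernelOptions {
    bool compute_rho_kernels = true;
};

// Hypothetical kernel accumulators, one value per grid point.
struct AcousticKernels {
    std::vector<double> rho;    // needs gradients (expensive)
    std::vector<double> kappa;  // pointwise product only (cheap)
    explicit AcousticKernels(std::size_t n) : rho(n, 0.0), kappa(n, 0.0) {}
};

// Stand-in for a compute_gradient routine: one-sided difference on a 1-D
// grid with unit spacing (forward difference, backward at the last point).
inline double gradient_at(const std::vector<double>& f, std::size_t i) {
    const std::size_t j = (i + 1 < f.size()) ? i + 1 : i;
    const std::size_t k = (j == i && i > 0) ? i - 1 : i;
    return f[j] - f[k];
}

// One fused pass over the grid instead of two separate kernel loops.
// Assumes forward and adjoint have the same size as the accumulators.
// Returns the number of gradient evaluations, to make the saving visible.
std::size_t update_kernels(const std::vector<double>& forward,
                           const std::vector<double>& adjoint,
                           double dt,
                           const KernelOptions& opt,
                           AcousticKernels& k) {
    std::size_t gradient_calls = 0;
    for (std::size_t i = 0; i < forward.size(); ++i) {
        // Simplified kappa-like update: always computed.
        k.kappa[i] += dt * forward[i] * adjoint[i];
        // Simplified rho-like update: skipped entirely when the flag is off.
        if (opt.compute_rho_kernels) {
            k.rho[i] += dt * gradient_at(forward, i) * gradient_at(adjoint, i);
            gradient_calls += 2;
        }
    }
    return gradient_calls;
}
```

With the flag off, the loop performs no gradient evaluations and leaves the rho accumulator untouched, while the kappa update is unchanged. On the GPU the same idea would presumably translate to one kernel launch with the rho branch selected at launch time, but that depends on how the existing kernels are organised.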