Closed (jlulh closed this 1 week ago)
interesting - is this an issue with SPECFEM or seisflows?
maybe you could provide a small SPECFEM example setup where you see different kernel values between CPU and GPU simulations. this would help to reproduce your issue.
Hi Daniel,
I hope this message finds you well. I’m glad to hear from you and apologize for my delayed response.
I have uploaded the SPECFEM3D package I’ve been using to GitHub: https://github.com/jlulh/Specfem3d_test/. This version is based on the devel branch (a5bb135), and I made a few minor modifications to the following files: compute_arrays_source.f90, write_output_SU.f90, compute_kernels.f90, and compute_kernels_hess_el_cudakernel.cu.
Additionally, I have included an example (model0050_test) that I used for testing. The MESH, as well as the true and initial model files, were all generated using xmeshfem3D. I tested the kernel for a single shot, which is located in the model0050_test/scratch/solver/000000/ folder. To reproduce, set GPU_MODE to true or false in model0050_test/scratch/solver/000000/DATA/Par_file, and then run the simulation with the command mpirun -np 1 ./bin/xspecfem3D. You will notice that the *_kernel.bin output files in OUTPUT_FILES/DATABASES_MPI differ between the two runs.
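For anyone reproducing this, a minimal sketch for quantifying the difference between the CPU and GPU kernel files might look like the following. It assumes each `*_kernel.bin` file is a Fortran unformatted sequential file with 4-byte record markers holding one single-precision array; both are assumptions about the build and not confirmed in this thread. The helper names and the file paths in the comments are hypothetical.

```python
import numpy as np


def read_fortran_record(path, dtype=np.float32):
    """Read the first record of a Fortran unformatted sequential file.

    Assumption: 4-byte integer record markers surround the data, and the
    values are single precision. Adjust dtype for a double-precision build.
    """
    with open(path, "rb") as f:
        nbytes = int(np.fromfile(f, dtype=np.int32, count=1)[0])
        count = nbytes // np.dtype(dtype).itemsize
        data = np.fromfile(f, dtype=dtype, count=count)
        trailer = int(np.fromfile(f, dtype=np.int32, count=1)[0])
    if trailer != nbytes:
        raise ValueError(f"record markers do not match in {path}")
    return data


def relative_l2_difference(reference, other):
    """Return ||reference - other|| / ||reference|| (absolute norm if reference is zero)."""
    reference = np.asarray(reference, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    diff = np.linalg.norm(reference - other)
    ref_norm = np.linalg.norm(reference)
    return diff / ref_norm if ref_norm > 0.0 else diff


# Hypothetical usage: copy the kernel files of each run to separate folders first.
# cpu = read_fortran_record("cpu_run/proc000000_rho_kernel.bin")
# gpu = read_fortran_record("gpu_run/proc000000_rho_kernel.bin")
# print(relative_l2_difference(cpu, gpu))
```

A relative difference near single-precision round-off would suggest the two runs agree; a value of order one would point to a real discrepancy like the one reported here.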
Please let me know if you have any questions or need further clarification.
Best regards
thanks for pointing out this inconsistency! there were indeed some differences between the CPU and GPU versions in how the sources were applied in your coupled-domain setup. PR #1759 should address and fix these.
I noted that you modified the SU adjoint source reading. in the PR, I incorporated a similar fix to be able to run the kernels with only the elastic adjoint source files (0_dx_SU.adj, ..) for this coupled acoustic/elastic domain setup.
also, you seem to have modified the Hessian kernel in file compute_kernels.f90. note that the current SPECFEM3D version implements an approximate source-receiver Hessian kernel (multiplying accel() * b_accel()), as compared to your source-source Hessian modification (b_accel() * b_accel()). you would have to re-do that modification when pulling and trying out the new devel version (and the same with your SU header modification).
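To make the distinction between the two Hessian approximations concrete, here is a rough NumPy sketch of the two time accumulations. It is an illustration only, not the SPECFEM3D implementation: it reads `b_accel` as the source-side (reconstructed forward) acceleration and `accel` as the receiver-side (adjoint) acceleration, as the naming in the comment above suggests, and it ignores any sign or scaling conventions the code may apply. The function names and the array layout `(nt, 3, npoints)` are assumptions.

```python
import numpy as np


def hessian_source_receiver(accel, b_accel, deltat):
    """Approximate source-receiver Hessian: time sum of accel . b_accel.

    accel, b_accel: arrays of shape (nt, 3, npoints), i.e. time steps,
    vector components, grid points (an assumed layout for this sketch).
    """
    return deltat * np.einsum("tcp,tcp->p", accel, b_accel)


def hessian_source_source(b_accel, deltat):
    """Source-source variant: time sum of b_accel . b_accel."""
    return deltat * np.einsum("tcp,tcp->p", b_accel, b_accel)
```

The source-source variant depends only on the source-side field and is non-negative at every point, whereas the source-receiver product can change sign.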
Thank you for the fix! I have tested the updated version, and the issue is resolved. I appreciate your help and efforts.
Description
I built the same model and ran the inversion once on CPU and once on GPU, and the computed kernels are very different. I tested the latest SPECFEM3D version 4.1.1 as well as the older versions 4.1.0 and 4.0.0, and all of them show the problem.
Affected SPECFEM3D version
4.1.1 (a5bb135), 4.1.0 (89d1601) and 4.0.0 (c97d521)
Your software and hardware environment
Ubuntu 22.04.4 LTS; gcc version 11.4.0; MPICH version 4.0; CPU: AMD EPYC 9684X; GPU: RTX 4090
Reproduction steps
Screenshots
No response
Logs
No response
OS
No response