Closed. komatits closed this issue 7 years ago.
Thanks Dimitri!
I am looking forward to Etienne's implementation.
Best regards, Ebru
On 04/04/2017 2:12 PM, Dimitri Komatitsch wrote (subject: [geodynamics/specfem3d] in the adjoint sources arrays, split the time dependent source function from the fixed Lagrange interpolation arrays, #1008):
Done by Etienne @EtienneBachmann in the 2D code; he will do it in 3D and in 3D_GLOBE as well when he arrives in Princeton next month. Very useful: this should solve most of the load imbalance we detected for Ebru's runs. Etienne and I analyzed that in detail today. Changing that will also allow us to get rid of the non-blocking I/O routines that read such big arrays (in file ./specfem3D/file_io_threads.c).
cc'ing @jeroentromp @vmont @danielpeter @ebrubozdag @schirwon
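For readers following along, here is a minimal sketch of the kind of split being proposed: keep the time-dependent adjoint traces and the fixed Lagrange interpolation weights in separate, much smaller arrays, and combine them on the fly when the sources are added. All names, shapes and the single-precision reals below are illustrative placeholders, not the actual SPECFEM3D variables.

```fortran
! Sketch only: illustrative names and shapes, not the actual SPECFEM3D arrays.
! Instead of storing a pre-interpolated adjoint source for every GLL point of
! the receiver element, i.e. something like
!   adj_sourcearrays(nadj_rec_local, NSTEP, NDIM, NGLLX, NGLLY, NGLLZ),
! keep only the traces plus the fixed Lagrange weights and combine them here.
subroutine add_adjoint_sources_split(it, nadj_rec_local, NSTEP, NDIM, &
                                     NGLLX, NGLLY, NGLLZ, nglob, nspec, &
                                     ibool, ispec_selected, &
                                     source_adjoint, hxir, hetar, hgammar, accel)
  implicit none
  integer, intent(in) :: it, nadj_rec_local, NSTEP, NDIM
  integer, intent(in) :: NGLLX, NGLLY, NGLLZ, nglob, nspec
  integer, intent(in) :: ibool(NGLLX,NGLLY,NGLLZ,nspec)
  integer, intent(in) :: ispec_selected(nadj_rec_local)
  ! time-dependent part: one trace per local adjoint receiver
  real, intent(in) :: source_adjoint(NDIM,nadj_rec_local,NSTEP)
  ! fixed Lagrange interpolation weights, computed once when locating receivers
  real, intent(in) :: hxir(NGLLX,nadj_rec_local), hetar(NGLLY,nadj_rec_local), &
                      hgammar(NGLLZ,nadj_rec_local)
  real, intent(inout) :: accel(NDIM,nglob)
  integer :: irec_local, ispec, i, j, k, iglob
  real :: hlagrange

  do irec_local = 1, nadj_rec_local
    ispec = ispec_selected(irec_local)   ! element containing this adjoint receiver
    do k = 1, NGLLZ
      do j = 1, NGLLY
        do i = 1, NGLLX
          iglob = ibool(i,j,k,ispec)
          hlagrange = hxir(i,irec_local) * hetar(j,irec_local) * hgammar(k,irec_local)
          accel(:,iglob) = accel(:,iglob) + source_adjoint(:,irec_local,it) * hlagrange
        enddo
      enddo
    enddo
  enddo
end subroutine add_adjoint_sources_split
```

The point of the split is that only the trace array changes with time, so it is the only piece that needs to be read from disk or copied to the GPU; the interpolation weights stay fixed for the whole run.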
Hi all, I just started to work on the problem. I looked at the 3D code, and regarding source, receiver, and seismogram management I think it is the 2D code that should be mimicked. So aside from splitting the arrays, we should also:
- Use Nsource_loc and Nrec_loc rather than the global Nsources and Nrec (see the sketch below).
- Simplify "locate_source" and "locate_receivers" (each about 300 lines in 2D versus about 1000 lines in 3D), removing some costly MPI communications.
- In locate_receivers, the receiver location estimate depends on the kind of seismogram output one wants (and it works badly for SU output, see issues #781 https://github.com/geodynamics/specfem3d/issues/781 and #672 https://github.com/geodynamics/specfem3d/issues/672); this does not make sense.
- Refactor the way adjoint sources are read (remove the duplicated code in compute_addsources and use a common routine, like in 2D). This must be refactored for the GPU version too (removing the compute_addsources_GPU routines).
For seismograms:
- In write_seismograms, there are two routines to write the seismograms depending on the simulation type; this does not make sense.
- This part of the code is difficult to read for each combination of flags and should be rewritten based on the 2D version.
- With GPUs, the seismograms are still computed on the CPU, with costly field transfers at each time step; the code should be fixed following the 2D approach.
- The enhancement of disk I/O accesses should be done, like in 2D.
All these improvements will take some time, but I'll do my best to commit them quickly in the 3D Cartesian code.
Etienne
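Regarding the first bullet above, here is a minimal sketch (with made-up names) of what the "local" bookkeeping looks like: each MPI slice builds a compact list of the receivers it actually owns, so subsequent loops and array allocations are sized with nrec_local instead of the global nrec.

```fortran
! Sketch only (hypothetical names): build the per-slice receiver list once,
! so that the time loop never touches receivers owned by other MPI slices.
subroutine build_local_receiver_list(myrank, nrec, islice_selected_rec, &
                                     nrec_local, irec_global_of_local)
  implicit none
  integer, intent(in)  :: myrank, nrec
  integer, intent(in)  :: islice_selected_rec(nrec)   ! owning MPI slice of each receiver
  integer, intent(out) :: nrec_local
  integer, intent(out) :: irec_global_of_local(nrec)  ! local index -> global receiver index
  integer :: irec

  nrec_local = 0
  do irec = 1, nrec
    if (islice_selected_rec(irec) == myrank) then
      nrec_local = nrec_local + 1
      irec_global_of_local(nrec_local) = irec
    endif
  enddo
end subroutine build_local_receiver_list
```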
Hi Etienne,
It would be really great to do that. Very useful.
Best regards, Dimitri.
And I also found another bug: currently, the SU output only contains displacement (and only if SAVE_SEISMOGRAMS_DISPLACEMENT is true), regardless of the SAVE_SEISMOGRAMS_DISPLACEMENT, SAVE_SEISMOGRAMS_VELOCITY, SAVE_SEISMOGRAMS_ACCELERATION and SAVE_SEISMOGRAMS_PRESSURE flags. We may also want to give the files more explicit names: currently it is proc_number_dx_SU_file; we should have proc_number_dx_SU_file, proc_number_vx_SU_file, proc_number_ax_SU_file and proc_number_px_SU_file, matching those flags respectively.
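A sketch of the kind of flag-to-filename mapping suggested here; the routine, the shortened flag names and the print statement are placeholders for whatever the real SU writer in write_seismograms ends up doing.

```fortran
! Sketch only (hypothetical helper): one SU file per enabled seismogram type,
! named proc<rank>_dx/_vx/_ax/_px_SU_file as proposed above.
subroutine su_output_names(myrank, save_disp, save_veloc, save_accel, save_press)
  implicit none
  integer, intent(in) :: myrank
  logical, intent(in) :: save_disp, save_veloc, save_accel, save_press
  character(len=12), parameter :: suffix(4) = &
    (/ '_dx_SU_file ', '_vx_SU_file ', '_ax_SU_file ', '_px_SU_file ' /)
  logical :: enabled(4)
  character(len=64) :: filename
  integer :: itype

  enabled = (/ save_disp, save_veloc, save_accel, save_press /)
  do itype = 1, 4
    if (enabled(itype)) then
      write(filename,'(a,i6.6,a)') 'proc', myrank, trim(suffix(itype))
      print *, 'SU output file: ', trim(filename)   ! here one would call the actual SU writer
    endif
  enddo
end subroutine su_output_names
```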
Thanks! Yes, very good idea.
Dimitri.
Perfect, I'll work on it. Regarding splitting the arrays, the job is done for acoustics; I am working on the elastic case now. Results look encouraging, with a run 25% faster in a small single-MPI-slice case (acoustic, that is). Note that my new implementation will not decrease disk I/O that much (we still need to read all the data). But aside from disk I/O, proceeding this way also avoids large CPU <==> GPU transfers at each time step, which are also quite costly, and even more so when it comes to sets of receivers unequally distributed among the MPI slices. Much less data has to be transferred (NGLL3 times less). This may be the actual bottleneck that @ebrubozdag is experiencing.
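To put a rough number on the "NGLL3 times less" claim, here is a small self-contained back-of-the-envelope program; the 700-receiver count is just an arbitrary example, and the usual NGLLX = NGLLY = NGLLZ = 5 is assumed.

```fortran
! Sketch only: per-time-step host<->device volume for the adjoint sources,
! before and after the split, counting values rather than bytes.
program adjoint_transfer_size
  implicit none
  integer, parameter :: NGLLX = 5, NGLLY = 5, NGLLZ = 5, NDIM = 3
  integer, parameter :: NGLL3 = NGLLX * NGLLY * NGLLZ   ! = 125
  integer, parameter :: nadj_rec_local = 700            ! arbitrary example
  integer :: old_vals, new_vals

  old_vals = nadj_rec_local * NDIM * NGLL3   ! pre-interpolated source arrays
  new_vals = nadj_rec_local * NDIM           ! time-dependent traces only
  print *, 'values per time step, before the split:', old_vals
  print *, 'values per time step, after the split :', new_vals
  print *, 'reduction factor = NGLL3 =', NGLL3
end program adjoint_transfer_size
```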
Great, thanks! The improvements will be useful.
Regarding Ebru's runs, there is also the imbalance of the receiver distribution between MPI slices; see the attached histogram I created using her input files (basically dense arrays in some parts of the Earth, almost nothing in the oceans, while SPECFEM3D_GLOBE decomposes the surface of the cubed sphere evenly...). She is probably seeing a combination of both slowdowns. The second one is not easy to fix (Malte was working on it).
Best regards, Dimitri.
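A minimal sketch, with assumed names, of how that imbalance can be measured: gather each slice's local receiver count on rank 0 and look at the spread (the histogram itself can then be built from nrec_all).

```fortran
! Sketch only (hypothetical helper): collect the number of local receivers of
! every MPI slice on rank 0 to quantify the imbalance described above.
subroutine receiver_count_per_slice(nrec_local)
  use mpi
  implicit none
  integer, intent(in) :: nrec_local
  integer :: myrank, nproc, ier
  integer, allocatable :: nrec_all(:)

  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ier)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ier)
  allocate(nrec_all(nproc))
  call MPI_Gather(nrec_local, 1, MPI_INTEGER, nrec_all, 1, MPI_INTEGER, &
                  0, MPI_COMM_WORLD, ier)
  if (myrank == 0) then
    print *, 'receivers per slice: min =', minval(nrec_all), ' max =', maxval(nrec_all)
  endif
  deallocate(nrec_all)
end subroutine receiver_count_per_slice
```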
Hi all,
I have opened https://github.com/geodynamics/specfem3d_globe/issues/580
Cheers, Dimitri.
And another remark: for acoustic adjoint sources, I added the division of the source term by kappa. A comment in the code claimed this was done when constructing adj_sourcearrays, but it was actually not done. This may change the absolute value of the computed kernels. I guess it should not be detected by the nightly buildbot runs, which probably only compute elastic kernels.
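For concreteness, a sketch of where that division enters when the acoustic adjoint source is added to the potential; kappastore and potential_dot_dot_acoustic follow the usual SPECFEM naming, but the routine itself is illustrative, not the committed fix.

```fortran
! Sketch only (illustrative, not the committed fix): adding one acoustic adjoint
! source with the division by kappa mentioned above.
subroutine add_acoustic_adjoint_source(it, irec_local, ispec, nadj_rec_local, NSTEP, &
                                       NGLLX, NGLLY, NGLLZ, nspec, nglob, &
                                       ibool, kappastore, hxir, hetar, hgammar, &
                                       source_adjoint, potential_dot_dot_acoustic)
  implicit none
  integer, intent(in) :: it, irec_local, ispec, nadj_rec_local, NSTEP
  integer, intent(in) :: NGLLX, NGLLY, NGLLZ, nspec, nglob
  integer, intent(in) :: ibool(NGLLX,NGLLY,NGLLZ,nspec)
  real, intent(in) :: kappastore(NGLLX,NGLLY,NGLLZ,nspec)   ! bulk modulus per GLL point
  real, intent(in) :: hxir(NGLLX,nadj_rec_local), hetar(NGLLY,nadj_rec_local), &
                      hgammar(NGLLZ,nadj_rec_local)
  real, intent(in) :: source_adjoint(nadj_rec_local,NSTEP)  ! pressure-type adjoint trace
  real, intent(inout) :: potential_dot_dot_acoustic(nglob)
  integer :: i, j, k, iglob
  real :: hlagrange

  do k = 1, NGLLZ
    do j = 1, NGLLY
      do i = 1, NGLLX
        iglob = ibool(i,j,k,ispec)
        hlagrange = hxir(i,irec_local) * hetar(j,irec_local) * hgammar(k,irec_local)
        ! the division by kappa is applied here, instead of only being claimed in a comment
        potential_dot_dot_acoustic(iglob) = potential_dot_dot_acoustic(iglob) + &
            source_adjoint(irec_local,it) * hlagrange / kappastore(i,j,k,ispec)
      enddo
    enddo
  enddo
end subroutine add_acoustic_adjoint_source
```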
Hi all,
There are still some bugs in the GPU version that I committed; I am working on it.
With the new commit, I completed all the goals I set. The speed-up of the simulation is significant: for 700 receivers and 1500 time steps in a 40 x 40 x 40 mesh, I get runs that are 80% faster in acoustics and 65% faster in elastics, for both forward and adjoint simulations.
Once this new code is tested and approved, we can close this issue.
Excellent. Thank you very much. Well done.
Dimitri.
Done by Etienne @EtienneBachmann !