Generate database step taking too long to complete in specfem 4.1.1 GPU version

padesh commented 2 months ago

Hi everyone,

I recently updated SPECFEM Cartesian from 4.0.0 to 4.1.1 (GPU version). In the new version, the database generation step takes much longer than in the previous version. Most of the time is spent in this step:

` ...setting up mesh adjacency

  mesh adjacency:
  total number of elements in this slice  =       126360

  maximum number of neighbors allowed     =          300
  minimum array memory required per slice =    145.089569     (MB)

  using kd-tree search radius             =    1050.00000    
           10  % - elapsed time:   59.2969856     s
           20  % - elapsed time:   118.541382     s
           30  % - elapsed time:   177.748108     s
           40  % - elapsed time:   236.961090     s
           50  % - elapsed time:   296.169128     s
           60  % - elapsed time:   355.404327     s
           70  % - elapsed time:   414.638702     s
           80  % - elapsed time:   473.861664     s
           90  % - elapsed time:   533.240967     s
          100  % - elapsed time:   592.426575     s

  maximum search elements                                      =          328
  maximum of actual search elements (after distance criterion) =          327

  estimated maximum element size            =    300.000000    
  maximum distance between neighbor centers =    984.885803    

  maximum neighbors found per element =          124
      (maximum neighbor of neighbors) =           98
  total number of neighbors           =     14527224

  Elapsed time for detection of neighbors in seconds =    593.135193    `

This step was not present in previous versions.

The same model run on the CPU version takes about 10 s for database generation using comparable compute.

Any advice on how to speed it up?

Thanks.

danielpeter commented 1 month ago

right, this new adjacency array was added to improve the accuracy of source and receiver locations. the setup takes place in the mesher (xgenerate_databases), as you noted. since the mesh setup only needs to be run once for different source/receiver setups, this is a one-time cost.

however, it takes quite a bit of time to set up, especially if there are a lot of elements per slice, as in your case. let me see if we can further speed it up for such large mesh slices...

in the meantime, you can try running the simulation with a higher NPROC to cut down the number of elements per slice and improve the parallelization of this mesh adjacency setup.
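
As a rough illustration of why a higher NPROC helps, the log above can be turned into a per-element cost. This is only a back-of-the-envelope sketch in Python, not a SPECFEM tool: it assumes the adjacency setup scales about linearly with the number of elements per slice (the evenly spaced progress lines in the log suggest this, but it is not guaranteed) and that multiplying NPROC by some factor divides the elements per slice by the same factor. The function name and the split factors are made up for illustration.

```python
# Back-of-the-envelope estimate, not part of SPECFEM.
# The two numbers below are copied from the log in this thread;
# linear scaling with elements per slice is an assumption.

ELEMENTS_PER_SLICE = 126360      # "total number of elements in this slice"
ELAPSED_SECONDS = 593.135193     # "Elapsed time for detection of neighbors"


def estimate_adjacency_time(elements_per_slice, seconds_per_element):
    """Estimated setup time for one slice under the linear-cost assumption."""
    return elements_per_slice * seconds_per_element


# Average cost per element observed in the reported run.
seconds_per_element = ELAPSED_SECONDS / ELEMENTS_PER_SLICE

for split in (1, 2, 4, 8):
    # Hypothetical: NPROC multiplied by `split`, elements divided evenly.
    n = ELEMENTS_PER_SLICE // split
    t = estimate_adjacency_time(n, seconds_per_element)
    print(f"NPROC x{split}: ~{n} elements per slice, ~{t:.0f} s")
```

Real meshes rarely split perfectly evenly, so the actual per-slice counts and timings will differ from this estimate.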

danielpeter commented 1 month ago

this has been addressed by PR #1743 - if you can, update to the devel branch version and see if the meshing is faster now for your setup.

code-cullison commented 1 month ago

@danielpeter: I had the same problem, thanks for fixing this. Maybe I've misunderstood, but are the source and receiver locations not independent of the databases/mesh (as long as the sources and receivers are in the spatial domain of the model)? In other words, if I change my source/receiver locations, should I run xgenerate_databases again?

On a side note, I've noticed that the solver will change my receiver locations to a position outside (above) my mesh -- I think this is related to Issue #1621. For example, my mesh in Z goes from 0m down to -255m (~5m cell size). When I set my receiver burial/depth to -PI (-3.14... m), the solver changes their location to a depth/burial of 2e-5 (positive Z-coord). If I set the receiver depths to 0m or -5m (a multiple of the cell size), then the solver doesn't change the z-coord. This doesn't happen with the source (CMTSOLUTION).

padesh commented 1 month ago

@danielpeter

Yes, it is much faster now.

`  mesh adjacency:
  total number of elements in this slice  =       118584

  maximum number of neighbors allowed             =          300
  minimum array memory required per slice         =    136.160980     (MB)

  maximum number of elements per shared node      =            8
  node-to-element array memory required per slice =    235.257721     (MB)

           10  % - elapsed time:  0.596405625     s
           20  % - elapsed time:   1.14786458     s
           30  % - elapsed time:   1.70189250     s
           40  % - elapsed time:   2.25737548     s
           50  % - elapsed time:   2.82469821     s
           60  % - elapsed time:   3.39398766     s
           70  % - elapsed time:   3.96057320     s
           80  % - elapsed time:   4.51502466     s
           90  % - elapsed time:   5.06956339     s
          100  % - elapsed time:   5.54031801     s

  maximum neighbors found per element =           26
      (maximum neighbor of neighbors) =           98
  total number of neighbors           =     13616280

  Elapsed time for detection of neighbors in seconds =    6.20918417   `

Thanks for your prompt response and fix.

-Adesh
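
For anyone who wants to compare their own before/after timings, the two logs in this thread can be parsed with a few lines of Python. This is a minimal sketch: the regular expressions are written against the log format shown above (progress lines and the final "Elapsed time for detection of neighbors" line), the function name is made up, and only the last few lines of each log are reproduced here for brevity.

```python
import re

# Matches progress lines such as "  10  % - elapsed time:   59.2969856     s"
PROGRESS = re.compile(r"^\s*(\d+)\s*%\s*-\s*elapsed time:\s*([0-9.Ee+-]+)\s*s\s*$")
# Matches the summary line with the total neighbor-detection time
TOTAL = re.compile(
    r"Elapsed time for detection of neighbors in seconds\s*=\s*([0-9.Ee+-]+)"
)


def parse_adjacency_log(text):
    """Return (list of (percent, seconds), total seconds or None)."""
    progress, total = [], None
    for line in text.splitlines():
        m = PROGRESS.match(line)
        if m:
            progress.append((int(m.group(1)), float(m.group(2))))
            continue
        m = TOTAL.search(line)
        if m:
            total = float(m.group(1))
    return progress, total


# Excerpts copied from the logs posted in this thread.
old_log = """
           90  % - elapsed time:   533.240967     s
          100  % - elapsed time:   592.426575     s
  Elapsed time for detection of neighbors in seconds =    593.135193
"""
new_log = """
           90  % - elapsed time:   5.06956339     s
          100  % - elapsed time:   5.54031801     s
  Elapsed time for detection of neighbors in seconds =    6.20918417
"""

_, old_total = parse_adjacency_log(old_log)
_, new_total = parse_adjacency_log(new_log)
print(f"speedup: about {old_total / new_total:.1f}x")
```

With the numbers above, the neighbor detection is roughly 95 times faster after the fix, although the two runs used slightly different element counts per slice (126360 versus 118584).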

danielpeter commented 1 month ago

great, glad it works - thanks for the feedback :)

danielpeter commented 1 month ago

> @danielpeter: I had the same problem, thanks for fixing this. Maybe I've misunderstood, but are the source and receiver locations not independent of the databases/mesh (as long as the sources and receivers are in the spatial domain of the model)? In other words, if I change my source/receiver locations, should I run xgenerate_databases again?

sorry, i should have phrased that sentence differently. you need to run the mesher only once. there is no need to rerun the mesher when you change source/receiver positions. that's the whole point of separating mesher & solver for these SPECFEM simulations.

> On a side note, I've noticed that the solver will change my receiver locations to a position outside (above) my mesh -- I think this is related to Issue #1621. For example, my mesh in Z goes from 0m down to -255m (~5m cell size). When I set my receiver burial/depth to -PI (-3.14... m), the solver changes their location to a depth/burial of 2e-5 (positive Z-coord). If I set the receiver depths to 0m or -5m (a multiple of the cell size), then the solver doesn't change the z-coord. This doesn't happen with the source (CMTSOLUTION).

this sounds like a confusion about the input format of the receiver locations, in particular about the burial depth. details are described here in the wiki: https://github.com/SPECFEM/specfem3d/wiki/05_running_the_solver

note that burial depth is given in [m] and is a depth, i.e. it is measured as positive in the negative Z-direction (downward). therefore, if you set -PI as the depth, the solver tries to locate the receiver above your surface. since you want the receiver buried below the surface, you would use +PI as the burial depth.
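
The sign convention described above can be written out as a small conversion. This is a hedged Python sketch, not SPECFEM's actual receiver-location routine: it assumes a flat surface at z = 0 (as in the mesh described in the question) and that the depth column is read as a burial depth; the function names are hypothetical.

```python
import math


def burial_from_z(z, surface_z=0.0):
    """Convert an absolute Z coordinate (positive up) into a burial depth
    (positive down, measured from the surface)."""
    return surface_z - z


def z_from_burial(burial, surface_z=0.0):
    """Convert a burial depth (positive down) into an absolute Z coordinate."""
    return surface_z - burial


# A receiver meant to sit at z = -PI m needs a burial depth of +PI m.
print(burial_from_z(-math.pi))

# Entering -PI as the burial depth instead corresponds to a positive z,
# i.e. a point above the surface.
print(z_from_burial(-math.pi))
```

A point requested above the surface has no mesh element to sit in, which is consistent with the receiver ending up at (almost exactly) the surface in the report above.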

code-cullison commented 1 month ago

Thanks, @danielpeter. I have set 'USE_SOURCES_RECEIVERS_Z = .true.' and the mesh has a negative z-axis range (0, -255), so I thought that z1, z2 would also need to be negative.

SPECFEM / specfem3d

Generate database step taking too long to complete in specfem 4.1.1 GPU version #1742