Alpine-DAV / ascent

A flyweight in situ visualization and analysis runtime for multi-physics HPC simulations
https://alpine-dav.github.io/ascent/

Slower than expected slice extracts #1283

Closed · mlohry closed this issue 2 months ago

mlohry commented 2 months ago

I'm running a single slice of an unstructured mesh with 60M tetrahedra, extracted to Blueprint HDF5:

-
  action: "add_pipelines"
  pipelines:
    pl1:
      f1:
        type: "slice"
        params:
          point:
            x: 0.0
            y: 285.0
            z: 0.0
          normal:
            x: 0.0
            y: 1.0
            z: 0.0
-
  action: "add_extracts"
  extracts:
    e2:
      type: "relay"
      pipeline: "pl1"
      params:
        path: "./pipeline_pl1_hdf5"
        protocol: "blueprint/mesh/hdf5"

Running on 28 CPU cores, 1 MPI rank per core, with the serial CPU backend, the slice takes 26 seconds. The timings option shows most of the runtime in the extract stage:

0 source 0.000009
0 verify 0.002874
0 strip_garbage_ascent_ghosts 0.487456
0 default_queries_endpoint 0.000007
0 default_filters_endpoint 0.000003
0 pl1_f1_slice 2.929403
0 pl1 0.000004
0 e2 22.757614
0 [total] 26.17797
  1. Is this an expected runtime, or has something gone wrong? By comparison, slicing the same mesh in ParaView on a laptop takes <1 second.
  2. Each iteration takes the same amount of time, even though none of the structure of the Conduit data is changing. Is there any caching happening?

A full-mesh extract with no pipeline takes 8.6 seconds, also quite a bit slower than expected.
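For reference, the no-pipeline version is just a relay extract of the published mesh, roughly like this (the extract name and output path here are placeholders, not the exact actions file):

-
  action: "add_extracts"
  extracts:
    e1:
      type: "relay"
      params:
        path: "./mesh_hdf5"
        protocol: "blueprint/mesh/hdf5"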

If I run a render of the mesh instead of a Blueprint extract, it takes 28.7 seconds, with most of the time in "pl1_strip_real_ghosts_ascent_ghosts":

0 create_scene_s1 0.000006
0 source 0.000006
0 verify 0.002795
0 strip_garbage_ascent_ghosts 0.475188
0 default_queries_endpoint 0.000007
0 default_filters_endpoint 0.000004
0 pl1_f1_slice 2.928064
0 pl1 0.000004
0 pl1_strip_real_ghosts_ascent_ghosts 22.597618
0 pl1_plot_source 0.000004
0 s1_p1 0.002033
0 add_plot_s1_p1 0.000013
0 s1_p1_bounds 0.000722
0 s1_renders 0.001420
0 exec_s1 2.681735
0 [total] 28.690508
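The render adds a scene on top of the same pl1 slice pipeline, roughly along these lines (the plot type and field name are placeholders, not the exact actions I used):

-
  action: "add_scenes"
  scenes:
    s1:
      plots:
        p1:
          type: "pseudocolor"
          field: "placeholder_field"
          pipeline: "pl1"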

If I don't provide any ghost information, the timing is about the same, but now most of the time is in "s1_p1":

0 create_scene_s1 0.000006
0 source 0.000006
0 verify 0.002519
0 default_queries_endpoint 0.000005
0 default_filters_endpoint 0.000003
0 pl1_f1_slice 2.944006
0 pl1 0.000004
0 pl1_plot_source 0.000003
0 s1_p1 22.819038
0 add_plot_s1_p1 0.000013
0 s1_p1_bounds 0.005160
0 s1_renders 0.000441
0 exec_s1 3.431126
0 [total] 29.202944
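For context, in the runs above the ghost information is published as an element-associated field named ascent_ghosts on the Blueprint mesh, shaped roughly like this (the topology name and values shown are placeholders):

fields:
  ascent_ghosts:
    association: "element"
    topology: "mesh"
    values: [0, 0, 1, 0]   # 0 marks real elements, nonzero marks ghosts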

Any ideas? Could it have something to do with using local indexing on each rank?

mlohry commented 2 months ago

Culprit found: Ascent and VTK-m both default to CMAKE_BUILD_TYPE=Release, but Conduit has no default CMAKE_BUILD_TYPE, so it was built without optimizations. Rebuilding Conduit with CMAKE_BUILD_TYPE=Release takes the total from 26.1 s down to 1.27 s.
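For anyone else who hits this, setting the build type explicitly when configuring Conduit avoids it, e.g. (paths here are placeholders):

cmake -S conduit/src -B build-conduit -DCMAKE_BUILD_TYPE=Release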

cyrush commented 2 months ago

thanks for the report and details!