LLNL / conduit

Simplified Data Exchange for HPC Simulations
https://software.llnl.gov/conduit/

extend Blueprint index to support AMR #476

Open xjrc opened 4 years ago

xjrc commented 4 years ago

To support AMR and other problems with nontrivial rank-domain distributions, the Blueprint index needs to be extended to include sparser and more descriptive domain metadata. More specifically, the rank/file-to-domain ownership relationship needs to be made explicit, and the ability to flexibly remove/exclude domains (e.g. those uninvolved in a particular segment of a calculation) needs to be added.

To accomplish these aims, the schema for the Blueprint index needs to be updated as follows:

Current Schema

{
    "number_of_files": /* int */, // number of owner files 
    "number_of_trees": /* int */, // number of domain trees per file
    "file_pattern": /* string */, // template for owner file strings
    "tree_pattern": /* string */, // template for domain tree paths
}
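
For context, a minimal sketch of an index with these existing entries built as a conduit::Node (the values here are hypothetical):

#include <conduit.hpp>

int main()
{
    conduit::Node idx;
    idx["number_of_files"] = 3;                // number of owner files
    idx["number_of_trees"] = 2;                // domain trees per file
    idx["file_pattern"]    = "file_%04d.hdf5"; // template for owner file names
    idx["tree_pattern"]    = "domain_%06d";    // template for domain tree paths
    idx.print();
    return 0;
}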

New Schema

{
    "domain_index": [ /* int */ ], // map of domain id to domain sparse index
    "domain_owner_map": [ /* int */ ], // map of domain sparse index to rank/file id
    "owner_pattern": /* string */, // template for owner rank/file strings
    "tree_pattern": /* string */, // template for domain tree paths
}

Example

Problem

rX := rank with id X
dY := domain with id Y

|===================|===================|===================|
|         r0        |        r1         |        r2         |
|===================|===================|===================|
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|  | d0 |   | d1 |  |  | d3 |           |  | d5 |   | d9 |  |
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|===================|===================|===================|

Blueprint

{
    "domain_index": [ 0, 1, -1, 2, -1, 3, -1, -1, -1, 4 ],
    //                |  |      |      |              |
    //                |  +---+  |  +---+              |
    //                +---v  v  v  v  v---------------+
    "domain_owner_map": [ 0, 0, 1, 2, 2 ],
    "owner_pattern": "r%05d",
    "tree_pattern": "problem/domains",
}

As can be seen in the example above, domain_index is effectively just an indirection array from the space of all domain IDs to a sparse list of just the valid domains for the problem. This "sparse index" is used to index into all other per-domain quantities, of which domain_owner_map is one example (where the quantity is the ID of the owning entity). Other future uses of this sparse index include indexing per-domain bounds for quantities like field values and spatial extents (e.g. for the example above, the index may include "pressure_min": [ 0.1, 0.2, 0.3, 0.4, 0.5 ] and "pressure_max": [ 1.0, 2.0, 3.0, 4.0, 5.0 ] somewhere in the hierarchy).
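
To illustrate the indirection, a minimal sketch of a reader resolving each domain's owner from the example values above (plain C++ containers, not any Conduit API):

#include <cstdio>
#include <vector>

int main()
{
    // Map from global domain id to sparse index (-1 => domain not present).
    std::vector<int> domain_index     = { 0, 1, -1, 2, -1, 3, -1, -1, -1, 4 };
    // Per-domain quantity indexed by the sparse index: owning rank/file id.
    std::vector<int> domain_owner_map = { 0, 0, 1, 2, 2 };

    for(std::size_t d = 0; d < domain_index.size(); d++)
    {
        int sparse = domain_index[d];
        if(sparse < 0)
            continue; // this domain id is excluded from the problem
        std::printf("d%zu -> owner r%05d\n", d, domain_owner_map[sparse]);
    }
    return 0;
}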

cyrush commented 4 years ago

Running into an interesting case where topologies have different domain decompositions.

From this case, I think it makes sense to have the blueprint index identify the decomposition for each topology. I need to ponder this more; there could be some case where this has downsides.

nselliott commented 4 years ago

@cyrush Can you expand on what you mean by "topologies have different domain decompositions"?

cyrush commented 4 years ago

Two topologies: one has 4 domains and the other has 2. Right now there is only one index, so it's not possible to provide two mappings.

nselliott commented 4 years ago

Maybe a domain_topo_map that looks like the domain_owner_map? But then the topologies would need integer IDs. Or the topology in the index could contain a list of its associated domain IDs.

This could possibly be optional, since not every case will need this.

cyrush commented 4 years ago

Yes, I think simply having an optional domain map under the topologies in the blueprint_index is a solution.

nselliott commented 4 years ago

I want to make sure I clearly understand how "owner_pattern" and "tree_pattern" are supposed to work to construct paths in the new schema. For the old schema, I gave "tree_pattern" a string with a printf-style format specifier such as `"domain%06d"`. The integer value gets filled with the rank, and that works for a single-domain-per-rank application. The example here for the new schema doesn't use a format specifier for "tree_pattern", but should it?

As an example, I think the blueprint mesh schema for my application "myapp" with multiple domains per rank should look something like this. Imagine the simplest multi-domain problem: one rank with two domains.


myapp
{
   mesh
   {
      domain_000000
      {
         coordsets
         {
            coords { ... }
         }
         topologies
         {
            topo { ... }
         }
         fields
         ...etc
      }
      domain_000001
      {
         coordsets
         {
            coords { ... }
         }
         topologies
         {
            topo { ... }
         }
         fields
         ...etc
      }
   }
}

Then the blueprint index would look like:

blueprint_index
{
   domain_index:  [0,1]
   domain_rank_map:  [0,0]
   owner_pattern:  rank_%06d
   tree_pattern:  "myapp/mesh/domain_%06d"

   state
   {
      cycle:  0
      time:  0.0
      number_of_domains:  2
   }
   myapp
   {
      coordsets
      {
         coords
         {
            path:  "coordsets/coords"
         }
      }
      ... paths for topologies, fields would look similar
   }
}

Does this look right? A tool reading this would concatenate "tree_pattern" and "path" to figure out the full path to "coords" in the mesh schema?

Is it correct to have "domain_index", etc. directly under "blueprint_index", or should they be nested inside "myapp"?

How is "owner_pattern" used? Should it be used to name an object that encompasses the entire mesh schema, like this?

rank_000000
{
   myapp
   {
      mesh
      {
          domain_000000 {...}
          domain_000001 {...}
      }
   }
}

Also, don't we still need "file_pattern"? If it isn't there, how does a tool that reads the index know where to find files?

I ended up with a lot of questions here but I hope this made sense.

cyrush commented 3 years ago

@xjrc -- does it make sense to revisit the sparse case in light of O2M (one-to-many) relations now existing? It might not directly apply, but we have pondered this much more.

cyrush commented 3 years ago

@nselliott The sparse case doesn't matter for your use cases right now, correct? I am pretty sure the sparse case was driven by me for viz use cases.

My proposal: let's aim for a "per-mesh" mapping to hash out that case. I'll post some ideas soon.

nselliott commented 3 years ago

@cyrush No, I don't need the sparse case.

xjrc commented 3 years ago

@cyrush: The rank-domain distribution certainly follows a one-to-many relationship pattern (i.e. each rank (one) contains some number of domains (many)), so I think it makes sense to apply the one-to-many schema to this piece of index data. Such an application may look like the following when applied to this issue's primary example:

Problem

rX := rank with id X
dY := domain with id Y

|===================|===================|===================|
|         r0        |        r1         |        r2         |
|===================|===================|===================|
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|  | d0 |   | d1 |  |  | d3 |           |  | d5 |   | d9 |  |
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|===================|===================|===================|

Blueprint

rank_domain_distribution:
  ranks:
    domains: [0, 1, 3, 5, 9]
    sizes: [2, 1, 2]
    offsets: [0, 2, 3]
  owner_pattern: "r%05d"
  tree_pattern: "problem/domains"

Whether or not we want to apply this schema depends a lot on how our clients represent this data. If they all use a format closer to our original concept, it may be worth ignoring this potential application in favor of the benefits of more zero-copy possibilities. That being said, I'm generally in favor of minimizing the number of abstractions/concepts whenever possible, so I'd advocate for weighing such possibilities carefully.
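
For illustration, a minimal sketch of walking the one-to-many rank_domain_distribution above to list each rank's domains (plain C++ containers, values copied from the example):

#include <cstdio>
#include <vector>

int main()
{
    // One-to-many rank -> domain relationship from the example above.
    std::vector<int> domains = { 0, 1, 3, 5, 9 };
    std::vector<int> sizes   = { 2, 1, 2 };
    std::vector<int> offsets = { 0, 2, 3 };

    for(std::size_t rank = 0; rank < sizes.size(); rank++)
    {
        std::printf("r%05zu owns:", rank);
        for(int i = 0; i < sizes[rank]; i++)
            std::printf(" d%d", domains[offsets[rank] + i]);
        std::printf("\n");
    }
    return 0;
}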

cyrush commented 3 years ago

What I am aiming for: adding the info about the mesh path into the state sub-tree of the blueprint index.

Non-sparse case: say we have 5 domains across 3 files, like the following:

|===================|===================|===================|
|         f0        |       f1          |        f2         |
|===================|===================|===================|
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|  | d0 |   | d1 |  |  | d2 |           |  | d3 |   | d4 |  |
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|===================|===================|===================|

Here is a potential index:

blueprint_index:
   myapp:
      state:
         cycle:  0
         time:  0.0
         number_of_domains:  5
         partition_size: 3
         partition_pattern: "file_%04d.hdf5"
         domain_pattern: "myapp/mesh/domain_%06d"
         domain_to_partition_map: [ 0, 0, 1, 2, 2 ]
      coordsets:
         coords:
            path:  "coordsets/coords"
      ... paths for topologies, fields would look similar
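
A minimal sketch of how a reader might resolve each domain to its file and tree path under this proposal (assuming printf-style expansion of the two patterns):

#include <cstdio>
#include <vector>

int main()
{
    // Values from the proposed index above.
    std::vector<int> domain_to_partition_map = { 0, 0, 1, 2, 2 };

    for(std::size_t d = 0; d < domain_to_partition_map.size(); d++)
    {
        char file_buf[64], tree_buf[64];
        std::snprintf(file_buf, sizeof(file_buf), "file_%04d.hdf5",
                      domain_to_partition_map[d]);
        std::snprintf(tree_buf, sizeof(tree_buf), "myapp/mesh/domain_%06d",
                      (int)d);
        std::printf("d%zu -> %s : %s\n", d, file_buf, tree_buf);
    }
    return 0;
}
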
nselliott commented 3 years ago

@cyrush I like how that looks, putting all of those things inside state rather than having data at the top level of blueprint_index. I think that will be easier to work with on the application side.

I think we might need one more integer substitution pattern string to identify the higher level tree that holds the domains. Do you have any intent to support trees from more than one rank in a single file, as can be done using SPIO? If so, then there needs to be a pattern that can be used to point to the "datagroup_**" object created at the highest level of the tree for each rank. If you aren't going to support that, then another pattern isn't needed, but partition_pattern would need to look something like:

partition_pattern: "file_%07d.hdf5:datagroup_%07d"

with the same integer value filling both substitution patterns. Can you support that syntax?
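
For illustration, a minimal sketch of what "the same integer value filling both substitution patterns" could mean (hypothetical rank value, printf-style expansion):

#include <cstdio>

int main()
{
    int rank = 3; // hypothetical rank id
    char buf[80];
    // The same rank id fills both substitutions for an M-ranks-to-M-files layout.
    std::snprintf(buf, sizeof(buf), "file_%07d.hdf5:datagroup_%07d", rank, rank);
    std::printf("%s\n", buf); // prints: file_0000003.hdf5:datagroup_0000003
    return 0;
}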

Alternatively, I can look at adding some more flexibility to SPIO so that the integer isn't needed in the datagroup name for the case of writing M ranks to M files. It would still be needed for M ranks to N files.

Overall, thank you for this example, as it clarifies for me the main structure of the new index.

cyrush commented 3 years ago

Good point about the 3-level case for SPIO. Silo has nameschemes, which are super flexible but also a bit complex; I don't think I want to go that far, but it seems we need something that supports multi-level indexing.

cyrush commented 3 years ago

regrouping here:

Using fmt, we can support something like the following easily:

partition_pattern: "file_{file:04d}.hdf5/datagroup_{datagroup:05d}/myapp/mesh/domain_{domain:05d}"
partition_map: 
  file:  [ 0, 0, 2, 2 ]
  datagroup: [ 1, 0, 1, 0 ]
  domain: [ 0, 1, 2, 3 ]

Users are free to create as complex a pattern as they want -- it approaches the flexibility of Silo's nameschemes.
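
A minimal sketch of expanding such a pattern with the {fmt} library's named arguments (values copied from the proposal above):

#include <fmt/format.h>
#include <vector>

int main()
{
    std::vector<int> file      = { 0, 0, 2, 2 };
    std::vector<int> datagroup = { 1, 0, 1, 0 };
    std::vector<int> domain    = { 0, 1, 2, 3 };

    for(std::size_t i = 0; i < domain.size(); i++)
    {
        std::string path = fmt::format(
            "file_{file:04d}.hdf5/datagroup_{datagroup:05d}/myapp/mesh/domain_{domain:05d}",
            fmt::arg("file", file[i]),
            fmt::arg("datagroup", datagroup[i]),
            fmt::arg("domain", domain[i]));
        fmt::print("{}\n", path);
    }
    return 0;
}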

@nselliott + @xjrc let me know what you think about this

cyrush commented 3 years ago

Our example from above:

|===================|===================|===================|
|         f0        |       f1          |        f2         |
|===================|===================|===================|
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|  | d0 |   | d1 |  |  | d2 |           |  | d3 |   | d4 |  |
|  +----+   +----+  |  +----+           |  +----+   +----+  |
|===================|===================|===================|

partition_pattern: "file_{file:04d}.hdf5:myapp/mesh/domain_{domain:05d}"
partition_map: 
  file:  [ 0, 0, 1, 2, 2 ]
  domain: [ 0, 1, 2, 3, 4 ]
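
Assuming the same {fmt}-style expansion as above, d2 would resolve to file_0001.hdf5 with tree path myapp/mesh/domain_00002, and d3 to file_0002.hdf5 with tree path myapp/mesh/domain_00003.
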
nselliott commented 3 years ago

@cyrush I like this suggestion. One question about the '/' after the file pattern--should it be a ':' to distinguish that the file contains the path but is not part of the path?

cyrush commented 3 years ago

@nselliott yep, that was a typo

cyrush commented 2 years ago

We have all the hard parts done. I think we need to add an example to our docs, and then we can resolve this issue.

nselliott commented 2 years ago

There is an example but it's not in the docs.

cyrush commented 1 year ago

relay::io::blueprint::save_mesh and relay::io::blueprint::load_mesh now support creating and reading the partition-map-style Blueprint index. VisIt 3.3.2 will also have support for reading this style.

Still need to add info to docs.

This test creates a good example:

https://github.com/LLNL/conduit/blob/96ce77238f8b1b759cb879d5390ce3cf2accfbb6/src/tests/blueprint/t_blueprint_mesh_relay.cpp#L680
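
For reference, a minimal sketch of the round trip through these calls (this assumes a Conduit build with HDF5 support; the example mesh and the root file name are illustrative assumptions, not taken from the linked test):

#include <conduit.hpp>
#include <conduit_blueprint.hpp>
#include <conduit_relay_io_blueprint.hpp>

int main()
{
    // Build a small example mesh (illustrative choice of example and size).
    conduit::Node mesh;
    conduit::blueprint::mesh::examples::braid("uniform", 5, 5, 5, mesh);

    // Writes the mesh data files plus a root file that holds the index,
    // including the partition map entries discussed above.
    conduit::relay::io::blueprint::save_mesh(mesh, "braid_example", "hdf5");

    // Reads the mesh back via the index in the root file.
    // NOTE: the root file name below is an assumption -- the actual name
    // depends on the cycle recorded in the mesh's state.
    conduit::Node loaded;
    conduit::relay::io::blueprint::load_mesh("braid_example.cycle_000100.root",
                                             loaded);
    return 0;
}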