LLNL / conduit

Simplified Data Exchange for HPC Simulations
https://software.llnl.gov/conduit/

Multi-domain mesh creation #1332

Open tmarrinan opened 4 weeks ago

tmarrinan commented 4 weeks ago

Hello. I have data that is distributed amongst N processes and I want to create a blueprint mesh for it. I thought I was doing it correctly, but when I call the partition function, I am getting unexpected results. I'm not sure if I am creating the mesh wrong or calling the partition function wrong. Any assistance would be appreciated!

Example (12 processes, each owning a 4x4 subregion of an overall 16x12 grid):

+-----+-----+-----+-----+
|  0  |  1  |  2  |  3  |
|     |     |     |     |
+-----+-----+-----+-----+
|  4  |  5  |  6  |  7  |
|     |     |     |     |
+-----+-----+-----+-----+
|  8  |  9  | 10  | 11  |
|     |     |     |     |
+-----+-----+-----+-----+

Code:

int rows = 3;
int columns = 4;
int local_width = 4;
int local_height = 4;
int row = rank / columns;
int column = rank % columns;
int origin_x = local_width * column;
int origin_y = local_height * row;

double values[16] = {...};

conduit::Node mesh;
mesh["state/domain_id"] = process_id;

mesh["coordsets/coords/type"] = "uniform";
mesh["coordsets/coords/dims/i"] = local_width;
mesh["coordsets/coords/dims/j"] = local_height;

mesh["coordsets/coords/origin/x"] = origin_x;
mesh["coordsets/coords/origin/y"] = origin_y;
mesh["coordsets/coords/spacing/dx"] = 1;
mesh["coordsets/coords/spacing/dy"] = 1;

mesh["topologies/topo/type"] = "uniform";
mesh["topologies/topo/coordset"] = "coords";

mesh["fields/scalar1/association"] = "vertex";
mesh["fields/scalar1/topology"] = "topo";
mesh["fields/scalar1/values"].set(values, 16);

I then want to repartition the mesh to access the whole thing on process 0. So I tried the following:

conduit::Node options, selections, output;
conduit::Node &selection = selections.append();
selection["type"] = "logical";
selection["start"] = {0u, 0u, 0u}; // for some reason this failed if I only used 2 dimensions
selection["end"] = {16u, 12u, 1u}; // for some reason this failed if I only used 2 dimensions
options["target"] = 1;
options["selections"] = selections;

conduit::blueprint::mpi::mesh::partition(mesh, options, output, MPI_COMM_WORLD);

However, the resulting output mesh still only has size 4x4 and only contains the data from process 0.

As a side note, I am setting "target" to 1 (specifying 1 process), but how do I specify which process (i.e. what if I want it on process 3 instead of process 1)?

tmarrinan commented 4 weeks ago

OK - after a bit more reading and testing - I think I have it working!

There were 2 key things I needed to change (one with the mesh and one with the partition options):

  1. Change scalar value association from "vertex" to "element" (this also meant that the coordinate dims needed to be increased by 1)
  2. Have an array of selections (one per process), adding a proper "domain_id" and using start and end values that match the local data

Final solution:

conduit::Node mesh;
mesh["state/domain_id"] = process_id;

mesh["coordsets/coords/type"] = "uniform";
mesh["coordsets/coords/dims/i"] = local_width + 1;
mesh["coordsets/coords/dims/j"] = local_height + 1;

mesh["coordsets/coords/origin/x"] = origin_x;
mesh["coordsets/coords/origin/y"] = origin_y;
mesh["coordsets/coords/spacing/dx"] = 1;
mesh["coordsets/coords/spacing/dy"] = 1;

mesh["topologies/topo/type"] = "uniform";
mesh["topologies/topo/coordset"] = "coords";

mesh["fields/scalar1/association"] = "element";
mesh["fields/scalar1/topology"] = "topo";
mesh["fields/scalar1/values"].set(values, 16);

conduit::Node options, selections, output;
for (int i = 0; i < num_processes; i++)
{
    conduit::Node &selection = selections.append();
    selection["type"] = "logical";
    selection["domain_id"] = i;
    selection["start"] = {0u, 0u, 0u};
    selection["end"] = {local_width, local_height, 1u};
}
options["target"] = 1;
options["fields"] = {"scalar1"};
options["selections"] = selections;
options["mapping"] = 0;

conduit::blueprint::mpi::mesh::partition(mesh, options, output, MPI_COMM_WORLD);
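
For reference, a minimal sketch for inspecting the combined result (assuming, as in this run, that the single combined domain ends up on rank 0 and the output node is empty elsewhere):

// only the rank holding the combined domain has a populated output node
if (rank == 0)
{
    output.print(); // dump the combined mesh tree
}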

This resulted in the following output (each process filled its local data with floating point values equal to its process id):

state:
  domain_id: 0
coordsets:
  coords:
    type: "uniform"
    origin:
      x: 0.0
      y: 0.0
    dims:
      i: 17
      j: 13
topologies:
  topo:
    type: "uniform"
    coordset: "coords"
fields:
  scalar1:
    topology: "topo"
    association: "element"
    values: [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0]
  original_element_ids:
    topology: "topo"
    association: "element"
    values:
      domains: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      ids: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15]
  original_vertex_ids:
    topology: "topo"
    association: "vertex"
    values:
      domains: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      ids: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 14, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 19, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 14, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 19, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 14, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 19, 20, 21, 22, 23, 20, 21, 22, 23, 20, 21, 22, 23, 20, 21, 22, 23, 24]
JustinPrivitera commented 4 weeks ago

I'm glad you got this working. Do you have suggestions on how we can improve the documentation?

tmarrinan commented 3 weeks ago

The tricky part was realizing that the data for each domain_id has to be selected manually, via an array of "selections", rather than just specifying the desired region and letting Conduit determine which process owns that data.

There were no examples in the documentation with multiple selections, so it was a bit of trial-and-error. Having the code that matches the M:N redistribution in the picture (where target is 10, 4, and 2) might be helpful.
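For instance, an example matching the target = 2 case from that picture could reuse the per-domain logical selections from my solution above and change only the target count (a sketch, untested):

options["target"] = 2; // combine the 12 domains into 2 instead of 1
options["selections"] = selections; // same per-domain logical selections as above

conduit::blueprint::mpi::mesh::partition(mesh, options, output, MPI_COMM_WORLD);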

tmarrinan commented 3 weeks ago

Well, now I'm running into a different issue. If my data contains "ghost cells" (border cells that contain data from a neighbor), then I am receiving the following warning when repartitioning: Unable to combine domains as uniform, using unstructured.

In the example above:

+-----+-----+-----+-----+
|  0  |  1  |  2  |  3  |
|     |     |     |     |
+-----+-----+-----+-----+
|  4  |  5  |  6  |  7  |
|     |     |     |     |
+-----+-----+-----+-----+
|  8  |  9  | 10  | 11  |
|     |     |     |     |
+-----+-----+-----+-----+

I now have each process storing ghost cells for its neighbors, so the actual local data size per process (for the overall 16x12 grid) is: 5x5 for the corner domains (0, 3, 8, 11), 6x5 for the remaining top/bottom-edge domains (1, 2, 9, 10), 5x6 for the remaining left/right-edge domains (4, 7), and 6x6 for the interior domains (5, 6).

Accordingly, I updated the "start" and "end" in each selection to account for the desired data sometimes being shifted one cell to the right or down, and I also updated the "origin/{x,y}" of the coordset in the mesh.

Any ideas why the uniform domain cannot be maintained?

tmarrinan commented 3 weeks ago

Wait, never mind... I just realized that "end" is inclusive. It didn't matter without the ghost cells, since the selection would get cropped to the data size, but with ghost cells I was grabbing the ghost cells to the right / below because I had assumed "end" was exclusive.
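
For anyone else hitting this: inside the selection loop from my solution above, the corrected ghost-aware selection looks something like this (a sketch; the one-cell offsets are computed per domain, assuming a ghost layer only on sides that have a neighbor):

unsigned ghost_left = (i % columns > 0) ? 1u : 0u; // ghost column on the left?
unsigned ghost_top  = (i / columns > 0) ? 1u : 0u; // ghost row on top?
selection["start"] = {ghost_left, ghost_top, 0u};
selection["end"] = {ghost_left + local_width - 1, // "end" is inclusive
                    ghost_top + local_height - 1,
                    0u};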

JustinPrivitera commented 3 weeks ago

Conduit Blueprint currently has no notion of ghost cells or nodes, but that support will likely be added in the future.

JustinPrivitera commented 3 weeks ago

We should enhance the documentation for partitioning and provide more and better examples.

tmarrinan commented 2 weeks ago

Hello! I have one more question relating to partitioning. I am now attempting to accomplish the same thing, but using Python instead of C++. I don't see many examples, but when I try output = conduit.blueprint.mpi.mesh.partition(mesh, options, comm), I get an error about conduit blueprint not having a member named "mpi". How could I achieve the same thing in Python?

JustinPrivitera commented 2 weeks ago

@cyrush can correct me if I'm wrong, but I don't believe MPI is enabled for the python interface for blueprint. I'm not sure why that's the case. We should add it.

cyrush commented 1 week ago

Your read of the situation is correct; we can add that support.

JustinPrivitera commented 1 week ago

#1333