TDycores-Project / TDycore

[WIP] Created a unit test for verifying that nodes can see their subcells (FV approach) #180

Closed (jeff-cohere closed this 2 years ago)

jeff-cohere commented 3 years ago

This PR (based on features in #178) introduces another unit test that constructs a mesh on a 3-torus, distributes it, creates ghost cells, and checks that its nodes each have 8 cells attached.

The test fails. I assumed that in a domain-decomposed, thrice-periodic mesh, every node "should" have exactly 8 cells attached (provided the overlap is expressed properly), and this assumption appears not to hold.

@knepley , would you mind taking a look at src/tests/test_parallel_node_subcells.c in this branch? I'd be interested to see how much of DMPlex I've managed to understand.
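
For context, here is a rough paraphrase of the distribution pipeline the test sets up. This is a sketch, not the actual contents of src/tests/test_parallel_node_subcells.c, and the function name is illustrative:

```c
#include <petscdmplex.h>

/* Illustrative paraphrase of the test's setup: switch to FV-style adjacency
 * so the parallel overlap contains face-adjacent cells, then distribute with
 * one level of overlap. The test also calls DMPlexConstructGhostCells(),
 * which (as discussed below) does nothing here because the 3-torus has no
 * boundary. */
static PetscErrorCode DistributePeriodicMesh(DM *dm)
{
  DM             dmDist = NULL;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* FV adjacency: useCone = PETSC_TRUE, useClosure = PETSC_FALSE. */
  ierr = DMSetBasicAdjacency(*dm, PETSC_TRUE, PETSC_FALSE);CHKERRQ(ierr);
  /* Distribute with one layer of overlap cells. */
  ierr = DMPlexDistribute(*dm, 1, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) { /* dmDist is NULL on a single process */
    ierr = DMDestroy(dm);CHKERRQ(ierr);
    *dm  = dmDist;
  }
  PetscFunctionReturn(0);
}
```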

knepley commented 3 years ago

Okay, I can help talk through it. First, this mesh has no boundary, so the boundary label should be empty and no ghost cells will be created (they attach to the boundary). Second, TransitiveClosure() gives back points and orientations in the same array, so it is 2*num_points big. Maybe you are already handling this. What kind of failure do you get?
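
Schematically, counting the cells attached to a vertex might look like the sketch below (a made-up helper, not code from the branch). The star comes back as (point, orientation) pairs, so the array holds 2*numPoints entries:

```c
/* Count the cells attached to vertex v by walking its star
 * (useCone = PETSC_FALSE). DMPlexGetTransitiveClosure() returns
 * (point, orientation) pairs, so the points live at the even indices. */
static PetscErrorCode CountCellsOnVertex(DM dm, PetscInt v, PetscInt *numCells)
{
  PetscInt      *star = NULL;
  PetscInt       numPoints, p, cStart, cEnd;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  *numCells = 0;
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* cells are height 0 */
  ierr = DMPlexGetTransitiveClosure(dm, v, PETSC_FALSE, &numPoints, &star);CHKERRQ(ierr);
  for (p = 0; p < numPoints; ++p) {
    const PetscInt point = star[2*p]; /* star[2*p+1] is the orientation */
    if (point >= cStart && point < cEnd) ++(*numCells);
  }
  ierr = DMPlexRestoreTransitiveClosure(dm, v, PETSC_FALSE, &numPoints, &star);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
```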

jeff-cohere commented 3 years ago

Right, sorry about the boundary code. I added it to experiment with a non-periodic equivalent and then decided to ask for help instead, so I don't think it's actually used anywhere.

The failure is at line 107, where I assert that a node has 8 points in its transitive closure that I've identified as cells (height 0). If you uncomment the line above it, which prints the number of cells attached to each node, you get values including 1, 2, 3, 4, 5, 6, and 8. There are 3 parallel subdomains, just to make things nice and awkward at domain boundaries. :-)

jeff-cohere commented 3 years ago

I just reread this: "First, this mesh has no boundary, so the boundary label should be empty and no ghost cells will be created (they attach to the boundary)." Does this mean that DMPlexConstructGhostCells doesn't construct ghosts along domain boundaries for distributed meshes?

A couple more clarifying questions:

  1. Do I even need to call DMPlexConstructGhostCells in this case, given that I've called DMSetBasicAdjacency and DMPlexDistribute?
  2. Can I assume that the periodicity of a mesh is preserved by DMPlexDistribute? Does DMPlex support the parallel distribution of periodic meshes?

codecov-commenter commented 3 years ago

Codecov Report

Merging #180 (e483900) into master (7993463) will not change coverage. The diff coverage is n/a.

Note: the current head e483900 differs from the pull request's most recent head 828bba6. Consider uploading reports for commit 828bba6 to get more accurate results.

@@           Coverage Diff           @@
##           master     #180   +/-   ##
=======================================
  Coverage   67.73%   67.73%           
=======================================
  Files           7        7           
  Lines        1221     1221           
=======================================
  Hits          827      827           
  Misses        394      394           

knepley commented 3 years ago

  1. ConstructGhostCells() does work on parallel meshes now, but only on meshes with a boundary.

  2. I think we have a terminology problem. "Ghost" cells in ConstructGhostCells() are finite volume ghost cells that ring the domain and provide boundary conditions. They are not "parallel ghost" cells, which is the terminology used in some places. For the most part we don't use that term; we say "shared", and talk about roots vs. leaves.

  3. Yes, and yes.

I will run a test like this myself just to check.

jeff-cohere commented 3 years ago

Well, I'm a bit stumped with src/tests/test_parallel_node_subcells.c. Given a distributed 3-torus, I believe every vertex should possess a transitive closure that contains exactly 8 cells, but I'm seeing cases in which a vertex has only 4 cells. I see this when running locally with 3 processes (just to make things nice and awkward).

jeff-cohere commented 3 years ago

Hmm. Something fishy is going on with $MPIEXEC, too.

knepley commented 3 years ago

@jeff-cohere Okay, I think I might understand what is going on. You are checking the star (transitive support) of points in parallel. The guarantee I make is that the transitive closure of every point is complete in parallel, not the star, since you can only guarantee one direction.

If you distribute with overlap=1, you could check all the original vertices, but you would get new vertices without complete supports.

jeff-cohere commented 3 years ago

Thanks, @knepley. I kind of thought it might be the case that I was checking over more vertices than were originally there, but it wasn't clear to me how to traverse only the original local vertices for each process. Any advice on how I might do that?

knepley commented 3 years ago

Since it is just for testing, you could distribute with no overlap, mark all local vertices with a label, then call DMPlexDistributeOverlap() to get the overlapping cells, and check only the labeled vertices.
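
A sketch of that recipe, with illustrative names (the label name is arbitrary):

```c
/* After a distribution with overlap = 0, label every local vertex, then add
 * the overlap. The label migrates with the mesh, so afterwards only the
 * labeled vertices need to be checked. */
static PetscErrorCode MarkVerticesAndAddOverlap(DM *dm)
{
  DM             dmOverlap = NULL;
  DMLabel        label;
  PetscInt       vStart, vEnd, v;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DMCreateLabel(*dm, "original_vertices");CHKERRQ(ierr);
  ierr = DMGetLabel(*dm, "original_vertices", &label);CHKERRQ(ierr);
  ierr = DMPlexGetDepthStratum(*dm, 0, &vStart, &vEnd);CHKERRQ(ierr); /* vertices are depth 0 */
  for (v = vStart; v < vEnd; ++v) {
    ierr = DMLabelSetValue(label, v, 1);CHKERRQ(ierr);
  }
  /* Add one layer of overlapping cells. */
  ierr = DMPlexDistributeOverlap(*dm, 1, NULL, &dmOverlap);CHKERRQ(ierr);
  if (dmOverlap) {
    ierr = DMDestroy(dm);CHKERRQ(ierr);
    *dm  = dmOverlap;
  }
  PetscFunctionReturn(0);
}
```

In the check loop, DMLabelGetValue() returns -1 for unmarked points, so anything without the value 1 can be skipped.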

jeff-cohere commented 3 years ago

Okay, I've tried to use the star forest data structure to traverse only those vertices that don't belong to other processes, but I'm still seeing nodes attached to only 4 cells.

I'm still fighting make, trying to get it to report error statuses from our unit tests. At this point I'm still not convinced that unit tests are going to work nicely in this project without more work on the build system and without carving out the data structures we're using. I'm not sure all that is quite worth the effort it would likely take--it kind of depends on TDycore's ultimate destiny.

jedbrown commented 3 years ago

Is there something in particular you need make to do? I did some refactoring quite a while ago, but some cruft remains from a desire to avoid breaking the workflow of cd'ing into a subdirectory and building one example there. That workflow is less reliable (or significantly more complicated) than just building everything from the top level, where all dependencies are clear.

jeff-cohere commented 3 years ago

Thanks, Jed. For this issue, I just need the make unit-tests command to correctly report error statuses from its find command (top-level gmakefile, line 164) so the CI system knows that unit test failures are real failures. I'm using find as the test harness, which is not great, but I think going any deeper would require a bit of discussion.

knepley commented 3 years ago

@jeff-cohere "Okay, I've tried to use the star forest data structure to traverse only those vertices that don't belong to other processes, but I'm still seeing nodes attached to only 4 cells." Yes, they will still be there (a vertex can be owned and be on the overlap boundary). For your simple test, you could stick to vertices which are not shared (no leaves or roots of degree > 0). To really check it, you would have to do the marking I suggested above.

jeff-cohere commented 3 years ago

Thanks, Matt, for the clarification. I thought I tried your suggestion, but wouldn't some of the original nodes be shared between processes in the same way as is occurring now?

I'll see if I can cull the shared nodes in this test.

jeff-cohere commented 3 years ago

Okay, I've fixed the test harness so it reports failures properly. All that remains is to fix the test to traverse only the original local vertices in the mesh and check their closures.

jeff-cohere commented 3 years ago

@knepley , just to be clear: are there any vertices that I should exclude from the original mesh with no overlap?

Specifically:

  1. I assemble all vertices just after doing a DMPlexDistribute with 0 overlap and stuff them into a set (local_vertices).
  2. Then I add 1 layer of overlapping cells, etc.
  3. Finally, I traverse the vertices of the overlapping distributed mesh, checking the attached cells only if they belong to the set of original vertices.

In step 3, I still encounter some vertices in that original set that can only "see" 4 cells. I believe my "set" serves the same purpose as the label of local vertices you were recommending.
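
For what it's worth, here is a consolidated sketch of the check as I understand it, reusing the illustrative helpers from earlier in this thread (the "original_vertices" label, CountCellsOnVertex, VertexIsShared); none of these names come from the actual test:

```c
/* Walk the vertices of the overlapped mesh and check the cell count only for
 * vertices that were present before the overlap was added and that are not
 * shared with another process. */
static PetscErrorCode CheckOriginalVertexSubcells(DM dm)
{
  DMLabel        label;
  PetscInt       vStart, vEnd, v;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DMGetLabel(dm, "original_vertices", &label);CHKERRQ(ierr);
  ierr = DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);CHKERRQ(ierr);
  for (v = vStart; v < vEnd; ++v) {
    PetscInt  val, numCells;
    PetscBool shared;

    ierr = DMLabelGetValue(label, v, &val);CHKERRQ(ierr);
    if (val != 1) continue;                               /* vertex added by the overlap */
    ierr = VertexIsShared(dm, v, &shared);CHKERRQ(ierr);  /* sketched above */
    if (shared) continue;                                 /* skip shared vertices */
    ierr = CountCellsOnVertex(dm, v, &numCells);CHKERRQ(ierr);
    if (numCells != 8) SETERRQ2(PETSC_COMM_SELF, PETSC_ERR_PLIB, "Vertex %D sees %D cells (expected 8)", v, numCells);
  }
  PetscFunctionReturn(0);
}
```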

Thanks again for your patience and help on this.