Closed: dajuno closed this 4 months ago
I created #28, which fixes the HYMLS tests on multiple cores, in case you're interested in what was still failing on that side. After that PR, nothing fails anymore.
I also created #29 to show how I want to integrate the PETSc tests into the continuation tests. That doesn't have to be done in this PR though.
It looks like things go very wrong when running the tests in parallel (everything becoming NaN). That should be fixed.
I debugged it a bit and it seems like `ghostUpdate` is doing the wrong thing.
I think the LGMap is ignored, and that's what is causing the issue.
where do you see the NaNs? what test are you looking at specifically?
What I tried now was to run the first ldc test on 4 cores. Then I printed the state vector and the indices (real and ghost) and compared. The state just contains increasing numbers, so each index should match its value minus one. They don't, which is what is causing issues. I'm quite sure the reason is that the LGMap is completely ignored, at least for the ghost communication.
The NaN values are from the later test, but I think it's good to focus on `test_ldc` first.
Hm, interesting. Curious that it works with 2 procs.
It's probably because it gets cut in half in such a way that all indices are contiguous. That doesn't happen if you divide it into four quarters.
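To illustrate the point with a toy example (mine, not from the PR): on a 4x4 grid, two horizontal halves own contiguous global index ranges, but four quadrants don't.

```python
import numpy as np

# Toy illustration: global indices of a 4x4 grid, partitioned into
# two horizontal halves vs. four quadrants.
idx = np.arange(16).reshape(4, 4)

halves = [idx[:2, :].ravel(), idx[2:, :].ravel()]
quads = [idx[:2, :2].ravel(), idx[:2, 2:].ravel(),
         idx[2:, :2].ravel(), idx[2:, 2:].ravel()]

print(halves)  # [0..7] and [8..15]: contiguous, matches PETSc's ownership
print(quads)   # e.g. [0, 1, 4, 5]: not contiguous, clashes with PETSc's layout
```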
Hm, not sure how to reproduce.
This seems to work for me:
```python
for iloc, iglob in enumerate(interface.map.indices):
    with state.localForm() as lf:
        print(f"#{interface.comm.rank}", iglob, lf[iloc])
        assert iglob == lf[iloc] - 1
```
Actually, before this, I got an error with the `A.getRow(i)` call in `extract_sorted_row`, which fails on procs that don't own `i`. Had to wrap the function body in `if i in range(*A.getOwnershipRange()): ... else: return [], []`.
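For reference, a minimal sketch of that guard; only the guard itself is from the discussion above, and the assumption here is that `extract_sorted_row` returns the sorted columns and values of one matrix row:

```python
import numpy as np

def extract_sorted_row(A, i):
    # Mat.getRow(i) raises on ranks that don't own row i, so check the
    # ownership range first (the sorting body below is an assumed sketch)
    rstart, rend = A.getOwnershipRange()
    if i not in range(rstart, rend):
        return [], []
    cols, vals = A.getRow(i)
    order = np.argsort(cols)
    return cols[order], vals[order]
```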
If it's the ghosts, then maybe there's a mistake in the slightly awkward index sorting and rearranging for `discretization`. In PETSc, the ghosts are appended to the vector after the real values, which makes this transformation necessary.
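A minimal standalone sketch of that layout (the toy sizes and the choice of ghost entries are mine, not from the PR):

```python
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
rank, size = comm.getRank(), comm.getSize()

n = 4  # owned entries per rank
# each rank ghosts the first entry of the next rank (toy choice)
ghosts = [] if size == 1 else [((rank + 1) % size) * n]
v = PETSc.Vec().createGhost(ghosts, size=(n, None), comm=comm)

owned = list(range(rank * n, (rank + 1) * n))
v.setValues(owned, [float(i) for i in owned])
v.assemblyBegin(); v.assemblyEnd()
v.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

with v.localForm() as lf:
    # the local form holds the n owned values first, then the ghosts
    print(f"rank {rank}: owned={lf.array[:n]}, ghosts={lf.array[n:]}")
```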
--
I see `test_ldc` failing on 4 procs when comparing the following entries of the Jacobian (`values_A[j] == values_B[j]`):
```
FAILED tests/test_PETSc.py::test_ldc - AssertionError: rank 1: i = 96, j = 3, 1.6275 == 2.6275
FAILED tests/test_PETSc.py::test_ldc - AssertionError: rank 2: i = 128, j = 0, 1.15875 == 2.15875
FAILED tests/test_PETSc.py::test_ldc - AssertionError: rank 3: i = 224, j = 3, 5.6275 == 6.6275
FAILED tests/test_PETSc.py::test_ldc - AssertionError: rank 0: i = 16, j = 4, -1.28125 == -0.78125
```
Probably not a coincidence that these are all multiples of 16...
Right, but this is because the state vector that gets used to create the Jacobian is wrong in the first place. You can print the state from the `from_array` method using `lf.array` after calling `ghostUpdate`, and you can see that there are duplicate values in there, which should be impossible.
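Something like this inside `from_array`, right after the `ghostUpdate` call (a sketch; `state` and `interface` are the names used earlier in this thread):

```python
import numpy as np

with state.localForm() as lf:
    local = lf.array.copy()
print(f"#{interface.comm.rank}", local)
# the test state is strictly increasing, so any duplicate in the local
# form means the ghost exchange pulled in the wrong entries
assert len(np.unique(local)) == len(local), "duplicate values after ghostUpdate"
```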
I see, you're right. I'll have a look
I think the reason is this: https://petsc.org/main/manual/mat/#matrix-and-vector-layouts-and-storage-locations
> PETSc vectors always have a contiguous range of vector entries stored on each MPI rank. The first rank has entries from 0 to `rend1 - 1`, the next rank has entries from `rend1` to `rend2 - 1`, etc.
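This is easy to see with `getOwnershipRange` (a minimal sketch):

```python
from petsc4py import PETSc

# every rank owns a contiguous block [rstart, rend), regardless of any LGMap
v = PETSc.Vec().createMPI((4, None), comm=PETSc.COMM_WORLD)
rstart, rend = v.getOwnershipRange()
print(f"rank {PETSc.COMM_WORLD.getRank()}: owns [{rstart}, {rend})")
```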
Well, then I guess we have to make our own "map" that maps between domain indices and PETSc indices. Initially, you can just make an array that contains all global indices, obtained by looping over the domains; later this can be optimised so it only contains local and ghost nodes.
PETSc has a tool for this. I'll check it out.
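Presumably that tool is `ISLocalToGlobalMapping` (`PETSc.LGMap` in petsc4py), which came up earlier in this thread. A minimal sketch of how it could tie domain indices to the PETSc layout; the permutation here is a made-up stand-in for the ordering obtained by looping over the domains:

```python
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
rank = comm.getRank()
n = 4
# made-up permuted domain -> PETSc global index map for this rank
global_indices = [rank * n + k for k in (3, 0, 2, 1)]

lgmap = PETSc.LGMap().create(global_indices, comm=comm)
v = PETSc.Vec().createMPI((n, None), comm=comm)
v.setLGMap(lgmap)

# values can now be set with domain-local indices; PETSc translates them
# to its own contiguous layout through the LGMap
v.setValuesLocal(list(range(n)), [10.0, 20.0, 30.0, 40.0])
v.assemblyBegin(); v.assemblyEnd()
```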
A future problem to think about: when using PETSc with complex numbers, all PETSc vectors and matrices will be complex. I think this means that the numpy arrays created by `Discretization` must also be complex.
I imagine we'll just have to handle the conversion in the PETSc interface.
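For example, something as simple as casting at the boundary (a sketch of the idea, not the actual interface code):

```python
import numpy as np
from petsc4py import PETSc

def to_petsc_scalar(arr):
    # PETSc.ScalarType is complex when PETSc is built with complex scalars,
    # so real arrays from Discretization get cast here at the interface
    return np.asarray(arr, dtype=PETSc.ScalarType)
```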
All implemented tests now succeed on multiple cores (tested with 1, 2, 4). The tests include some trivial checks which helped me with the implementation, so I didn't delete them.
Can you rebase on top of current master? I think that will fix the tests.
done, but still not passing
Some were fixed, but not all. Seems like I have to push the same thing to JaDaPy.
btw, it wouldn't be difficult to create an ubuntu docker image with petsc installed, if we want to test petsc in the CI
Yeah, I was gonna do that next, but feel free to give it a go. I didn't do it for Trilinos because it's basically impossible.
I made a first basic version here: https://github.com/dajuno/petsc-docker. All tests pass in the container, and the PETSc tests also pass in parallel.
Parallel implementation of the PETSc interface.
TODO: eigenvalue solver