GEOS-DEV / GEOS

GEOS Simulation Framework
GNU Lesser General Public License v2.1
212 stars 85 forks source link

Modify LAI implementation to use newer GPU friendly matrix/vector objects and solvers. #518

Closed rrsettgast closed 2 years ago

rrsettgast commented 5 years ago

Is your feature request related to a problem? Please describe. We currently use the old cpu only interface to trilinos. Instead of Epetra we should be using Tpetra?? And I assume that there is a new set of linear solvers and preconditioners as well.

Describe the solution you'd like A perfect transition between the legacy capabilities and the new platform-portable capabilities.

Describe alternatives you've considered Making a time machine and going back to 2005 so we don't have to worry about GPU's.

Additional context I don't want to do this. Many beers will be owed to the brave soul who takes this on.

klevzoff commented 5 years ago

I'm struggling to find good open-source examples of on-GPU assembly with Tpetra. The closest I've seen is https://trilinos.org/docs/dev/packages/tpetra/doc/html/Tpetra_Lesson07.html where they essentially build out CSR structures by hand (rowptr, colind, vals, stored as Kokkos views), and use them to initialize a Tpetra::CrsMatrix. The SumIntoGlobalValues interface we are currently using does not seem device-compatible (even with a pre-defined static graph). It'd be nice to see what other LA packages' GPU interfaces look like - in case we might have to change our approach to assembly and our LAI classes as a result.

rrsettgast commented 5 years ago

@hmmorgan @rtmills

Do you have any examples of on-device construction of a matrix for use in petsc?

klevzoff commented 4 years ago

I did a quick review of on-device assembly in Trilinos, hypre and PETSc. The conclusion is that none of the three packages actually expose __device__ functions for adding/setting individual values/blocks of values. Instead, the common approach seems to be to let the user build arrays of matrix contributions in the target memory space and just consume the device pointers (in a host-callable function or constructor).

For example:

Based on this, I think our approach should be assembling into our own data structure (which may be LvArray::CSRMatrix or something similar) and then pass it into a LAI package. This approach has some nice consequences:

Thoughts?

corbett5 commented 4 years ago

That sounds pretty reasonable to me, though honestly I'd just be psyched if the LvArray::CRSMatrix is of use.

So could we do that with our current host approach? Assemble into a CRSMatrix and then pass the pointers off to the LAI? If so that seems a good place to start.

rrsettgast commented 4 years ago

@klevzoff Thanks! Perhaps we are just going to be expected to generate our own CRS structure. So can we develop a task list for each package? Here is a first cut.

Trilinos: We will have to see if we are able to use our array classes, or if we have to use Kokkos::View. I imagine we can just make a shallow Kokkos::View to our data.

  1. Components of Rhs construction:

    1. Allocate rhs locally using number of local rows. We would just allocate an array1d and capture it in the kernel launch to fill.
      Kokkos::View<double*> forcingTerm ("forcing term", numLclNodes);
    2. fill forcingTerm.
    3. create Tpetra::Vector...but this is after creating the Tpetra::CrsMatrix
      // Hack to deal with the fact that Tpetra::Vector needs a
      // DualView<double**> for now, rather than a View<double*>.
      Kokkos::DualView<double**, Kokkos::LayoutLeft> b_lcl ("b", numLclRows, 1);
      b_lcl.modify<Kokkos::DualView<double**, Kokkos::LayoutLeft>::t_dev::execution_space> ();
      Kokkos::deep_copy (Kokkos::subview (b_lcl.d_view, Kokkos::ALL (), 0), forcingTerm);
      Tpetra::Vector<> b (A.getRangeMap (), b_lcl);

      I am a little concerned by the deep copy, but I can't tell if this would be required.

  2. Components of Jacobian construction:

    1. Get numOfLocalRows
    2. Get rowCounts, rowOffsets, numLocalEntries, colIndices. I think we can do this at the same time by simply using our native LvArray::CRSMatrix with our existing approach and just extracting the pointers from LvArray::CRSMatrix. The example is kind of all over the place and I don't know if it needs to be.
    3. reduction to get the numGlobalEntries.
    4. Fill the matrix.
    5. Create matrixValue? Again...hope we don't have to use a Kokkos::View for this and we can just pass the data pointer from LvArray::CRSMatrix
    6. Create rowMap and colMap
      
      RCP<const Tpetra::Map<> > rowMap =
      rcp (new Tpetra::Map<> (numGblElements, numLclElements, indexBase, comm));

    // Flag to tell Tpetra::Map to compute the global number of // indices in the Map. const Tpetra::global_size_t INV = Teuchos::OrdinalTraits::invalid (); RCP<const Tpetra::Map<> > colMap = rcp (new Tpetra::Map<> (INV, colInds.data (), colInds.extent (0), indexBase, comm));

    7. Create the `Tpetra::CrsMatrix`
    ```c++
    Tpetra::CrsMatrix<double> A (rowMap, colMap, rowOffsets,
               colIndices, matrixValues);

Does this sound reasonable?

klevzoff commented 4 years ago

FYI https://github.com/trilinos/Trilinos/issues/7695