Specification of blocking and interleaving structure

CliMA / Solvent.jl

A CPU- and GPU-friendly package for linear solvers

Apache License 2.0


Specification of blocking and interleaving structure #1

Open simonbyrne opened 4 years ago

simonbyrne commented 4 years ago

One of the challenges is separating out how we describe the operator structure that the solvers can exploit: at the moment, lots of the solvers reach into dg.grid to find this out. It would be nice to have a systematic way to describe this structure, and I think something like this should work:

abstract type OperatorStructure
end
struct Dense <: OperatorStructure
   dim::Int
end
struct Banded <: OperatorStructure
   dim::Int
   bandwidth::Int
end
struct Blocked{S} <: OperatorStructure
   sub::S
   nblocks::Int
end
struct Interleaved{S} <: OperatorStructure
   sub::S
   nblocks::Int
end

Then given our standard (nQ*nQ*nQ, nS, nV*nE) layout, we could describe the structure of the DG operator as

structure(dg::DGModel) =
    Blocked(Interleaved(Banded(nQ*nS*nV, nQ*nS*eband-1), nQ*nQ), nH)

Then the batched solvers could be defined directly in terms of the structure.
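As a rough illustration of what "defined directly in terms of the structure" could look like, here is a minimal sketch that queries a structure by dispatch instead of reaching into the grid. The type definitions repeat the proposal so the block is self-contained. The helpers `totaldim`, `nbatches` and `kernel`, the `S<:OperatorStructure` bound, and the example sizes are all assumptions made for illustration, not part of any existing package.

```julia
# Types from the proposal (the bound on S is an added assumption).
abstract type OperatorStructure end

struct Dense <: OperatorStructure
    dim::Int
end
struct Banded <: OperatorStructure
    dim::Int
    bandwidth::Int
end
struct Blocked{S<:OperatorStructure} <: OperatorStructure
    sub::S
    nblocks::Int
end
struct Interleaved{S<:OperatorStructure} <: OperatorStructure
    sub::S
    nblocks::Int
end

# Hypothetical helper: total number of unknowns described by a structure.
totaldim(s::Union{Dense,Banded}) = s.dim
totaldim(s::Union{Blocked,Interleaved}) = s.nblocks * totaldim(s.sub)

# Hypothetical helper: number of independent sub-systems a batched solver
# could process in parallel.
nbatches(s::Union{Dense,Banded}) = 1
nbatches(s::Union{Blocked,Interleaved}) = s.nblocks * nbatches(s.sub)

# Hypothetical helper: the innermost structure, i.e. what each batch entry
# actually has to solve.
kernel(s::Union{Dense,Banded}) = s
kernel(s::Union{Blocked,Interleaved}) = kernel(s.sub)

# Made-up sizes, for illustration only.
nQ, nS, nV, nH, eband = 4, 5, 8, 10, 2

# Arguments follow the field order (sub, nblocks) of the structs above.
s = Blocked(Interleaved(Banded(nQ*nS*nV, nQ*nS*eband - 1), nQ*nQ), nH)

totaldim(s)   # equals nQ^3 * nS * nV * nH
nbatches(s)   # equals nQ*nQ * nH independent banded systems
kernel(s)     # the Banded structure each batch entry solves
```

A batched solver written this way would only need `kernel(s)` to pick a factorization and `nbatches(s)` to size its launch, so it would not depend on `dg.grid` at all.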

simonbyrne commented 4 years ago

@rohanmclure was able to get the outer Blocked(...) mostly working here: https://github.com/CliMA/ClimateMachine.jl/pull/662

jkozdon commented 4 years ago

> Then given our standard (nQ*nQ*nQ, nS, nV*nE) layout, we could describe the structure of the DG operator as

If possible, it would be nice to have enough flexibility to switch to other formats too. Ideally the solvers wouldn't need to know about changes to data layout, padding, etc., and this could be handled by something like PermutedDimsArray and StructArrays.

Some other storage layouts we have used in the past (which might be better for vectorization later):

(nS, nQ*nQ*nQ, nV*nE) << state first
(X, nQ*nQ*nQ, nS/X, nV*nE) << storing X values together as a vector of width X

I'm sure there are others too.
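To make the layout point concrete, here is a minimal sketch of how a state-first buffer can be presented in the standard index order with `PermutedDimsArray` from Base, so that code written against the standard layout runs unchanged. The sizes and the `state_sums` function are made up for illustration, and `nE` here stands in for the combined `nV*nE` dimension.

```julia
# Made-up sizes, for illustration only.
nQ, nS, nE = 2, 3, 4
Np = nQ^3

# Standard layout: (nQ*nQ*nQ, nS, nV*nE).
A = reshape(collect(1.0:Np*nS*nE), Np, nS, nE)

# State-first storage: (nS, nQ*nQ*nQ, nV*nE). permutedims copies the data
# into the new memory order.
B = permutedims(A, (2, 1, 3))

# Lazy wrapper: indexes like the standard layout, but the memory is B's.
V = PermutedDimsArray(B, (2, 1, 3))

# Hypothetical "solver-side" function written against the standard index
# order; it works on any AbstractArray with that indexing convention.
state_sums(U) = [sum(U[:, s, :]) for s in axes(U, 2)]

state_sums(A) == state_sums(V)   # same answer from either storage order

# Writes through the wrapper land in the state-first buffer.
V[1, 2, 3] = -1.0
B[2, 1, 3] == -1.0
```

Whether generic indexing like this is fast enough for the inner solver kernels is a separate question. The sketch only shows that the index convention and the memory order can be decoupled.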

By padding I mean: for optimal performance we may want to explore whether we should pad the first (fastest-varying) dimension to allow for aligned access. Not sure how much this is needed on modern hardware, but we used to do this. (Not sure exactly how this affects the solvers.)
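A minimal sketch of the padding idea: round the leading dimension up so that every column starts a whole number of aligned chunks from the start of the buffer, and hand the solvers a view of the unpadded part. The helper name `padded_leading_dim`, the 64-byte default, and the sizes are assumptions for illustration. The sketch also assumes the alignment is a multiple of `sizeof(T)` and says nothing about the alignment of the buffer's base address.

```julia
# Hypothetical helper: smallest leading dimension >= n such that each
# column occupies a whole number of `alignment`-byte chunks.
function padded_leading_dim(::Type{T}, n::Integer; alignment::Integer = 64) where {T}
    elems = div(alignment, sizeof(T))   # elements per aligned chunk
    return cld(n, elems) * elems        # round n up to a multiple of elems
end

# Made-up sizes, e.g. nQ = 3 gives 27 points per element.
Np, nS, nE = 27, 5, 4

ld = padded_leading_dim(Float64, Np)    # 27 rounds up to 32 for Float64

# Allocate with the padded leading dimension...
storage = zeros(Float64, ld, nS, nE)

# ...and expose only the logical (unpadded) part to the solvers.
U = view(storage, 1:Np, :, :)
size(U)                                 # (27, 5, 4)
```

With this arrangement the padding stays invisible to any code that indexes `U`, at the cost of `U` no longer being contiguous in memory.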

simonbyrne commented 4 years ago

Thanks, those are good points. I think we should be able to support those ideas within this schema, and we could add support for padding/alignment as well.

I've thought about StructArrays: the problem there is that it doesn't guarantee that the backing arrays are stored in contiguous memory. We might be able to express PermutedDimsArray by working out how the permutation maps onto the schema.