This PR overhauls the datadeps system in a few ways:
- Datadeps now understands the aliasing semantics of `view`, `UpperTriangular`/`LowerTriangular`, `Diagonal`, and more, which allows finer-grained parallelism when these wrappers are used
- Datadeps now (mostly) works with GPUs (CUDA in particular), although further scheduler work and general testing are required
- Copy buffers are now allocated only when required, which avoids unnecessary allocations and reduces memory usage considerably
- Adds three more static schedulers as options, to be developed further
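
As a rough illustration of the aliasing point, here is a minimal sketch of how non-overlapping views might be used inside a datadeps region. It assumes a Dagger version that includes this change; `add!` is a hypothetical helper written for this example and is not part of Dagger.

```julia
using Dagger

A = zeros(4, 4)

# Hypothetical kernel for illustration: adds `v` to every element of `X` in place
function add!(X, v)
    X .+= v
    return nothing
end

Dagger.spawn_datadeps() do
    # These two views cover disjoint columns of `A`, so with aliasing-aware
    # datadeps the two tasks should be free to run in parallel
    Dagger.@spawn add!(InOut(view(A, :, 1:2)), 1.0)
    Dagger.@spawn add!(InOut(view(A, :, 3:4)), 2.0)

    # This task writes to all of `A`, which overlaps both views, so it is
    # ordered after the two tasks above
    Dagger.@spawn add!(InOut(A), 10.0)
end
```

Wrapping the whole array in `InOut(A)` for the first two tasks would also be correct, but it would force them to run one after the other, since datadeps would have to assume they touch the same memory.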