Tehforsch / diman

Define rust compile time unit systems using const generics

Heterogeneously dimensioned linear algebra #39

Open Tehforsch opened 9 months ago

Tehforsch commented 9 months ago

Extracted from the discussion in #36:

@jedbrown

Heterogeneously-dimensioned vectors (and more general linear algebra). A sufficient example is viewing tests/gas structures as vectors, supposing that only one direction is implemented (typical for equations of state for real gasses), then using Newton's method to numerically invert the transformation. Doing that naively, you have 5-vectors with heterogeneous dimension and a 5x5 matrix with matched (also heterogeneous) dimension. To use third-party linear algebra, we'd need to shed the dimensional layer, but how safe could we make it?

@Tehforsch

This is something I would like a lot. The way I currently see it, we'd implement this by disregarding the quantity type (which is homogeneous in dimension for the entire storage type) and providing a custom type that is something like (pseudo-code-ish)

struct Heterogeneous<const N: usize, const DS: [Dimension; N]>(NDarray<N>);

Of course it doesn't stop there, because we'd also need matrices (and probably even higher dimensional tensors?).

My very first experiment with this wasn't very promising:

error[E0770]: the type of const parameters must not depend on other generic parameters
  --> src/main.rs:15:59
   |
15 | struct Heterogeneous<const N: usize, const D: [Dimension; N]>([f64; N]);
   |                                                           ^ the type must not depend on the parameter `N`
   |
   = note: const parameters may not be used in the type of const parameters

So at least with the current implementation of adt_const_params, this isn't possible. Another possibility would be to "manually" unwrap the N-layer of the implementation and implement it for a number of quantities: Quantity1<const D: [Dimension; 1]>, Quantity2<const D: [Dimension; 2]>, Quantity3<const D: [Dimension; 3]>, etc., up to a reasonable number of dimensions. Of course this is a much uglier implementation, but with the right kind of interface around it, it could still be usable. I definitely don't expect very pretty error messages in this system. I am also assuming here that very high N doesn't appear in these kinds of systems, but I am really only familiar with the typical state vectors of (density, velocity, energy) or something similar, so perhaps there are other disciplines where large N is required.

This could be possible, but would require quite a bit of macro code. It's something that could be tested quite quickly.
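A toy version of this enumeration, simplifying each entry's Dimension down to a single i64 exponent so it works as a stable const parameter (all names here are hypothetical sketches, not diman's actual types):

```rust
// Hypothetical QuantityN types, one per vector length, with each entry's
// (simplified, single-exponent) dimension as its own const parameter.
// A macro would generate these up to some maximum N.
#[derive(Debug, PartialEq)]
struct Quantity2<const D0: i64, const D1: i64>([f64; 2]);

#[derive(Debug, PartialEq)]
struct Quantity3<const D0: i64, const D1: i64, const D2: i64>([f64; 3]);

impl<const D0: i64, const D1: i64, const D2: i64> Quantity3<D0, D1, D2> {
    // Addition only compiles when both operands have the same dimension
    // list, because they must share all const parameters.
    fn add(&self, other: &Self) -> Self {
        Quantity3([
            self.0[0] + other.0[0],
            self.0[1] + other.0[1],
            self.0[2] + other.0[2],
        ])
    }
}
```

Mixing a Quantity3<1, 0, 2> with a Quantity3<0, 1, 0> then fails at the type level, which is the whole point of the exercise.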

Interfacing with the linear algebra library is the next problem of course. I'd probably start by picking a single library and then trying to implement some of the basic methods for all of the QuantityX types above. I don't see anything that is stopping this right now, but it would have to come down to a test.
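A minimal sketch of what shedding and re-attaching the dimensional layer could look like for one of these fixed-size types, again with Dimension reduced to a single i64 exponent (DimVec3 and both method names are assumed, not existing API):

```rust
// Hypothetical: a 3-vector whose entries carry (simplified) dimensions
// D0, D1, D2, encoded as plain i64 exponents usable as stable const params.
struct DimVec3<const D0: i64, const D1: i64, const D2: i64>([f64; 3]);

impl<const D0: i64, const D1: i64, const D2: i64> DimVec3<D0, D1, D2> {
    // Shed the dimensional layer to hand the raw numbers to a linear
    // algebra library.
    fn values_unchecked(&self) -> [f64; 3] {
        self.0
    }

    // Re-wrapping is the trust-me step: the caller asserts that the raw
    // numbers still carry the dimensions D0, D1, D2.
    fn from_values_unchecked(values: [f64; 3]) -> Self {
        DimVec3(values)
    }
}
```

The unsafety is concentrated in from_values_unchecked: everything between shedding and re-wrapping is unchecked, so the linear algebra call in between has to be dimensionally sound by construction.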

As a more general comment that seems relevant here: one thing that bothers me is that every time more information is added to the Dimension struct in order to perform stricter analysis, the amount of information included in compiler error messages increases, making it harder to see the dimension mismatch at first glance. One thing that would be super helpful here is if I could somehow inject custom formatting code for my const generic structs that tells the compiler how to display them in error messages. Unfortunately, I am not aware of this being an option. I tried looking into how/where the compiler even displays these structs (because it does this for structs that don't have a Debug implementation), but I wasn't remotely familiar enough with the rustc code to understand it.

@jedbrown

I'm fearful of the explosion this would cause, and wonder if we could do something based on structs. Like if tests/gas made Primitive and Conservative indexable, then we could have conversion routines to Vector with impl AsVector for Primitive defining the length and conversion between the struct and array representations. The intent here is that adding a Vector to a Vector would name those types in its error message, rather than extreme verbosity of each entry. This is a rough sketch and may have serious issues -- it feels like a messy problem. In my applications, the small matrices/vectors are assembled into large ones (dimension in the millions), with some rows/columns eliminated for boundary conditions so there isn't a trivial pattern identifying dimensions for any given entry (and in any case, the size is also not known at compile time). As for libraries suitable for the large sizes, faer is by far the most interesting. I think it's not super relevant here because typing likely only applies to statically-dimensioned matrices. nalgebra is probably the most popular for statically sized matrices.

Tehforsch commented 9 months ago

I'm fearful of the explosion this would cause, and wonder if we could do something based on structs. Like if tests/gas made Primitive and Conservative indexable, then we could have conversion routines to Vector with impl AsVector for Primitive defining the length and conversion between the struct and array representations. The intent here is that adding a Vector to a Vector would name those types in its error message, rather than extreme verbosity of each entry. This is a rough sketch and may have serious issues -- it feels like a messy problem.

That is an interesting idea. If I understand you correctly, we'd have some sort of derive macro for structs with heterogeneous dimensions that automatically implements AsVector, where the return value isn't just a vector but some "safe" wrapper around a vector that tracks where that vector came from (i.e. whether it came from Primitive or Conservative). The trick would be that we simplify the problem by not trying to express arbitrary heterogeneous vectors, but only the specific ones the user needs. This should allow us to enumerate them at compile time and check that operations between them are safe.
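A rough sketch of that wrapper (all names hypothetical): the vector type remembers which struct it came from via a phantom type parameter, so mixing vectors from different sources fails to compile, and the error names Primitive or Conservative instead of listing every entry's dimension.

```rust
use std::marker::PhantomData;

struct Primitive {
    density: f64,
    velocity: f64,
    energy: f64,
}

// A "safe" vector that remembers its source struct in the type.
struct Vector<Source, const N: usize> {
    values: [f64; N],
    _marker: PhantomData<Source>,
}

// Hypothetical trait that a derive macro could implement automatically.
trait AsVector<const N: usize>: Sized {
    fn as_vector(&self) -> Vector<Self, N>;
    fn from_vector(v: Vector<Self, N>) -> Self;
}

impl AsVector<3> for Primitive {
    fn as_vector(&self) -> Vector<Self, 3> {
        Vector {
            values: [self.density, self.velocity, self.energy],
            _marker: PhantomData,
        }
    }

    fn from_vector(v: Vector<Self, 3>) -> Self {
        Primitive {
            density: v.values[0],
            velocity: v.values[1],
            energy: v.values[2],
        }
    }
}
```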

All of that should be possible to autogenerate, but then we'd still need support for transformations between the two representations, i.e. heterogeneous matrices that have some kind of AsMatrix implementation that turns them into something that can be passed on to a linear algebra library.
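The matrix side could follow the same pattern (again purely a hypothetical sketch): tag the matrix with the types of its row and column spaces, so e.g. the Jacobian of the Primitive-to-Conservative transformation is its own type, and provide an escape hatch to raw arrays for the linear algebra library.

```rust
use std::marker::PhantomData;

struct Primitive;
struct Conservative;

// Hypothetical matrix wrapper that records the row and column spaces as
// types, e.g. the Jacobian of the Primitive -> Conservative transformation.
struct TaggedMatrix<Rows, Cols, const M: usize, const N: usize> {
    values: [[f64; N]; M],
    _marker: PhantomData<(Rows, Cols)>,
}

impl<Rows, Cols, const M: usize, const N: usize> TaggedMatrix<Rows, Cols, M, N> {
    fn new(values: [[f64; N]; M]) -> Self {
        TaggedMatrix {
            values,
            _marker: PhantomData,
        }
    }

    // Shed the type layer before handing the data to a linear algebra library.
    fn into_raw(self) -> [[f64; N]; M] {
        self.values
    }
}
```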

In my applications, the small matrices/vectors are assembled into large ones (dimension in the millions), with some rows/columns eliminated for boundary conditions so there isn't a trivial pattern identifying dimensions for any given entry (and in any case, the size is also not known at compile time). As for libraries suitable for the large sizes, faer is by far the most interesting. I think it's not super relevant here because typing likely only applies to statically-dimensioned matrices. nalgebra is probably the most popular for statically sized matrices.

One thought on this: It might be possible to do some dimensional analysis at run-time too. For example if we had some kind of mutable matrix, we could define

struct DimensionedMatrix {
    m: Matrix,
    row_dimensions: Vec<Option<Dimension>>,
    column_dimensions: Vec<Option<Dimension>>,
}

where Matrix is some dynamically-sized type from a library of our choice. We could then fill the matrix via something like

fn add_entry<const D: Dimension>(matrix: &mut DimensionedMatrix, i: usize, j: usize, q: Quantity<f64, D>) {
    matrix.m[i][j] = q.value_unchecked();
    // The first entry written to a row/column fixes its dimension;
    // every later entry must match it.
    match matrix.row_dimension(i) {
        Some(dimension) => assert_eq!(dimension, D),
        None => matrix.store_row_dimension(i, D),
    }
    match matrix.column_dimension(j) {
        Some(dimension) => assert_eq!(dimension, D),
        None => matrix.store_column_dimension(j, D),
    }
}

Of course this comes at a performance and memory cost, so it might be something you only want to enable in test runs to verify the code and then disable for actual runs. Then again, the runtime cost would be O(nm) with O(n+m) memory, so maybe it would even be tolerable in some applications? I am leaving many implementation details to the imagination here, but I am mainly wondering whether this is something that would be interesting in applications.
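To make the sketch above concrete, here is a self-contained toy version, with Dimension reduced to three integer exponents and a nested Vec standing in for a real library matrix type (a sketch under those simplifications, not a proposed API):

```rust
// Simplified Dimension: integer exponents of (length, mass, time).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dimension {
    length: i32,
    mass: i32,
    time: i32,
}

struct DimensionedMatrix {
    m: Vec<Vec<f64>>, // stand-in for a dynamically-sized library matrix
    row_dimensions: Vec<Option<Dimension>>,
    column_dimensions: Vec<Option<Dimension>>,
}

impl DimensionedMatrix {
    fn new(rows: usize, cols: usize) -> Self {
        DimensionedMatrix {
            m: vec![vec![0.0; cols]; rows],
            row_dimensions: vec![None; rows],
            column_dimensions: vec![None; cols],
        }
    }

    fn add_entry(&mut self, i: usize, j: usize, value: f64, d: Dimension) {
        self.m[i][j] = value;
        // The first write to a row/column fixes its dimension;
        // later writes must match it or the assertion fails.
        match self.row_dimensions[i] {
            Some(dim) => assert_eq!(dim, d),
            None => self.row_dimensions[i] = Some(d),
        }
        match self.column_dimensions[j] {
            Some(dim) => assert_eq!(dim, d),
            None => self.column_dimensions[j] = Some(d),
        }
    }
}
```

Filling a row with entries of mismatched dimension then panics at the second add_entry call, which is exactly the run-time analogue of the compile-time check.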