CSR/CSC: Allow working with borrowed data + generalize index type

This issue describes two in principle disjoint issues, but a design that address either needs to take care to accommodate both. They are:

Currently CSC/CSR (and COO) matrices only work with owned data. However, in many cases a user might have access to compatible, borrowed CSC/CSR data. In this case, it should be possible to work on the borrowed data with our data structures.
Currently all indices in CSR/CSC matrices are stored as usize. For the vast majority of applications, 32-bit indices suffice, which can be expected to have a significant impact on performance due to reduced memory bandwidth requirements. We should therefore generalize the index type.

There are nuances to both these issues, and in the end they must remain compatible. Our primary goal for the design of both these features, however, should be to try to reduce the added complexity of a larger and increasingly generic API surface. The library should remain easy to use, the documentation should be easy to browse and the API easy to use.

To this end, one possibility would be to add another Storage generic parameter to CsrMatrix/CscMatrix and make it default to owned storage, e.g.:

struct CsrMatrix<T, Index=usize, Storage=OwnedCsStorage<T, Index>> { ... }

The existing CsMatrix struct, which is currently only an implementation detail used to reduce repetition between the CSR and CSC implementations, might be repurposed to fit the role of a OwnedCsStorage.

There's similarly a host of issues related to storing the indices. For example, we would like to be able to use signed integers as indices, because other software might use signed integers for this purpose. However, we need to make sure that we can soundly convert back and forth between usize and the index type on demand. That is, we need to ensure that all indices stored in a valid CSR/CSC matrix are actually convertible to usize without overflow/underflow.

I'm fairly interested in this functionality. At present there's no way to slice sparse matrix types, and having borrowed storage makes sense for such functionality to exist.

I'm not sure that a generalized index type is needed for that (it seems like a distinct issue).

Do you have recommendations for where to start if I wanted to add a custom storage generic parameter and start implementing the ability to slice csc / csr matrices?

As for generalizing across index types, is the Dim trait insufficient? I feel like this is pretty much everything one would need on that front. It doesn't guarantee conversion to usize, but it does at least let you do try_to_usize.

I'm fairly interested in this functionality. At present there's no way to slice sparse matrix types, and having borrowed storage makes sense for such functionality to exist.

To clarify: by slicing, do you mean - for example - to take a "view" only to a select set of rows in a CSR matrix, or a select set of colums in a CSC matrix? Or do you mean something even more general?

If we want to support this we also need to relax/change our assumptions on CSR/CSC data, so we'll have to think about if this imposes any problems first.

I'm not sure that a generalized index type is needed for that (it seems like a distinct issue).

It is a distinct issue, but the connection is that we want an overall design that encapsulates both borrowed storage and generalized index types with minimal API complexity. So it would be good to at least consider how possible designs for these features impact each other in terms of the API complexity. Currently, the API of CSR/CSC matrices has been deliberately kept very simple so that it's accessible and understandable in the generated rustdoc. Ideally we'd like to obtain the benefits of these new designs with minimal changes to the existing API.

Do you have recommendations for where to start if I wanted to add a custom storage generic parameter and start implementing the ability to slice csc / csr matrices?

I think we'll first have to clarify precisely the expectations of "slicing", then consider how this might impact the sparse matrix formats for "non-slicing" workflows, if at all. Once this is resolved it would perhaps make sense to explore creating a CsStorageTrait or something to that effect. However, we'd also need to construct SparsityPattern with borrowed data, so it seems to me there must be a trait also for patterns.

As for generalizing across index types, is the Dim trait insufficient? I feel like this is pretty much everything one would need on that front. It doesn't guarantee conversion to usize, but it does at least let you do try_to_usize.

I think Dim serves a different purpose, namely to abstract over dimensions specified either at runtime or compile-time. It's not meant to represent indices, and I don't think we should try to repurpose it. Dim is not implemented for any signed types, for one - and it also cannot be implemented for signed types.

To clarify: by slicing, do you mean - for example - to take a "view" only to a select set of rows in a CSR matrix, or a select set of colums in a CSC matrix? Or do you mean something even more general?

Quite literally, I want to be able to effectively call this function from dense matrices on sparse matrices.

I think we'll first have to clarify precisely the expectations of "slicing", then consider how this might impact the sparse matrix formats for "non-slicing" workflows, if at all. Once this is resolved it would perhaps make sense to explore creating a CsStorageTrait or something to that effect.

I think at a top level how CsMatrix is handled should not change. As far as getting lanes, entries, etc. I think that API doesn't really need to care about what the underlying storage is as long as it can be used to implement those interfaces. So yeah, a CsStorage trait would be necessary, and probably look something like:

pub(crate) trait CsStorage<T> {
    fn pattern(&self) -> &SparsityPattern;

    fn values(&self) -> &[T];

    fn get_entry(&self, major_index: usize, minor_index: usize) -> Option<SparseEntry<'_, T>>;

    fn get_lane(&self, index: usize) -> Option<CsLane<'_, T>>;

    fn lane_iter(&self) -> CsLaneIter<'_, T>;
}

Of course, there'd have to be a corresponding CsStorageMut to live alongside this. CsMatrix would then expose these methods as:

impl<T, S> CsMatrix<T, S> 
where
    S: CsStorage<T>,
{
    fn pattern(&self) -> &SparsityPattern {
        self.storage.pattern()
    }

    // ...
}

impl<T, S> CsMatrix<T, S> 
where
    S: CsStorageMut<T>,
{
    fn get_entry_mut(
        &mut self,
        major_index: usize,
        minor_index: usize,
    ) -> Option<SparseEntryMut<'_, T>>
    {
        // etc
    }

    // ...
}

Simply echoing the underlying (crate-private) trait. This way the main API doesn't need to change, but immutable and mutable slice types on the storage can exist separately.

However, we'd also need to construct SparsityPattern with borrowed data, so it seems to me there must be a trait also for patterns.

I'm not so sure that SparsityPattern requires "borrowed" data, since in the simplest possible case, you can just hold onto &'a SparsityPattern. Of course, the internal representation of the slice itself would need to track which indices fall within the slice, but that's what e.g. the primitive slice type does already, so it isn't unidiomatic.

I think Dim serves a different purpose, namely to abstract over dimensions specified either at runtime or compile-time. It's not meant to represent indices, and I don't think we should try to repurpose it. Dim is not implemented for any signed types, for one - and it also cannot be implemented for signed types.

Well, yes and no. You have to index according to the same type as the upper size of your dimension type. The abstraction between runtime / compile time size is something that's unnecessary here though. In any case, if you wanted to extend what I had above, I think it would be possible, but not actually necessary. As far as extending it, you'd change SparsityPattern to SparsityPattern<I: Into<usize>>. But to hone in on one point:

Dim is not implemented for any signed types, for one - and it also cannot be implemented for signed types.

Indices and sizes should not be signed. So while I guess I'd agree that Dim is a bit different from what's desired here, there is no benefit to supporting signed types, and the nalgebra crate has made this explicit. It wouldn't, after all, make sense to add a value at index (-1, -1) in a matrix.

I'm going to continue down the above path and submit a PR if I can accomplish slicing sparse matrices. Further discussion can continue in the meanwhile, but it'll be easier to demonstrate with actual code I reckon.

Quite literally, I want to be able to effectively call this function from dense matrices on sparse matrices.

My primary motivation for working with borrowed data is not to enable slicing, but rather to enable interoperability with other libraries.

Supporting similar zero-cost general slices/view with sparse matrices is generally not possible - certainly not without significant changes to the CSR/CSC formats. The current formats for CSR/CSC are also very deliberate: the choices made so far ensure maximum compatibility with other libraries.

(This is a huge usability problem with for example Eigen's sparse matrices. Sometimes you have a reference &SparseMatrix and you need to pass in its data to an external library. However, if the matrix is not "compressed", it might not be accepted by an external library that assumes that the CSR/CSC data is contiguously stored. And Eigen's format is still too limited to support the kind of general slicing that you want to have.)

Are you sure you really need to have a slice/view and not just be able to copy out parts of a matrix? I have been thinking about a "selection API" along the following lines:

let selection = matrix.select(rows, cols).clone_owned();
output.select_mut(output_rows, output_cols).assign(&matrix.select(rows, cols));

I think at a top level how CsMatrix is handled should not change. As far as getting lanes, entries, etc. I think that API doesn't really need to care about what the underlying storage is as long as it can be used to implement those interfaces. So yeah, a CsStorage trait would be necessary, and probably look something like:
pub(crate) trait CsStorage<T> {
    fn pattern(&self) -> &SparsityPattern;

    fn values(&self) -> &[T];

    fn get_entry(&self, major_index: usize, minor_index: usize) -> Option<SparseEntry<'_, T>>;

    fn get_lane(&self, index: usize) -> Option<CsLane<'_, T>>;

    fn lane_iter(&self) -> CsLaneIter<'_, T>;
}

There are two different concerns with somewhat different requirements here: creating CSR/CSC matrices from borrowed data, and slicing. In both cases, however, the pattern itself needs to have borrowed data, otherwise you'd have to clone the entire sparsity pattern for creating a slice, for example. And moreover, we need to expose a well-defined low-level format that applications can work with. For example, for FEM assembly (which is a large part of my motivation for building nalgebra-sparse in the first place), we need to work with the CSR/CSC data directly at a low level to enable efficient, parallel workflows.

Indices and sizes should not be signed. So while I guess I'd agree that Dim is a bit different from what's desired here, there is no benefit to supporting signed types, and the nalgebra crate has made this explicit. It wouldn't, after all, make sense to add a value at index (-1, -1) in a matrix.

I personally agree, and the Rust ecosystem largely agrees. However, existing sparse linear algebra software generally does not. Indexing in C libraries is commonly signed, so we need to account for that in order to support interoperability with existing libraries (unfortunately).

I'm going to continue down the above path and submit a PR if I can accomplish slicing sparse matrices. Further discussion can continue in the meanwhile, but it'll be easier to demonstrate with actual code I reckon.

Sounds good - though for me there are a number of open questions we need to answer. However, perhaps these can be partially answered with the help of prototyping. I think the most pressing question, however, is how to proceed (if at all) with respect to slicing.

dimforge / nalgebra

CSR/CSC: Allow working with borrowed data + generalize index type #878