dimforge / nalgebra

Linear algebra library for Rust.
https://nalgebra.org

Proposal: Provide nalgebra wrappers to `faer` factorizations #1239

Andlon commented 1 year ago

faer, a pure Rust library for performant low-level linear algebra spearheaded by @sarah-ek, provides dense matrix factorizations with remarkable performance.

Implementing truly performant matrix factorizations is a major undertaking that only a precious few individuals are capable of. nalgebra provides matrix factorizations of its own, but they are not under active development beyond bug fixes, and there are, to my knowledge, no plans to improve them further. I propose that we rally behind faer and try to find a way to leverage these exciting developments, so that users of nalgebra can enjoy the improved performance provided by faer.

Replacing nalgebra's matrix factorizations would be a non-trivial affair. nalgebra targets both fixed-size small dimensions and dynamically sized and potentially large matrices. For high performance, different algorithms need to be used for small and large dimensions. Determining the point at which to switch is something that needs to be done with great care. Moreover, managing buffers and finding a good way for the decomposition to contain different data structures depending on dimensions are further non-trivial design considerations.

Therefore, I suggest we instead start by providing idiomatic nalgebra wrappers to faer decompositions in a separate module.

Possible design

I propose that we provide a new module, nalgebra::faer, containing a separate struct for each relevant decomposition in faer. For example, the wrapper for faer's Cholesky decomposition might be named nalgebra::faer::FaerCholesky. I am not sure if the redundancy in naming is necessary, but it might help avoid confusion, since at that point there will be multiple Cholesky factorizations in nalgebra.
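
To make this concrete, here is a rough sketch of what such a wrapper could look like. Everything here is hypothetical: the module and type names are the ones proposed above, and the internals delegate to nalgebra's existing Cholesky as a stand-in for the eventual faer call (assuming a recent nalgebra with the `Dyn` dimension type).

```rust
use nalgebra::{Cholesky, DMatrix, DVector, Dyn};

/// Hypothetical shape of a wrapper in the proposed `nalgebra::faer` module.
/// The field would really hold faer's factorization buffers; nalgebra's own
/// Cholesky is used here purely as a stand-in so the sketch runs.
pub struct FaerCholesky {
    inner: Cholesky<f64, Dyn>,
}

impl FaerCholesky {
    /// Factor a symmetric positive-definite matrix; `None` if it is not SPD.
    pub fn new(matrix: DMatrix<f64>) -> Option<Self> {
        Cholesky::new(matrix).map(|inner| Self { inner })
    }

    /// Solve `A x = b` using the precomputed factorization.
    pub fn solve(&self, b: &DVector<f64>) -> DVector<f64> {
        self.inner.solve(b)
    }
}

fn main() {
    let a = DMatrix::from_row_slice(2, 2, &[4.0, 1.0, 1.0, 3.0]);
    let b = DVector::from_row_slice(&[1.0, 2.0]);
    let chol = FaerCholesky::new(a).expect("matrix is SPD");
    println!("x = {}", chol.solve(&b));
}
```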

The individual APIs might look somewhat similar to the current APIs, so that it's easy to switch, but this also depends on what is appropriate for use with faer. I think we should experiment with this and see where it takes us.

I am not sure if we want to expose faer types in the public API: doing so would mean that, in the future, we may not be able to update faer internally without risking a breaking change for nalgebra. One way to avoid this is to provide our own internal, sealed traits for scalar types, such as FaerRealField, FaerComplexField, etc. These would use the corresponding faer traits as supertraits (which would not be visible to users of nalgebra) and would be implemented only for concrete types. On the other hand, I can see it being useful in some instances to obtain the underlying faer decomposition directly, to offer the user more control. However, in that case the user can perhaps just work directly with faer instead.
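
A minimal sketch of the sealed-trait idea, with nalgebra's own RealField standing in for the corresponding faer trait (whose exact name and path are an assumption here):

```rust
mod private {
    // Sealed-trait pattern: this module is not exported, so downstream
    // crates cannot implement `Sealed`, and therefore cannot implement
    // the public traits that require it.
    pub trait Sealed {}
    impl Sealed for f32 {}
    impl Sealed for f64 {}
}

/// Users can name this trait in bounds but cannot implement it themselves.
/// In the real integration the supertrait would be faer's scalar trait
/// (its name is assumed here); `nalgebra::RealField` is a stand-in.
pub trait FaerRealField: private::Sealed + nalgebra::RealField {}

impl FaerRealField for f32 {}
impl FaerRealField for f64 {}
```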

Experimental features

Continual breaking changes in nalgebra have become a bit of a problem in the ecosystem, since Matrix data structures do not compose if libraries use different versions of nalgebra. I would personally like to see fewer breaking changes in nalgebra going forward. This is at odds with the somewhat experimental nature of this proposed feature, given that we are not quite sure about the design, and that the design may change over time as faer itself changes.

I think this proposed feature might therefore be a good opportunity to try out something that we have only briefly discussed on the #nalgebra Discord.

It would be convenient if some parts of newly developing features could be exempt from semver. This might accelerate the pace of development of these features, and prevent breaking changes in nalgebra until we feel that we can commit to a particular design. One way to go about this is to introduce the concept of experimental features. These features would only be available under the experimental feature flag (so for the proposed faer integration, you would need to enable both faer and experimental), and the documentation for each experimental feature would clearly state that the feature is unstable, similar to how unstable features are documented in the Rust standard library.

To simplify managing experimental features, we could add an #[experimental] attribute macro to nalgebra_macros, maybe with a link to a tracking issue. For example:

#[experimental(issue = XXXX)]
mod faer;

which would produce very visible documentation along the lines of

This is an experimental feature and must be enabled with the experimental feature flag. Tracking issue: #XXXX. Breaking changes in experimental features are not considered breaking changes for the crate.
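
As a sketch, one plausible (entirely hypothetical) expansion of the attribute would combine a cfg gate with injected documentation:

```rust
// Hypothetical expansion of `#[experimental(issue = XXXX)] mod faer;`:
// the item is gated on the `experimental` Cargo feature and a warning is
// prepended to its rustdoc output.
#[cfg(feature = "experimental")]
#[doc = "**This is an experimental feature.** Tracking issue: #XXXX."]
pub mod faer {
    // wrapper types for faer decompositions would live here
}
```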

This idea is based on Eigen's unsupported modules, although I'm not sure if we want to adopt the idea of unsupported code that is not actively maintained / tested and developed.

Summary

I propose that we try to provide idiomatic nalgebra wrappers for the matrix factorizations provided by faer under the faer feature flag, and also move to implement it as an experimental feature, a new concept that could help us develop new features over time without having to worry about breaking changes in nalgebra.

I've outlined only a vague and general plan for what this might look like. I think detailed design work should happen alongside implementation work.

Longer term, beyond what is proposed here, I hope we can also consider conditionally using faer for matrix factorizations by default. But this is a more involved issue, and it would be helpful to have the experience of developing the wrappers separately for now.

Andlon commented 1 year ago

@sarah-ek: I'd love to hear your thoughts on a possible integration. Do you think a plan like what I've outlined makes sense? Do you foresee any problems, or can you think of a better design/plan?

tbetcke commented 1 year ago

Hi,

We have been playing around with linear algebra interfaces in our own experimental Rust Linear Solver Toolbox (https://github.com/linalg-rs/rlst), for which we hope to have a first release later this year.

We want to eventually support faer and LAPACK equally. Our proposed structure is as follows: for example, to compute the SVD of a matrix mat, you call

mat.linalg().svd()

The linalg method comes from a LinAlg trait that, for dense matrices, builds a DenseMatrixLinAlgBuilder struct. The svd method comes from another trait that is implemented for DenseMatrixLinAlgBuilder; its LAPACK implementation internally copies the object and then calls the LAPACK routine.

The idea is that we can hide behind a feature flag whether the Svd trait is implemented via LAPACK, via faer, or via something else, making the backend a compile-time choice. This matters for other methods that depend on the SVD: for example, the Norm2 trait is implemented for all objects that implement LinAlg and whose output itself implements Svd. Norm2 calls the SVD and returns the largest singular value as the norm (if it sees that the matrix is just a vector, it computes the vector norm directly instead).
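
A rough sketch of this shape, with simplified and partly invented names (rlst's actual definitions differ; nalgebra's own SVD serves as the stand-in backend):

```rust
use nalgebra::{DMatrix, SVD};

/// Builder returned by `linalg()`; a LAPACK-style backend would own a copy
/// of the data in the backend's preferred layout.
pub struct DenseMatrixLinAlgBuilder {
    mat: DMatrix<f64>,
}

/// Entry point: each matrix-like type hands out a backend-specific builder.
pub trait LinAlg {
    type Out;
    fn linalg(&self) -> Self::Out;
}

impl LinAlg for DMatrix<f64> {
    type Out = DenseMatrixLinAlgBuilder;
    fn linalg(&self) -> DenseMatrixLinAlgBuilder {
        DenseMatrixLinAlgBuilder { mat: self.clone() }
    }
}

/// The decomposition trait; which impl exists is a compile-time choice.
pub trait Svd {
    fn svd(&self) -> Vec<f64>; // singular values only, for brevity
}

// Stand-in backend: nalgebra's own SVD. A LAPACK or faer implementation
// would be substituted here behind a feature flag.
impl Svd for DenseMatrixLinAlgBuilder {
    fn svd(&self) -> Vec<f64> {
        SVD::new(self.mat.clone(), false, false)
            .singular_values
            .iter()
            .copied()
            .collect()
    }
}

/// 2-norm as the largest singular value, for anything whose builder has SVD.
pub trait Norm2 {
    fn norm2(&self) -> f64;
}

impl<T: LinAlg> Norm2 for T
where
    T::Out: Svd,
{
    fn norm2(&self) -> f64 {
        self.linalg().svd().into_iter().fold(0.0, f64::max)
    }
}

fn main() {
    let a = DMatrix::from_row_slice(2, 2, &[3.0, 0.0, 0.0, 4.0]);
    println!("singular values: {:?}", a.linalg().svd());
    println!("2-norm: {}", a.norm2()); // prints 4
}
```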

We played around with a lot of ways of hiding the faer and LAPACK implementations. This is the most workable approach we have found so far, and it additionally lets us abstract over different matrix-like objects through the LinAlg trait (e.g. treat sparse matrices completely differently from dense matrices, etc.).

sarah-ek commented 1 year ago

i like the idea of leaving it as an experimental feature for now, since faer is relatively new as a library and the api might still change in the future. overall, the plan looks solid to me

Andlon commented 1 year ago

Thanks for your input @tbetcke, really appreciate it!

What is the rationale for having the intermediate LinAlg trait instead of, say, an SVD trait directly on the matrix? It's not quite clear to me what the purpose of the intermediate DenseMatrixLinAlgBuilder struct is.

The idea of this is that we can hide behind a feature flag whether we want a Lapack implementation of the Svd trait, a faer implementation or something else.

While I understand this design goal, my previous experience with this kind of design has been catastrophic. LAPACK has no notion of local configuration of parallelism, i.e. you cannot control the degree of parallelism per call. This is a severe problem if, in the same application, you need both:

  1. to solve a single "large" matrix problem in parallel
  2. to solve many small matrix problems separately in parallel

I've had exactly this issue before: on the one hand I needed to solve a single global system in parallel, but in order to build the matrix in the first place I needed to solve many small systems of varying size individually in parallel. In the end I had to resort to something other than LAPACK for the smaller matrix solves.
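
A minimal illustration of case 2 above, using rayon together with nalgebra's existing single-threaded Cholesky (both crates assumed as dependencies). An internally threaded LAPACK would fight with the outer rayon parallelism here, which is exactly the conflict described:

```rust
use nalgebra::{Cholesky, DMatrix, DVector};
use rayon::prelude::*;

/// Solve many small, independent SPD systems in parallel across systems,
/// keeping each individual solve sequential. faer permits this because
/// parallelism is a per-call argument; LAPACK's global threading does not.
fn solve_many(systems: Vec<(DMatrix<f64>, DVector<f64>)>) -> Vec<DVector<f64>> {
    systems
        .into_par_iter()
        .map(|(a, b)| {
            // nalgebra's Cholesky is single-threaded, so nesting it under
            // rayon is safe; a backend that spawns its own threads here
            // would oversubscribe the machine.
            Cholesky::new(a).expect("matrix is SPD").solve(&b)
        })
        .collect()
}
```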

Regardless, I do agree that there is potentially a lot of value in abstracting the backend. But I think unifying faer - which thankfully supports configuration of parallelism at the call site - and LAPACK under a single abstraction is very deeply problematic for the aforementioned reason. However, you have likely considered this, and perhaps you have found some kind of solution that is not obvious to me at this time?

Using feature flags for backend selection is a topic that I remember discussing with some of the ndarray authors back in 2017 or so. There are several additional issues that I wonder if you have solved in a satisfactory manner. The most pressing, I believe, are:

  1. Cargo features are meant to be additive, so mutually exclusive backend features break down once different crates in a dependency graph enable different backends.
  2. The choice is global to the compiled program, so you cannot mix backends per call site (e.g. small versus large problems) within the same application.

An alternative to features is to use environment variables to control the backend instead. I haven't considered this in detail, but it potentially resolves some of the aforementioned problems, though it might have new problems of its own.
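
As a sketch of that idea (the variable name and the set of backends are purely illustrative), the backend would be chosen at runtime, which also implies that every backend must be compiled into the binary:

```rust
use std::env;

// Hypothetical runtime backend selection. Note the trade-off: all backends
// must be compiled in, since the choice is deferred to run time.
enum Backend {
    Native,
    Faer,
}

fn backend_from_env() -> Backend {
    match env::var("NALGEBRA_BACKEND").as_deref() {
        Ok("faer") => Backend::Faer,
        _ => Backend::Native,
    }
}
```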

I'm curious if you considered all these issues and still opted for compile time features to select backends. I'm not personally convinced that this is altogether a great solution, but I'd be happy to learn more!

Long-term we may well want to provide some kind of high-level abstraction, but I think it is very difficult to come up with an abstraction that offers sufficiently granular control of parallelism. It would be great to have a continued discussion on this though, and I'd like to follow your progress on rlst.

For the time being, in any case, I think it would be great to come up with an initial experimental integration of faer that is independent in the sense that other APIs are not impacted. That way nalgebra users can already start to enjoy the performance benefits of faer. Then once we feel more confident about what we need, we might revisit the idea of abstracting backends.

Andlon commented 1 year ago

i like the idea of leaving it as an experimental feature for now, since faer is relatively new as a library and the api might still change in the future. overall, the plan looks solid to me

Great to hear, thanks for chiming in!

I likely won't have time to work on this myself any time soon. If anyone is willing to take a stab, that would be most appreciated. Please do announce yourself here in the issue, however, so we avoid unintentionally duplicating efforts!

tbetcke commented 1 year ago

@Andlon The rationale of the LinAlg trait is two-fold:

1.) Depending on the type of the matrix, it can hand out different LinAlgBuilder objects. Right now we have DenseMatrixLinAlgBuilder, but we will soon add a SparseMatrixLinAlgBuilder, which the linalg method will return when called on a sparse matrix.

2.) In the future, data may be available on accelerator devices, etc. The builder lets us inject additional fetch operations, checks, etc. before the actual linear algebra routines are called.

For now we solve the threading problem by defaulting to BLIS as the BLAS provider. BLIS has interface routines that allow the threading model to be changed at runtime (e.g. single- vs. multi-threaded). We need this a lot in our codes.

Regarding faer integration: we haven't done this yet, but the idea is that the specific traits such as Svd are implemented for DenseMatrixLinAlgBuilder, so we can switch the trait implementation from LAPACK to faer via a feature flag, as sketched below. The disadvantage is that feature flags are then not composable: the user needs to choose one backend or the other, which contradicts the additive nature of Cargo features a bit. But it is difficult to achieve both.
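
A sketch of that switching, reusing the hypothetical Svd trait and builder names from the earlier sketch; the guard at the top is one way the non-composability would surface:

```rust
// Mutually exclusive backend selection: exactly one backend feature is
// expected to provide the Svd impl. Feature names are illustrative.
#[cfg(all(feature = "backend-lapack", feature = "backend-faer"))]
compile_error!("enable at most one of `backend-lapack` and `backend-faer`");

#[cfg(feature = "backend-lapack")]
impl Svd for DenseMatrixLinAlgBuilder {
    fn svd(&self) -> Vec<f64> {
        // ... copy the data and call the LAPACK routine here ...
        unimplemented!()
    }
}

#[cfg(feature = "backend-faer")]
impl Svd for DenseMatrixLinAlgBuilder {
    fn svd(&self) -> Vec<f64> {
        // ... call faer's SVD here ...
        unimplemented!()
    }
}
```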

jgsimard commented 4 months ago

Are there updates on this topic?

Andlon commented 4 months ago

Are there updates on this topic?

No, as far as I know nobody is actively working on this. I have been thinking about it, though.