dimforge / nalgebra

Linear algebra library for Rust.
https://nalgebra.org
Apache License 2.0

RFC: Change ComplexField and related traits to improve generic programming #1149

Open Andlon opened 2 years ago

Andlon commented 2 years ago

This proposal suggests a breaking change that may impact many users of nalgebra in various ways, although I hope the impact will be minimal for the vast majority of users. I am looking for comments from the community.

Motivation

In nalgebra 0.29 if I'm not mistaken, Copy was removed from RealField. This is great because it opens up using nalgebra with non-copy types like arbitrary precision or similar. However, it also led to some severe regressions in the ergonomics of writing generic code. For example, consider the following code that would work in nalgebra prior to 0.29:

pub fn rotation_matrix<T: RealField>(theta: T) -> Matrix2<T> {
    matrix![ theta.cos(), -theta.sin();
             theta.sin(),  theta.cos() ]
}

This code no longer compiles, and would have to be replaced by the following code:

pub fn rotation_matrix<T: RealField>(theta: T) -> Matrix2<T> {
    matrix![ theta.clone().cos(), -theta.clone().sin();
             theta.clone().sin(),  theta.clone().cos() ]
}

The spurious cloning feels unnecessary, hurts readability, and may even pessimize performance, especially for non-Copy types such as arbitrary-precision types, where the clones might not be optimized out by the compiler.

Proposed solution

It would be great to be able to write the code in the first example again. To understand why the clones are necessary, let's take a look at a trimmed down version of the ComplexField trait:

pub trait ComplexField : ... {
  // ...
  fn cos(self) -> Self;
  fn sqrt(self) -> Self;
}

Each method takes self, which means that if Self is not Copy, we must clone in order to call any of these methods. I believe the trait is designed this way because it mirrors the cos, sqrt etc. methods on f64 and f32. Those methods on f32/f64 take self by value because, for primitive types, this tends to generate better assembly than taking a reference, though the difference may be negligible or non-existent in optimized builds, where the compiler is able to elide redundant stores and loads.
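
To make the problem concrete, here is a minimal self-contained sketch (the names `ByValueField` and `BigFloat` are illustrative, not nalgebra items) showing why a by-value method forces a clone for a non-Copy type:

```rust
#[derive(Clone, Debug, PartialEq)]
struct BigFloat(Vec<u64>); // stand-in for an arbitrary-precision type; deliberately !Copy

trait ByValueField {
    fn cos(self) -> Self;
}

impl ByValueField for BigFloat {
    fn cos(self) -> Self {
        // Dummy body; a real implementation would compute the cosine.
        self
    }
}

fn use_twice(theta: BigFloat) -> (BigFloat, BigFloat) {
    // `cos` consumes its receiver, so the first call must clone;
    // removing `.clone()` here is a compile error (use of moved value).
    let a = theta.clone().cos();
    let b = theta.cos();
    (a, b)
}
```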

The obvious fix for our generic predicament is to change these methods to take self by reference, e.g.

pub trait ComplexField : ... {
  // ...
  fn cos(&self) -> Self;
  fn sqrt(&self) -> Self;
}

This would fix our initial generic code woes, and on top of that would be far more efficient for arbitrary-precision types.
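
To illustrate the payoff, here is a sketch under the proposed `&self` signatures (the trait `RefField` and the function `rotation_entries` are stand-ins, not nalgebra's actual API) where the rotation code compiles with no clones at all:

```rust
trait RefField: Sized {
    fn sin(&self) -> Self;
    fn cos(&self) -> Self;
}

impl RefField for f64 {
    #[inline]
    fn sin(&self) -> Self { f64::sin(*self) }
    #[inline]
    fn cos(&self) -> Self { f64::cos(*self) }
}

// Entries of the 2x2 rotation matrix in row-major order; every call
// merely borrows `theta`, so no clones are needed even for !Copy types.
fn rotation_entries<T>(theta: &T) -> [T; 4]
where
    T: RefField + std::ops::Neg<Output = T>,
{
    [theta.cos(), -theta.sin(), theta.sin(), theta.cos()]
}
```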

The same applies to all similar traits, including of course RealField.

There are two potential drawbacks, analyzed below:

Analysis: Breaking changes

I don't think the breaking changes will be too severe, for one particular reason: these methods largely already exist on f32 and f64, so users who are not writing generic code will be calling the built-in methods directly on the primitive types anyway.

If they're writing generic code, they'll already be writing e.g. x.cos(), taking x by value. This would still compile if the method instead takes x by reference, because method-call syntax automatically takes a reference to the receiver.

The change would break the code of users who are writing e.g. T::cos(x), in which case they'd have to update to T::cos(&x), though in many cases they might be writing T::cos(x.clone()), in which case I bet the change is welcome.
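
A small sketch of the two call styles side by side (the trait name `RefCos` is illustrative), showing which one breaks:

```rust
trait RefCos {
    fn cos(&self) -> Self;
}

impl RefCos for f64 {
    #[inline]
    fn cos(&self) -> Self { f64::cos(*self) }
}

fn call_styles<T: RefCos>(x: T) -> (T, T) {
    // Method syntax auto-references the receiver, so this line is
    // identical under both the old (by-value) and new (by-reference)
    // signatures:
    let a = x.cos();
    // Fully-qualified syntax is the breaking case: `T::cos(x)` must
    // become `T::cos(&x)`:
    let b = <T as RefCos>::cos(&x);
    (a, b)
}
```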

I welcome comments from users as to whether this change would be detrimental to them. It is a trade-off: we should not take breaking changes of this scale lightly, so we must carefully weigh the cost and benefits.

Analysis: Performance may suffer

Some initial tests on rust.godbolt.org suggest that in optimized builds the reference is often elided when the implementation is marked #[inline]. For example, consider this:

pub trait ComplexField {
  fn sqrt(&self) -> Self;
}

impl ComplexField for f64 {
  #[inline]
  fn sqrt(&self) -> Self {
    f64::sqrt(*self)
  }
}

pub fn test_sqrt(x: f64) -> f64 {
  <f64 as ComplexField>::sqrt(&x)
}

According to Compiler Explorer, test_sqrt pays no penalty for the reference. If #[inline] is removed, however, an additional "move" instruction may be generated, especially in cross-crate usage, although this cannot be directly observed on Godbolt.

Summary

As someone who writes a lot of generic code, the number of redundant clones has a tendency to reach ridiculous heights at times, and I'd sorely like to improve this. I've outlined a solution and two potential drawbacks. I'd like to solicit feedback from other users of nalgebra, and I think it's clear that we need some performance benchmarks to try to determine if real-world performance suffers from the proposed changes. Besides running nalgebra's own benchmarks, are there other benchmarks that can be used to try to get a good picture of real-world impact?

Ralith commented 2 years ago

This seems like a sensible solution to me. It's worth noting that e.g. the std Iterator API has a similar tendency to introduce silly references and rely on the optimizer to elide them.

sebcrozet commented 2 years ago

I’m not too concerned about performance here. It would be fine to mark all these methods #[inline(always)], since they are only thin wrappers anyway.

In terms of backward compatibility in non-generic code, using references can introduce some weirdness for no_std crates. Say we define atan2(&self, rhs: &Self) -> Self:

let (a, b) = (1.0f32, 2.0f32);
(&a).atan2(&b); // OK with or without `std`.
a.atan2(&b);    // Not OK with `std`, OK without `std`.

I think this is a good idea worth trying. I can check how negatively this impacts no_std code in parry and sparkl’s CUDA kernels.
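
The resolution quirk can be reproduced with a self-contained sketch (the trait name `Atan2Ref` is illustrative). Only the forms that compile under `std` are shown; the failing form is noted in a comment:

```rust
trait Atan2Ref {
    fn atan2(&self, rhs: &Self) -> Self;
}

impl Atan2Ref for f32 {
    #[inline(always)]
    fn atan2(&self, rhs: &Self) -> Self {
        f32::atan2(*self, *rhs) // delegate to the inherent method
    }
}

fn demo() -> (f32, f32) {
    let (a, b) = (1.0f32, 2.0f32);
    // Fine with or without `std`: the receiver is already a reference,
    // so the trait method matches first during method resolution.
    let r1 = (&a).atan2(&b);
    // Fully qualified, always unambiguous:
    let r2 = <f32 as Atan2Ref>::atan2(&a, &b);
    // With `std`, `a.atan2(&b)` fails to compile: the inherent
    // `f32::atan2(self, f32)` wins resolution and rejects `&b`.
    (r1, r2)
}
```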

sebcrozet commented 2 years ago

One other note about performance: switching to references prevents operations like .abs(), .sin(), etc. from reusing the same storage as self for non-Copy types. This implies that these methods will always have to allocate to return their results, instead of modifying the input in place. I’m not sure how many of these methods can be implemented in place, though.
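
One way this concern could be addressed is a separate `&mut self` variant alongside the `&self` form. A hypothetical sketch (this is not a proposed nalgebra API; the trait `InPlaceField` and the method `sin_mut` are made up for illustration):

```rust
trait InPlaceField {
    /// Allocating form, as in the proposed `&self` signature.
    fn sin(&self) -> Self;
    /// In-place form: overwrites `self`, reusing its storage.
    fn sin_mut(&mut self);
}

impl InPlaceField for f64 {
    #[inline(always)]
    fn sin(&self) -> Self { f64::sin(*self) }
    #[inline(always)]
    fn sin_mut(&mut self) { *self = f64::sin(*self); }
}
```

For an arbitrary-precision type, `sin_mut` could write the result into the existing digit buffer instead of allocating a fresh one, at the cost of a wider trait surface.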