Open · Andlon opened this issue 2 years ago
This seems like a sensible solution to me. It's worth noting that e.g. the std `Iterator` API has a similar tendency to introduce silly references and rely on the optimizer to elide them.

I'm not too concerned about performance here. It would be fine to mark all these methods `#[inline(always)]`, since they are only thin wrappers anyway.
In terms of backward compatibility in non-generic code, using references can introduce some weirdness for `no_std` crates. Say we define `atan2(&self, rhs: &Self) -> Self`:

```rust
let (a, b) = (1.0f32, 2.0f32);
(&a).atan2(&b); // OK with or without `std`.
a.atan2(&b);    // Not OK with `std`, OK without `std`.
```
I think this is a good idea worth trying. I can check how negatively this impacts `no_std` code on parry and sparkl's CUDA kernels.

One other note about performance: switching to references prevents operations like `.abs()`, `.sin()`, etc. from re-using the same storage as `self` for non-`Copy` types. This implies that these methods will always have to allocate to return their results, instead of modifying the input in place. I'm not sure how many of these methods can be implemented in place, though.
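To make the storage-reuse point concrete, here is a hedged sketch with a toy arbitrary-precision stand-in (`BigNum` and its methods are assumptions, not nalgebra API or a real bignum): taking `self` by value lets `abs` flip the sign in place and reuse the digit buffer, while taking `&self` forces a fresh allocation even though only the sign changes.

```rust
// Toy stand-in for an arbitrary-precision scalar: a sign plus a heap-
// allocated digit buffer. Illustrative only, not a real bignum type.
#[derive(Clone, PartialEq, Debug)]
struct BigNum {
    negative: bool,
    digits: Vec<u8>,
}

impl BigNum {
    // By-value receiver: mutate and return self, reusing the digit buffer
    // with no new allocation.
    fn abs_by_value(mut self) -> Self {
        self.negative = false;
        self
    }

    // By-reference receiver: the result needs its own storage, so the
    // digit buffer must be cloned (allocated) even though only the sign
    // differs from the input.
    fn abs_by_ref(&self) -> Self {
        BigNum {
            negative: false,
            digits: self.digits.clone(),
        }
    }
}
```

Both forms return the same value; the difference is only whether the result can reuse the input's allocation.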
This proposal suggests a breaking change that may impact many users of nalgebra in various ways, although I hope the impact will be minimal for the vast majority of users. I am looking for comments from the community.

Motivation
In nalgebra 0.29, if I'm not mistaken, `Copy` was removed from `RealField`. This is great because it opens up using nalgebra with non-`Copy` types like arbitrary-precision scalars or similar. However, it also led to some severe regressions in the ergonomics of writing generic code. For example, consider generic code that would compile in nalgebra prior to 0.29. Such code no longer compiles, and has to be rewritten with explicit clones.
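The original snippets were lost from this copy of the issue; as a hedged illustration of the pattern being described, with a local `Real` trait standing in for nalgebra's `RealField`:

```rust
use std::ops::{Add, Mul};

// Local stand-in for nalgebra's RealField (an assumption, trimmed to what
// the example needs).
pub trait Real: Clone + Add<Output = Self> + Mul<Output = Self> {
    fn sqrt(self) -> Self;
}

impl Real for f64 {
    fn sqrt(self) -> Self {
        f64::sqrt(self)
    }
}

// Before nalgebra 0.29, the scalar bound implied Copy, so reusing x and y
// freely compiled:
fn hypot_copy<T: Real + Copy>(x: T, y: T) -> T {
    (x * x + y * y).sqrt()
}

// Without Copy, every reuse of x or y now needs an explicit clone:
fn hypot_clone<T: Real>(x: T, y: T) -> T {
    (x.clone() * x + y.clone() * y).sqrt()
}
```

Both functions compute the same result; only the second compiles under the post-0.29 bounds.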
The spurious cloning feels unnecessary, hurts readability and might even pessimize performance, especially for non-copy types where clones might not be optimized out by the compiler, such as arbitrary precision types.
Proposed solution
It would be great to be able to write the code in the first example again. To understand why the clones are necessary, let's take a look at a trimmed-down version of the `ComplexField` trait. Each method takes `self`, which means that if `Self` is not `Copy`, we must clone in order to call any of these methods. I believe the reason this trait is designed this way is that it mirrors the `cos`, `sqrt`, etc. methods on `f64` and `f32`. The reason that these methods on `f32`/`f64` take `self` by value is that for primitive types, it tends to generate better assembly than references, though the differences may be negligible or non-existent in optimized builds where the compiler is able to elide redundant stores and loads.

The obvious solution for fixing our generic predicament is naturally to change these methods to take `self` by reference. This would fix our initial generic code woes, and would be far more efficient for arbitrary-precision types on top.
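A hedged sketch of the two trait shapes being compared; nalgebra's actual `ComplexField` has many more methods and supertraits, and the `ByValue`/`ByRef` names here are illustrative only:

```rust
// Status quo (trimmed): methods consume self, so a non-Copy scalar must
// be cloned at every call site.
pub trait ComplexFieldByValue: Sized {
    fn sqrt(self) -> Self;
    fn cos(self) -> Self;
}

// Proposed change: take self by reference, so non-Copy scalars need no
// clone at the call site.
pub trait ComplexFieldByRef: Sized {
    fn sqrt(&self) -> Self;
    fn cos(&self) -> Self;
}

impl ComplexFieldByValue for f64 {
    fn sqrt(self) -> Self { f64::sqrt(self) }
    fn cos(self) -> Self { f64::cos(self) }
}

impl ComplexFieldByRef for f64 {
    // For Copy primitives the dereference is free; for a bignum type this
    // is where the result's allocation would happen instead.
    fn sqrt(&self) -> Self { f64::sqrt(*self) }
    fn cos(&self) -> Self { f64::cos(*self) }
}
```

Generic code bounded on the by-reference trait can call `x.sqrt()` through a `&T` without cloning `x` first.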
The same applies to all similar traits, including of course `RealField`.

There are a few potential drawbacks:
Analysis: Breaking changes
I don't think the breaking changes will be too severe, for one particular reason: these methods largely already exist on `f32` and `f64`, so when users are not writing generic code, they will be calling the built-in methods directly on the primitive types anyway.

If they're writing generic code, they'll already be writing e.g. `x.cos()`, taking `x` by value. This would still compile if the method instead takes `x` by reference.

The change would break the code of users who are writing e.g. `T::cos(x)`, in which case they'd have to update to `T::cos(&x)`, though in many cases they might be writing `T::cos(x.clone())`, in which case I bet the change is welcome.

I welcome comments from users as to whether this change would be detrimental to them. It is a trade-off: we should not take breaking changes of this scale lightly, so we must carefully weigh the costs and benefits.
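A hedged sketch of the call-site migration just described, using a local stand-in trait (`Trig` is an assumption, not nalgebra's API):

```rust
// Stand-in trait with the proposed by-reference signature.
trait Trig: Sized {
    fn cos(&self) -> Self;
}

impl Trig for f64 {
    fn cos(&self) -> Self {
        f64::cos(*self)
    }
}

fn demo<T: Trig>(x: T) -> T {
    // Method-call syntax keeps compiling: auto-ref inserts the `&`.
    let a = x.cos();
    // Fully-qualified syntax breaks: `T::cos(a)` must become `T::cos(&a)`.
    T::cos(&a)
}
```

Only fully-qualified call sites need editing; method-call syntax is unaffected thanks to auto-ref.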
Analysis: Performance may suffer
Some initial tests on rust.godbolt.org suggest that in optimized builds the reference is often elided, if implemented inline. For example, consider this:
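The exact Godbolt snippet did not survive in this copy; a hedged reconstruction consistent with the surrounding text (only the name `test_sqrt` appears in the original, the rest is an assumption) might look like:

```rust
// Stands in for a by-reference ComplexField::sqrt wrapper.
#[inline]
pub fn sqrt_ref(x: &f64) -> f64 {
    x.sqrt()
}

pub fn test_sqrt(x: f64) -> f64 {
    // Per the Compiler Explorer observation below, in optimized builds the
    // reference is elided here when the wrapper is inlined.
    sqrt_ref(&x)
}
```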
According to Compiler Explorer, `test_sqrt` pays no penalty for the reference. However, if `#[inline]` is removed, an additional "move" instruction may be generated, especially in cross-crate usage, although this cannot be directly observed on Godbolt.

Summary
As someone who writes a lot of generic code, I find that the number of redundant clones has a tendency to reach ridiculous heights at times, and I'd sorely like to improve this. I've outlined a solution and two potential drawbacks. I'd like to solicit feedback from other users of nalgebra, and I think it's clear that we need some performance benchmarks to determine whether real-world performance suffers from the proposed changes. Besides running nalgebra's own benchmarks, are there other benchmarks that can be used to get a good picture of the real-world impact?