Tehforsch / diman

Define rust compile time unit systems using const generics
52 stars 2 forks source link

Unit ergonomics #56

Closed Tehforsch closed 6 months ago

Tehforsch commented 8 months ago

In the future I'd like if instead of generating the constructors Length::meters, Time::seconds, ... as well as the conversion functions in_meters, in_seconds, in_joules for every unit that exists, we'd have a syntax (loosely inspired by https://github.com/aurora-opensource/au and https://github.com/mpusz/mp-units)

that reads something like

let l: Length = Meters(10.0);
let in_km: f64 = l.in(Kilometers);

I think there are a number of benefits to a syntax like this:

  1. It is shorter and looks cleaner to me than the constructors / conversions we currently have
  2. I assume it cuts down compile times, because the in function would only have to be generated once and is then monomorphized for with which it is used, which I (blindly, I should say) assume should be faster than generating the methods whether the unit is used or not. This might not be true if in is the only method that exists, but I assume there will be more methods like it in the future.
  3. It might be extendable to eventually do let vel_float: f64 = vel.in(Kilometer / Hour) which would be very nice. However, that is pretty futuristic at this point, since this requires - at the very least - const_trait_impl (https://github.com/rust-lang/rust/issues/67792) but in order to do this without any overhead, it might also require even more machinery that is futuristic at this point.

In order to implement this, I'd add ZSTs for each unit (i.e. struct Meter) which then implement a unit trait like

trait Unit { 
    const DIM: Dimension,
    const FACTOR: f64,
}

The in method would then be implemented (roughly, the real implementation isnt quite as simple, since we need to ensure the proper dimensions) as

impl<...> Quantity<S, ...> {
    fn in<U: Unit>(&self, _: U) -> S { 
        (self / U::FACTOR).value_unchecked()
    }
}

I played around with this and found that a few things about this are not possible exactly like I want them at the moment.

  1. We can't call the ZST directly (https://github.com/rust-lang/rust/issues/29625), so it would have to be more like let l = Meters::new(10.0); or alternatively let l = meters(10.0) where meters is a function we'd need to generate.
  2. We can't name the method in since thats a reserved keyword. Sounds trivial, but I do think that unfortunately severely reduces readability, since the alternative would be something like let in_km = l.in_unit(Kilometer); which is not as nice and doesn't really improve on l.in_kilometers() which is the current option.

Because of these concerns, I don't know that this would be an improvement over the current syntax, as long as the constructors and in_... are the only conversion / unit-specific methods on Quantity.

Tehforsch commented 8 months ago

I wrote down some thoughts about how I'd like to modify the current syntax / API. If you have the time for it, I'd really appreciate your thoughts @jedbrown - I don't want to change things up if the result ends up being worse than what we had before.

Unit Syntax

It might be extendable to eventually do let vel_float: f64 = vel.in(Kilometer / Hour) which would be very nice. However, that is pretty futuristic at this point, since this requires - at the very least - const_trait_impl ( https://github.com/rust-lang/rust/issues/67792) but in order to do this without any overhead, it might also require even more machinery that is futuristic at this point.

So after playing with this a little more I found out that there is a different aproach in which this actually works if implemented correctly. Will require a bit of magic to set it up properly, since the system needs to be able to track unit magnitudes at compile time (without const_float_arithmetic), but I wrote a basic proof of concept and integrating it in diman shouldn't be too much work, especially with #68, which I'd probably do first.

So what I am thinking of is taking this opportunity to really think about what kind of syntax/organization structure we want, because I'd prefer to get it right now and not to have to change it (too much) in the future, because that's a pretty breaking change.

In my playground crate, I currently implemented it so that the following example compiles:

use crate::units::{Meter, Second, s, m, Kilometer, Hour, Mile};

pub fn main() {
    let vel1 = 1.0 * m / s;
    let vel2 = (Meter / Second)::new(2.0);

    println!("{} m/s", vel1.value_in(Kilometer / Hour));
    println!("Rounded: {} mph", vel2.round_in(Mile / Hour));
}

I personally find this syntax really clear and nice to read (it's a shame it's value_in and not in but ok. I'll survive). New quantities can either be constructed by just multiplying them with the appropriate units, or alternatively by constructing the unit and then calling ::new on it. I am very glad this lets us delete the (thousands of) automatically generated methods as well.

The unit math is all evaluated at compile time, so no actual multiplication should be taking place at run time for the Kilometer / Hour expression. I'd still like to make sure by looking at the actual assembly at some point (there really is no code for it to execute at runtime, but you never know?).

On a side note, I think this will also lay the groundwork for heavier compile-time analysis. For example, I think we might be able to analyze whether certain unit conversions are safe, but I haven't checked this.

Symbols: storage-type specific or not?

In the above code, symbols and units mean exactly the same thing. However, in some cases, the compiler doesn't do perfect type inference and a type annotation for the storage type of the quantity will be required via something like let velocity = 1.0f64 * m / s. To prevent having to do this, I was thinking about differentiating between symbols (m, s) and the units (Meter, Second) and using the symbols as "dedicated constructors" by making them storage type specific, so that you'd do:

use crate::units::{Meter, Second, Kilometer, Hour, Mile};
use crate::units::f32::{s, m};

pub fn main() {
    let vel1 = 1.0 * m / s;
    let vel2 = (Meter / Second)::new(2.0);

    println!("{} m/s", vel1.value_in(Kilometer / Hour));
    println!("Rounded: {} mph", vel2.round_in(Mile / Hour));
}

while I'd keep the units for the storage type agnostic methods such as value_in and round_in.

I am not sure if this simplifies or complicates things for the user. The alternative I see would be to consider symbols and units the same thing and simply require a type annotation here and there.

Dimensions: storage-type specific or not?

At the same time I am also reconsidering what the best way to define the dimensions is. The way I see it there are three options:

  1. Only storage type specific dimensions exist (f32::Length, f64::Length, ...)
  2. Only storage type agnostic dimensions exist (Length<f32>, Length<f64>)
  3. Both exist. The user decides which one to use by importing them (
    use dimensions::Length;
    use dimensions::f64::Time;
    fn foo(l: Length<f64>, t: Time) {
        ...
    }

Personally, I'd prefer option three and simply giving the user the choice, but I can also see an argument against it - there are suddenly many different types called Length and some of them have a generic, others don't. That could take some getting used to.

Module structure

As a last thing, I'd like to think about the structure of the macro output from unit_system. Currently, exported items are all contained in one large module with submodules for each storage type. The alternative would be to put each type of item in a submodule so that we'd import things like:

use diman::si::dimensions::{Velocity, Time};
use diman::si::dimensions::f32::{Length};
use diman::si::units::{Meter, Second};
use diman::si::symbols::f32::{m, s}; // Provided we'd implement it this way
use diman::si::constants::PROTON_MASS;

I personally prefer this and find it cleaner, but I guess it might be annoyingly busy for some people.

jedbrown commented 8 months ago

Neat ideas, and I'd be curious to play with an experimental branch. My subjective thoughts:

Unit syntax

I like (Meter / Second)::new(2.0) as an unambiguous syntax. One downside is that it's unwieldy if fully qualified (diman::si::units::Meter / diman::si::units::Second)::new(2.0), versus diman::si::Velocity::meters_per_second(2.0) and perhaps harder to see what the named type needs to be (e.g., when it'll go into a struct). I think most users will import, so this is okay, and others may may contain it in something like this

mod gauge {
    use diman::si::units::{Meter, Second};
    pub type Velocity = Meter / Second;
}
// then
let v = gauge::Velocity::new(2.0);

I'm a bit less enthusiastic about the let v = 1.0 * m / s syntax for serious projects, but it's undoubtedly nice for exploration.

Storage-type specific?

I don't feel any pain from let v = 1.0_f64 * m / s or (Meter / Second)::new(2.0_f32) as needed for type inference. I would like to frequently be generic over F: Float and would rather not fight the module structure (so prefer Length<F>). I don't want to push this preference on others, but I don't think I'm alone in it. Your option of supporting both is probably okay, but it may take some experience using it in bigger projects to really feel whether it's worth it.

In the above, how can you make PROTON_MASS not storage-type specific?

Module structure

It looks sort of tedious/a lot to remember as written and rust-analyzer won't know which of the several submodules to get Length from (so more choices). However, it may make wildcare use statements safe enough, and that would be nice for short code that is dimensionally rich.

Tehforsch commented 8 months ago

Thanks for your input!

I'd be curious to play with an experimental branch

I should get one up soon, am currently in the process of integrating this new implementation into the macro, but it'll be a bit of work.

I like (Meter / Second)::new(2.0) as an unambiguous syntax. One downside is that it's unwieldy if fully qualified (diman::si::units::Meter / diman::si::units::Second)::new(2.0), versus diman::si::Velocity::meters_per_second(2.0)

Note that there would still be diman::si::units::MetersPerSecond::new(2.0), even with the new syntax, if that is defined as a unit. Being able to do call ::new on expressions is more of a bonus.

perhaps harder to see what the named type needs to be (e.g., when it'll go into a struct).

That's true. The meters_per_second method had a much clearer interface. Thankfully, the error messages on dimension mismatches are still really nice:

error[E0308]: mismatched types
   --> src/main.rs:188:21
    |
188 |     let l: Length = (Kilometer / Hour).new(50.0);
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `Dimension { length: 1, time: 0 }`, found `Dimension { length: 1, time: -1 }`
    |
    = note: expected constant `Dimension { length: 1, time: 0 }`
               found constant `Dimension { length: 1, time: -1 }`

So I suppose that if the user gets the dimension wrong in a struct, they'd still see something like this to hopefully help them out. Note that this:

mod gauge {
    use diman::si::units::{Meter, Second};
    pub type Velocity = Meter / Second;
}
// then
let v = gauge::Velocity::new(2.0);

unfortunately cannot work at the moment. The dark secret that I haven't mentioned yet is that Meter, Second and so on are actually constants pretending to be types in an abuse of notation (because I want them to feel like types, since that is much closer to what they actually are, and because I don't like being screamed at too much by my code).

I'll write down why here, mainly so I can read it when I inevitably come back to this in a few months, wondering why I didn't implement it differently:

  1. The first option that clearly doesn't go anywhere is adding a bunch of ZSTs like pub struct Meter; and implementing Mul and Div on them. That would require O(n^2) trait impls, so thats out of the question.
  2. Then my immediate thought was to add a trait trait Unit that each of the ZSTs would implement which defines their dimensions and magnitude and to write something like
    impl<SL, SR> Mul<SR> for SL where SL: Unit, SR: Unit { 
        ...
    }

    However, the orphan rule won't allow that, since Mul is not a diman-defined trait and SL and SR are completely generic. It's too bad you can't declare a trait as "so private no type other than the ones I strictly control will ever implement it, and those don't overlap", but thats how it is.

  3. The next idea, which almost works is to implement a type struct Unit<const D: Dimension, const M: Magnitude>, where S is some ZST corresponding to the unit and to implement Mul/Div for that and all kinds of combinations of it and f32 / f64, ... The unfortunate thing is that if I write pub type Meter = Unit<{Dimension { length: 1, ...} }, {...}>;, I don't get to writelet x = Meter::new(10.0)`, since type aliases can't be constructors.
  4. That's why I ended up with the only design that I could get to work for now: Using the Unit type above, write constant Meter: Unit<{Dimension { length: 1, ... }}, {...}> = Unit;. Note that the constant doesn't even contain any runtime value, so it is really just about notation.

All of this said, pub type A = B / C will probably never work, since that would require Div on a type level - that's why I added the Product / Quotient aliases to diman, so you could do pub type A = Quotient<B, C> (note that this just does <B as Div<C>>::Output).

I am not incredibly happy about the units being "constants". I think that the main source of confusion regarding this might be if somebody mistakes Meter for an actual type and writes something like struct S(Meter). However, I am still much happier with this design than with the current one and any other that I could come up with so far.

I'm a bit less enthusiastic about the let v = 1.0 * m / s syntax for serious projects, but it's undoubtedly nice for exploration.

That's a fair point. I'm a little worried about namespace conflicts here, since already m and s can be quite commonly used as variables. I think adding symbols (even just as aliases for the generic units) is probably not worth it.

I don't feel any pain from let v = 1.0_f64 * m / s or (Meter / Second)::new(2.0_f32) as needed for type inference. I would like to frequently be generic over F: Float and would rather not fight the module structure (so prefer Length). I don't want to push this preference on others, but I don't think I'm alone in it.

I don't think you're alone in it either. I actually regret using fully qualified aliases - initially I wrote diman for use in a code where I could only use f64 anyways and I went for maximum convenience for myself, but as a default I find Length<f64> clearer and the little bit of additional typing is not so bad. I guess if the user is forced to write the storage type anyways, it will also help with some of the type inference problems. It'll also help to finally get rid of the undocumented and badly implemented gen-vec-names feature because writing Length<Vec3> is much nicer anyways.

I think for now, I'll change it to having Length<S> and leave the additional complexity aside. In the future, we could think about adding either yet another feature that generates storage-type-specific dimensions for users that want it.

In the above, how can you make PROTON_MASS not storage-type specific?

I think it should be possible to implement it exactly like the units, except this time it is a properly named constant. The value of the constant would be stored in the Magnitude of the corresponding unit. So you'd be able to do let m: Mass<f64> = PROTON_MASS * 5.0 and that should work just fine.

It looks sort of tedious/a lot to remember as written and rust-analyzer won't know which of the several submodules to get Length from (so more choices). However, it may make wildcare use statements safe enough, and that would be nice for short code that is dimensionally rich.

The first I agree with. If only storage-type generic dimensions exists, then it also help with rust-analyzers auto import, because Length exists only once. Also, if we don't add symbols we could clean this up a little. That would leave only three modules (dimensions, units and constants). I slightly prefer the submodules over one gigantic namespace, but I'll give myself a little more time to think about it.

One last comment - since floats are annoying and const_float_arithmetic is not yet stable, I cannot guarantee that unit constructors are completely zero-cost (that is, one multiplication) without const_float_arithmetic enabled. I am fairly sure the compiler will make it zero-cost anyways, but without investigation I can't guarantee it. I think I will be able to write this in a way where if the user has the feature enabled, all the unit constructors will be written differently and guaranteed to be zero-cost. It's too bad that it's impossible to do const-work on some type of NonNanF64 for these cases where I can guarantee that the floats won't become NaN, but thats just how it is right now. My impression is that const_float_arithmetic is unstable for some fairly theoretical reasons (that are still reasonable considering Rusts mission) and many users would probably be willing to opt into the feature. In general, I'd like to err on the side of "slightly experimental but more powerful" for now, since this is already a nightly-only library and using this type of expression feels in line with that.

jedbrown commented 8 months ago

Thanks for writing this out. There has been some recent Zulip discussion on const float generics. I think clear use cases would be helpful, but the main hangup looks to be signed zero and NaN.

The gauge idea can perhaps be better expressed as a static case of #35, but I suspect something along those lines will be useful when working with physical sensors.

Tehforsch commented 8 months ago

There has been some recent Zulip discussion on const float generics. I think clear use cases would be helpful, but the main hangup looks to be signed zero and NaN.

Thanks for the link, I added a comment for our use case there.