Closed Tehforsch closed 6 months ago
I wrote down some thoughts about how I'd like to modify the current syntax / API. If you have the time for it, I'd really appreciate your thoughts @jedbrown - I don't want to change things up if the result ends up being worse than what we had before.
It might be extendable to eventually do let vel_float: f64 = vel.in(Kilometer / Hour) which would be very nice. However, that is pretty futuristic at this point, since this requires - at the very least - const_trait_impl ( https://github.com/rust-lang/rust/issues/67792) but in order to do this without any overhead, it might also require even more machinery that is futuristic at this point.
So after playing with this a little more I found out that there is a different aproach in which this actually works if implemented correctly. Will require a bit of magic to set it up properly, since the system needs to be able to track unit magnitudes at compile time (without const_float_arithmetic
), but I wrote a basic proof of concept and integrating it in diman shouldn't be too much work, especially with #68, which I'd probably do first.
So what I am thinking of is taking this opportunity to really think about what kind of syntax/organization structure we want, because I'd prefer to get it right now and not to have to change it (too much) in the future, because that's a pretty breaking change.
In my playground crate, I currently implemented it so that the following example compiles:
use crate::units::{Meter, Second, s, m, Kilometer, Hour, Mile};
pub fn main() {
let vel1 = 1.0 * m / s;
let vel2 = (Meter / Second)::new(2.0);
println!("{} m/s", vel1.value_in(Kilometer / Hour));
println!("Rounded: {} mph", vel2.round_in(Mile / Hour));
}
I personally find this syntax really clear and nice to read (it's a shame it's value_in
and not in
but ok. I'll survive).
New quantities can either be constructed by just multiplying them with the appropriate units, or alternatively by constructing the unit and then calling ::new
on it. I am very glad this lets us delete the (thousands of) automatically generated methods as well.
The unit math is all evaluated at compile time, so no actual multiplication should be taking place at run time for the Kilometer / Hour
expression. I'd still like to make sure by looking at the actual assembly at some point (there really is no code for it to execute at runtime, but you never know?).
On a side note, I think this will also lay the groundwork for heavier compile-time analysis. For example, I think we might be able to analyze whether certain unit conversions are safe, but I haven't checked this.
In the above code, symbols and units mean exactly the same thing.
However, in some cases, the compiler doesn't do perfect type inference and a type annotation for the storage type of the quantity will be required via something like
let velocity = 1.0f64 * m / s
.
To prevent having to do this, I was thinking about differentiating between symbols (m
, s
) and the units (Meter
, Second
) and using the symbols as "dedicated constructors" by making them storage type specific, so that you'd do:
use crate::units::{Meter, Second, Kilometer, Hour, Mile};
use crate::units::f32::{s, m};
pub fn main() {
let vel1 = 1.0 * m / s;
let vel2 = (Meter / Second)::new(2.0);
println!("{} m/s", vel1.value_in(Kilometer / Hour));
println!("Rounded: {} mph", vel2.round_in(Mile / Hour));
}
while I'd keep the units for the storage type agnostic methods such as value_in
and round_in
.
I am not sure if this simplifies or complicates things for the user. The alternative I see would be to consider symbols and units the same thing and simply require a type annotation here and there.
At the same time I am also reconsidering what the best way to define the dimensions is. The way I see it there are three options:
f32::Length
, f64::Length
, ...) Length<f32>
, Length<f64>
)use dimensions::Length;
use dimensions::f64::Time;
fn foo(l: Length<f64>, t: Time) {
...
}
Personally, I'd prefer option three and simply giving the user the choice, but I can also see an argument against it - there are suddenly many different types called Length
and some of them have a generic, others don't. That could take some getting used to.
As a last thing, I'd like to think about the structure of the macro output from unit_system
. Currently, exported items are all contained in one large module with submodules for each storage type. The alternative would be to put each type of item in a submodule so that we'd import things like:
use diman::si::dimensions::{Velocity, Time};
use diman::si::dimensions::f32::{Length};
use diman::si::units::{Meter, Second};
use diman::si::symbols::f32::{m, s}; // Provided we'd implement it this way
use diman::si::constants::PROTON_MASS;
I personally prefer this and find it cleaner, but I guess it might be annoyingly busy for some people.
Neat ideas, and I'd be curious to play with an experimental branch. My subjective thoughts:
I like (Meter / Second)::new(2.0)
as an unambiguous syntax. One downside is that it's unwieldy if fully qualified (diman::si::units::Meter / diman::si::units::Second)::new(2.0)
, versus diman::si::Velocity::meters_per_second(2.0)
and perhaps harder to see what the named type needs to be (e.g., when it'll go into a struct). I think most users will import, so this is okay, and others may may contain it in something like this
mod gauge {
use diman::si::units::{Meter, Second};
pub type Velocity = Meter / Second;
}
// then
let v = gauge::Velocity::new(2.0);
I'm a bit less enthusiastic about the let v = 1.0 * m / s
syntax for serious projects, but it's undoubtedly nice for exploration.
I don't feel any pain from let v = 1.0_f64 * m / s
or (Meter / Second)::new(2.0_f32)
as needed for type inference. I would like to frequently be generic over F: Float
and would rather not fight the module structure (so prefer Length<F>
). I don't want to push this preference on others, but I don't think I'm alone in it. Your option of supporting both is probably okay, but it may take some experience using it in bigger projects to really feel whether it's worth it.
In the above, how can you make PROTON_MASS
not storage-type specific?
It looks sort of tedious/a lot to remember as written and rust-analyzer won't know which of the several submodules to get Length
from (so more choices). However, it may make wildcare use
statements safe enough, and that would be nice for short code that is dimensionally rich.
Thanks for your input!
I'd be curious to play with an experimental branch
I should get one up soon, am currently in the process of integrating this new implementation into the macro, but it'll be a bit of work.
I like (Meter / Second)::new(2.0) as an unambiguous syntax. One downside is that it's unwieldy if fully qualified (diman::si::units::Meter / diman::si::units::Second)::new(2.0), versus diman::si::Velocity::meters_per_second(2.0)
Note that there would still be diman::si::units::MetersPerSecond::new(2.0)
, even with the new syntax, if that is defined as a unit. Being able to do call ::new
on expressions is more of a bonus.
perhaps harder to see what the named type needs to be (e.g., when it'll go into a struct).
That's true. The meters_per_second
method had a much clearer interface. Thankfully, the error messages on dimension mismatches are still really nice:
error[E0308]: mismatched types
--> src/main.rs:188:21
|
188 | let l: Length = (Kilometer / Hour).new(50.0);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `Dimension { length: 1, time: 0 }`, found `Dimension { length: 1, time: -1 }`
|
= note: expected constant `Dimension { length: 1, time: 0 }`
found constant `Dimension { length: 1, time: -1 }`
So I suppose that if the user gets the dimension wrong in a struct, they'd still see something like this to hopefully help them out. Note that this:
mod gauge {
use diman::si::units::{Meter, Second};
pub type Velocity = Meter / Second;
}
// then
let v = gauge::Velocity::new(2.0);
unfortunately cannot work at the moment. The dark secret that I haven't mentioned yet is that Meter
, Second
and so on are actually constants pretending to be types in an abuse of notation (because I want them to feel like types, since that is much closer to what they actually are, and because I don't like being screamed at too much by my code).
I'll write down why here, mainly so I can read it when I inevitably come back to this in a few months, wondering why I didn't implement it differently:
pub struct Meter;
and implementing Mul
and Div
on them. That would require O(n^2) trait impls, so thats out of the question.trait Unit
that each of the ZSTs would implement which defines their dimensions and magnitude and to write something like
impl<SL, SR> Mul<SR> for SL where SL: Unit, SR: Unit {
...
}
However, the orphan rule won't allow that, since Mul
is not a diman-defined trait and SL
and SR
are completely generic. It's too bad you can't declare a trait as "so private no type other than the ones I strictly control will ever implement it, and those don't overlap", but thats how it is.
struct Unit<const D: Dimension, const M: Magnitude>
, where S
is some ZST corresponding to the unit and to implement Mul
/Div
for that and all kinds of combinations of it and f32
/ f64
, ... The unfortunate thing is that if I write pub type Meter = Unit<{Dimension { length: 1, ...} }, {...}>;, I don't get to write
let x = Meter::new(10.0)`, since type aliases can't be constructors. Unit
type above, write constant Meter: Unit<{Dimension { length: 1, ... }}, {...}> = Unit;
. Note that the constant doesn't even contain any runtime value, so it is really just about notation.All of this said, pub type A = B / C
will probably never work, since that would require Div
on a type level - that's why I added the Product
/ Quotient
aliases to diman, so you could do pub type A = Quotient<B, C>
(note that this just does <B as Div<C>>::Output
).
I am not incredibly happy about the units being "constants". I think that the main source of confusion regarding this might be if somebody mistakes Meter
for an actual type and writes something like struct S(Meter)
. However, I am still much happier with this design than with the current one and any other that I could come up with so far.
I'm a bit less enthusiastic about the let v = 1.0 * m / s syntax for serious projects, but it's undoubtedly nice for exploration.
That's a fair point. I'm a little worried about namespace conflicts here, since already m
and s
can be quite commonly used as variables. I think adding symbols (even just as aliases for the generic units) is probably not worth it.
I don't feel any pain from let v = 1.0_f64 * m / s or (Meter / Second)::new(2.0_f32) as needed for type inference. I would like to frequently be generic over F: Float and would rather not fight the module structure (so prefer Length
). I don't want to push this preference on others, but I don't think I'm alone in it.
I don't think you're alone in it either. I actually regret using fully qualified aliases - initially I wrote diman for use in a code where I could only use f64
anyways and I went for maximum convenience for myself, but as a default I find Length<f64>
clearer and the little bit of additional typing is not so bad. I guess if the user is forced to write the storage type anyways, it will also help with some of the type inference problems. It'll also help to finally get rid of the undocumented and badly implemented gen-vec-names
feature because writing Length<Vec3>
is much nicer anyways.
I think for now, I'll change it to having Length<S>
and leave the additional complexity aside. In the future, we could think about adding either yet another feature that generates storage-type-specific dimensions for users that want it.
In the above, how can you make PROTON_MASS not storage-type specific?
I think it should be possible to implement it exactly like the units, except this time it is a properly named constant. The value of the constant would be stored in the Magnitude
of the corresponding unit. So you'd be able to do let m: Mass<f64> = PROTON_MASS * 5.0
and that should work just fine.
It looks sort of tedious/a lot to remember as written and rust-analyzer won't know which of the several submodules to get Length from (so more choices). However, it may make wildcare use statements safe enough, and that would be nice for short code that is dimensionally rich.
The first I agree with. If only storage-type generic dimensions exists, then it also help with rust-analyzers auto import, because Length
exists only once.
Also, if we don't add symbols we could clean this up a little.
That would leave only three modules (dimensions
, units
and constants
). I slightly prefer the submodules over one gigantic namespace, but I'll give myself a little more time to think about it.
One last comment - since floats are annoying and const_float_arithmetic
is not yet stable, I cannot guarantee that unit constructors are completely zero-cost (that is, one multiplication) without const_float_arithmetic
enabled. I am fairly sure the compiler will make it zero-cost anyways, but without investigation I can't guarantee it.
I think I will be able to write this in a way where if the user has the feature enabled, all the unit constructors will be written differently and guaranteed to be zero-cost.
It's too bad that it's impossible to do const-work on some type of NonNanF64
for these cases where I can guarantee that the floats won't become NaN, but thats just how it is right now. My impression is that const_float_arithmetic
is unstable for some fairly theoretical reasons (that are still reasonable considering Rusts mission) and many users would probably be willing to opt into the feature.
In general, I'd like to err on the side of "slightly experimental but more powerful" for now, since this is already a nightly-only library and using this type of expression feels in line with that.
Thanks for writing this out. There has been some recent Zulip discussion on const float generics. I think clear use cases would be helpful, but the main hangup looks to be signed zero and NaN.
The gauge
idea can perhaps be better expressed as a static case of #35, but I suspect something along those lines will be useful when working with physical sensors.
There has been some recent Zulip discussion on const float generics. I think clear use cases would be helpful, but the main hangup looks to be signed zero and NaN.
Thanks for the link, I added a comment for our use case there.
In the future I'd like if instead of generating the constructors
Length::meters
,Time::seconds
, ... as well as the conversion functionsin_meters
,in_seconds
,in_joules
for every unit that exists, we'd have a syntax (loosely inspired by https://github.com/aurora-opensource/au and https://github.com/mpusz/mp-units)that reads something like
I think there are a number of benefits to a syntax like this:
in
function would only have to be generated once and is then monomorphized for with which it is used, which I (blindly, I should say) assume should be faster than generating the methods whether the unit is used or not. This might not be true ifin
is the only method that exists, but I assume there will be more methods like it in the future.let vel_float: f64 = vel.in(Kilometer / Hour)
which would be very nice. However, that is pretty futuristic at this point, since this requires - at the very least -const_trait_impl
(https://github.com/rust-lang/rust/issues/67792) but in order to do this without any overhead, it might also require even more machinery that is futuristic at this point.In order to implement this, I'd add ZSTs for each unit (i.e.
struct Meter
) which then implement a unit trait likeThe
in
method would then be implemented (roughly, the real implementation isnt quite as simple, since we need to ensure the proper dimensions) asI played around with this and found that a few things about this are not possible exactly like I want them at the moment.
let l = Meters::new(10.0);
or alternativelylet l = meters(10.0)
wheremeters
is a function we'd need to generate.in
since thats a reserved keyword. Sounds trivial, but I do think that unfortunately severely reduces readability, since the alternative would be something likelet in_km = l.in_unit(Kilometer);
which is not as nice and doesn't really improve onl.in_kilometers()
which is the current option.Because of these concerns, I don't know that this would be an improvement over the current syntax, as long as the constructors and
in_...
are the only conversion / unit-specific methods on Quantity.