Open apps4uco opened 1 year ago
Thanks for the bug report. I'm planning on doing them eventually, though I haven't had the need yet.
There are also some non-trivial design questions. For example, should i4::new(-1).raw_value()
return -1i8
or 0b1111
? The former would be more useful for arithmetics where-as the second would be more useful to use in bitfields.
Maybe the answer is two functions? raw_value()
could return 0b1111
and .value()
(or .into()
) would return -1i8.
I would go for two functions too (but the inner value will still need to be one of both, sounds like something we could maybe benchmark or just use the easier one). Hm, though -1i8 as u8 == 0b11111111... u4::from(0b1111)... honestly would need some prototyping together with bitfields
It would be nice if .raw_value()
would just be the bits, without sign-extension. That way bitfields can rely on not having to apply mask. For the inner value of the arbitrary-int, we'd want the sign extended value though probably. Which means raw_value()
wouldn't be a pure getter bit instead a get-with-mask.
Not sure if benchmarks would help much, as it depends on your use-case:
Most likely, the difference will be incredibly tiny anyway
You're right, benchmark was the wrong word, the easier one should be used
I was thinking of trying to implement this. My gut instinct would be to "optimize" for arithmetic operations, since if the only use case is as a bitfield you can probably make it work with an unsigned type if the tiny difference is actually important? Adding masking overhead (however little) to arithmetic operations on a signed type seems like it's contrary to the point of using one, and would be a purely ergonomic change.
Awesome, happy to take a patch if it fits within the overall design. Given the previous discussions I'd suggest the following:
value
should be sign-extended (so e.g. for a i5::new(-2)
value would be -2u8
)signed_value()
returns this inner valuevalue()
returns the masked inner valueBy the way: The easiest way I can think of to mask (while retaining the sign) is to shift like this:
For example, the addition of two i5 would look something like this:
fn add(v1: i5, v2: i5) -> i5 {
u5 { value: ((v1.value + v2.value) << 3) >> 3 }
}
Hmm there is one more complication: Do we want new()
to be i5::new(-2)
or i5::new(0b11110)
?
Reason I'm asking: For unsigned ints, new()
is doing a range check. We want to have this for signed ints as well, I'd imagine.
So maybe we should make this symmetrical? i5::new_signed(-5i8)
vs i5::new(0b11110)
.
To make it less error prone, we could make new()
accept an unsigned value and have value()
return an unsigned value.
To be honest, I still don't quite like it. Seems unbalanced. What do you two think?
I took at the look at the uX crate and they don't seem to have a distinction at all, based on the documentation.
Let's think about the basic case if rust would support arbints.
i4::new(-1)
would be -1i4
.value() = -1
would be -1
At least that's what I imagined these functions to do.
I was thinking about the possibility of expanding signed to be constructed from unsigned as well, but I don't know how sold I am on it.
One problem I noticed before I got too far is the Number
trait is confined to implementing unsized types, and the corresponding macros require it. The options are making separate macros for a separate trait or modularizing it some other way.
I was thinking this should probably be behind the num-traits
feature, since that has some more general integer traits that might translate across better?
Let's think about the basic case if rust would support arbints.
i4::new(-1)
would be-1i4
.value() = -1
would be-1
At least that's what I imagined these functions to do.
I think the issue is do we represent -1i4
under the hood as 0b00001111 (15i8)
and maybe overload arithmetic ops, incurring a small amount of overhead for that, or as 0b11111111 (-1i8)
, incurring a small amount of overhead when we need to keep it constrained to 4 bits of data. And I'm inclined to say the latter.
Hmm there is one more complication: Do we want
new()
to bei5::new(-2)
ori5::new(0b11110)
?
I'm imagining i5::new(-2)
and i5::from_bits(0b11110)
. I think letting users keep a consistent conception of these as actual integral values is pretty important, so having a constructor from signed numbers is necessary imo.
in other words, I think new
should be for the underlying type in all cases, unless we were going to keep using unsigned integers under the hood, which I guess is an option? But I think the library should do the implicit conversions mathematically even in that case I think, it should be possible to use them without constantly deriving the twos complement representation yourself.
The API question can be independent from the underlying representation. We might decide to change between signed/unsigned later, but the behavior of the library should be the same.
I agree that u5::new(-3)
and value()
should return -3.
That means we should another pair of functions for the unsigned representation. from_bits()
and to_bits()
probably makes sense - it mirrors the functions in f32/f64. Those can be the masked values.
So I was thinking about the ways to go about including all of the implementations, and one thing to consider is that the Number
trait (and the crate in general) is very coupled with unsigned integers. This isn't a problem in and of itself, per se, but has to be worked around in one of a few ways:
SignedNumber
that is From<i8> + TryFrom<i16>
etc. I don't really like this approach, it's a huge amount of duplication and just more code to maintain for largely the same logic.Number
and then do a generic implementation of Number
for T where T: From<[appropriate signed int type here]>
.from_bits()
and to_bits()
.The third option would essentially represent every arbitrary int as a sign-extended byte aligned type, and the bounds checking would work largely similarly I think? Option 2 is almost certainly the sanest approach, but I think there's a certain elegance to 3.
I can just comment two more thoughts:
I mean, I think transmute
is not the proper answer, it's too heavyweight, I think we could probably get away with a #[allow(overflowing_literals)]
on exactly the functions that are moving between types of the same size.
To be clearer: if we did 3., wouldn't that mean core::mem::transmute
would fail?
Nothing makes me think it should lead to UB naively, transmute
still requires that the items have the same size, and that's the one guarantee we have. For any bit configuration smaller than the full size, the bit representation should be identical for signed and unsigned integers. I could be wrong though.
I'd prefer to not use transmute as it's unsafe. The safe alternative is simply to do a cast here: -1i8 as u8 == 255u8
. Casting between signed and unsigned of the same size won't touch the bits and there's no possibility for overflow either.
As to @vortexofdoom 's suggestions:
third: could be done in a way that's not a breaking change, but I don't see how it would help. We can change the backing store (and mostly things would be identical), but we might have to change some operations like Add to restore the previous behavior.
second: Won't work unfortunately as Rust's generics don't support const. So once we introduce something like T: From
, we lose all const-ness :(
first: I think this is the way to go unfortunately. One silver lining here is that there are currently two Number implementations (one using const_trait and one using stable). As Rust itself has dropped const_trait, we don't need to have the const-version for the SignedNumber. I'll file a separate task to remove that.
Yeah, to put what I was trying to say another way, iN
primitive types correctly represent ANY unsigned type with N-1
bits or fewer, so having fewer trait implementations, generic structs, etc. generated as a result of adding signed integers would be the big win for that idea, but it's just an idea and probably not a big deal.
As Rust itself has dropped const_trait, we don't need to have the const-version for the SignedNumber. I'll file a separate task to remove that.
You know I'm using this functionality and probably also know that they've just temporarily removed it. Please don't.
Filed #22 . Let's discuss the const-trait over there?
I wanted to make some more changes to Number anyway. For example, Number can easily require Add, Sub, Shl etc to be implemented. The downside of that though is that it would be a slightly breaking change (in the unlikely case that someone else implements Number). So it might be a good idea to bundle API changes together and make a 2.0.
Sorry, I got busy with other things but have recently been thinking about getting back to this.
One thing that originally slowed me down is that lib.rs is pretty unwieldy already, and having alternating signed/unsigned impls or two big blocks both felt pretty bad for both readability and maintainability. It feels weird to have a file just for signed types, so I'd probably break out both types into their own files, and maybe only keep the Number trait(s?) in lib.rs.
Speaking of Number, I think it would be a good idea to make it sealed if we're implementing all types from from 1 to 128 bits. There's not really a use case for users to implement it on external types, and that ability is probably not something we want to guarantee long term. Probably something to be filed in a separate issue, but it might add some implementation flexibility here.
In case someone picks this up before I do, one thought I had is that a major benefit to implementing with the underlying type being signed is that right shifts are sign-extending. This would make it pretty easy to cast between the unsigned "bits" and signed internal formats by first left shifting until the sign bit lines up with the underlying primitive and then switching to the other and right shifting again.
zig becomes more and more popular, so zigs defaults may be good to go (hoping rust will also adopt these in future). i mean storage vs arith for negative numbers. storage also for u nums is relevant.
It would be useful to implement signed types i2 .. i127