danlehmann / arbitrary-int

A modern and lightweight implementation of arbitrary integers for Rust
MIT License
31 stars 12 forks source link

Is there any plan to implement signed types i2 .. i127 #14

Open apps4uco opened 1 year ago

apps4uco commented 1 year ago

It would be useful to implement signed types i2 .. i127

danlehmann commented 1 year ago

Thanks for the bug report. I'm planning on doing them eventually, though I haven't had the need yet.

There are also some non-trivial design questions. For example, should i4::new(-1).raw_value() return -1i8 or 0b1111? The former would be more useful for arithmetics where-as the second would be more useful to use in bitfields.

Maybe the answer is two functions? raw_value() could return 0b1111 and .value() (or .into()) would return -1i8.

hecatia-elegua commented 1 year ago

I would go for two functions too (but the inner value will still need to be one of both, sounds like something we could maybe benchmark or just use the easier one). Hm, though -1i8 as u8 == 0b11111111... u4::from(0b1111)... honestly would need some prototyping together with bitfields

danlehmann commented 1 year ago

It would be nice if .raw_value() would just be the bits, without sign-extension. That way bitfields can rely on not having to apply mask. For the inner value of the arbitrary-int, we'd want the sign extended value though probably. Which means raw_value() wouldn't be a pure getter bit instead a get-with-mask.

danlehmann commented 1 year ago

Not sure if benchmarks would help much, as it depends on your use-case:

Most likely, the difference will be incredibly tiny anyway

hecatia-elegua commented 1 year ago

You're right, benchmark was the wrong word, the easier one should be used

vortexofdoom commented 1 year ago

I was thinking of trying to implement this. My gut instinct would be to "optimize" for arithmetic operations, since if the only use case is as a bitfield you can probably make it work with an unsigned type if the tiny difference is actually important? Adding masking overhead (however little) to arithmetic operations on a signed type seems like it's contrary to the point of using one, and would be a purely ergonomic change.

danlehmann commented 1 year ago

Awesome, happy to take a patch if it fits within the overall design. Given the previous discussions I'd suggest the following:

By the way: The easiest way I can think of to mask (while retaining the sign) is to shift like this:

For example, the addition of two i5 would look something like this:

fn add(v1: i5, v2: i5) -> i5 {
  u5 { value: ((v1.value + v2.value) << 3) >> 3 }
}
danlehmann commented 1 year ago

Hmm there is one more complication: Do we want new() to be i5::new(-2) or i5::new(0b11110)?

Reason I'm asking: For unsigned ints, new() is doing a range check. We want to have this for signed ints as well, I'd imagine.

So maybe we should make this symmetrical? i5::new_signed(-5i8) vs i5::new(0b11110).

To make it less error prone, we could make new() accept an unsigned value and have value() return an unsigned value.

danlehmann commented 1 year ago

To be honest, I still don't quite like it. Seems unbalanced. What do you two think?

danlehmann commented 1 year ago

I took at the look at the uX crate and they don't seem to have a distinction at all, based on the documentation.

hecatia-elegua commented 1 year ago

Let's think about the basic case if rust would support arbints. i4::new(-1) would be -1i4 .value() = -1 would be -1 At least that's what I imagined these functions to do.

vortexofdoom commented 1 year ago

I was thinking about the possibility of expanding signed to be constructed from unsigned as well, but I don't know how sold I am on it.

One problem I noticed before I got too far is the Number trait is confined to implementing unsized types, and the corresponding macros require it. The options are making separate macros for a separate trait or modularizing it some other way.

I was thinking this should probably be behind the num-traits feature, since that has some more general integer traits that might translate across better?

vortexofdoom commented 1 year ago

Let's think about the basic case if rust would support arbints. i4::new(-1) would be -1i4 .value() = -1 would be -1 At least that's what I imagined these functions to do.

I think the issue is do we represent -1i4 under the hood as 0b00001111 (15i8) and maybe overload arithmetic ops, incurring a small amount of overhead for that, or as 0b11111111 (-1i8), incurring a small amount of overhead when we need to keep it constrained to 4 bits of data. And I'm inclined to say the latter.

vortexofdoom commented 1 year ago

Hmm there is one more complication: Do we want new() to be i5::new(-2) or i5::new(0b11110)?

I'm imagining i5::new(-2) and i5::from_bits(0b11110). I think letting users keep a consistent conception of these as actual integral values is pretty important, so having a constructor from signed numbers is necessary imo.

in other words, I think new should be for the underlying type in all cases, unless we were going to keep using unsigned integers under the hood, which I guess is an option? But I think the library should do the implicit conversions mathematically even in that case I think, it should be possible to use them without constantly deriving the twos complement representation yourself.

danlehmann commented 1 year ago

The API question can be independent from the underlying representation. We might decide to change between signed/unsigned later, but the behavior of the library should be the same.

I agree that u5::new(-3) and value() should return -3.

That means we should another pair of functions for the unsigned representation. from_bits() and to_bits() probably makes sense - it mirrors the functions in f32/f64. Those can be the masked values.

vortexofdoom commented 1 year ago

So I was thinking about the ways to go about including all of the implementations, and one thing to consider is that the Number trait (and the crate in general) is very coupled with unsigned integers. This isn't a problem in and of itself, per se, but has to be worked around in one of a few ways:

  1. Just define a new trait like SignedNumber that is From<i8> + TryFrom<i16> etc. I don't really like this approach, it's a huge amount of duplication and just more code to maintain for largely the same logic.
  2. Remove the trait bounds from Number and then do a generic implementation of Number for T where T: From<[appropriate signed int type here]>.
  3. An intriguing idea would be to have every arbitrary int type backed by a signed int. The only bit widths of unsigned int where it matters whether its representation is signed are the actual primitive types, which the crate obviously doesn't provide. Any number of bits lower than the native type's will be guaranteed to have the same value, and overflows should work exactly the same. This could obviously be a breaking change, although it might(?) actually be possible to keep the same API, and for most use cases I think most people will be using the aliased type directly, rather than accessing the underlying type other than as raw bits, which should still be supported as an unsigned int with from_bits() and to_bits().

The third option would essentially represent every arbitrary int as a sign-extended byte aligned type, and the bounds checking would work largely similarly I think? Option 2 is almost certainly the sanest approach, but I think there's a certain elegance to 3.

hecatia-elegua commented 1 year ago

I can just comment two more thoughts:

vortexofdoom commented 1 year ago

I mean, I think transmute is not the proper answer, it's too heavyweight, I think we could probably get away with a #[allow(overflowing_literals)] on exactly the functions that are moving between types of the same size.

hecatia-elegua commented 1 year ago

To be clearer: if we did 3., wouldn't that mean core::mem::transmute would fail?

vortexofdoom commented 1 year ago

Nothing makes me think it should lead to UB naively, transmute still requires that the items have the same size, and that's the one guarantee we have. For any bit configuration smaller than the full size, the bit representation should be identical for signed and unsigned integers. I could be wrong though.

danlehmann commented 1 year ago

I'd prefer to not use transmute as it's unsafe. The safe alternative is simply to do a cast here: -1i8 as u8 == 255u8. Casting between signed and unsigned of the same size won't touch the bits and there's no possibility for overflow either.

danlehmann commented 1 year ago

As to @vortexofdoom 's suggestions:

third: could be done in a way that's not a breaking change, but I don't see how it would help. We can change the backing store (and mostly things would be identical), but we might have to change some operations like Add to restore the previous behavior.

second: Won't work unfortunately as Rust's generics don't support const. So once we introduce something like T: From, we lose all const-ness :(

first: I think this is the way to go unfortunately. One silver lining here is that there are currently two Number implementations (one using const_trait and one using stable). As Rust itself has dropped const_trait, we don't need to have the const-version for the SignedNumber. I'll file a separate task to remove that.

vortexofdoom commented 1 year ago

Yeah, to put what I was trying to say another way, iN primitive types correctly represent ANY unsigned type with N-1 bits or fewer, so having fewer trait implementations, generic structs, etc. generated as a result of adding signed integers would be the big win for that idea, but it's just an idea and probably not a big deal.

hecatia-elegua commented 1 year ago

As Rust itself has dropped const_trait, we don't need to have the const-version for the SignedNumber. I'll file a separate task to remove that.

You know I'm using this functionality and probably also know that they've just temporarily removed it. Please don't.

danlehmann commented 1 year ago

Filed #22 . Let's discuss the const-trait over there?

danlehmann commented 1 year ago

I wanted to make some more changes to Number anyway. For example, Number can easily require Add, Sub, Shl etc to be implemented. The downside of that though is that it would be a slightly breaking change (in the unlikely case that someone else implements Number). So it might be a good idea to bundle API changes together and make a 2.0.

vortexofdoom commented 7 months ago

Sorry, I got busy with other things but have recently been thinking about getting back to this.

One thing that originally slowed me down is that lib.rs is pretty unwieldy already, and having alternating signed/unsigned impls or two big blocks both felt pretty bad for both readability and maintainability. It feels weird to have a file just for signed types, so I'd probably break out both types into their own files, and maybe only keep the Number trait(s?) in lib.rs.

Speaking of Number, I think it would be a good idea to make it sealed if we're implementing all types from from 1 to 128 bits. There's not really a use case for users to implement it on external types, and that ability is probably not something we want to guarantee long term. Probably something to be filed in a separate issue, but it might add some implementation flexibility here.

In case someone picks this up before I do, one thought I had is that a major benefit to implementing with the underlying type being signed is that right shifts are sign-extending. This would make it pretty easy to cast between the unsigned "bits" and signed internal formats by first left shifting until the sign bit lines up with the underlying primitive and then switching to the other and right shifting again.

dzmitry-lahoda commented 2 months ago

zig becomes more and more popular, so zigs defaults may be good to go (hoping rust will also adopt these in future). i mean storage vs arith for negative numbers. storage also for u nums is relevant.