dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

[API Proposal]: Support General Math value creation without sign propagation #106927

Closed AlexRadch closed 1 week ago

AlexRadch commented 2 months ago

Background and motivation

I want to calculate the difference between two values as a non-negative max - start number.

TFrom start, and TFrom max define a range whose length is from 0 to int.MaxValue - 1. 0 <= length < int.MaxValue

When TFrom is wider than int, calculate the length simply as length = int.CreateTruncating(end - start), because the difference end - start is always positive.

When TFrom is narrower than int and has a sign, the difference between them can be a negative number, which is difficult to convert to a positive value. For example when TFrom is sbyte and start = -128; max = 127 then length = 128 + 127 = 255 = -1 (signed byte).

It isn't not easy to convert TFrom(-1) to positive TTo(255) int.CreateTruncating(TFrom(-1)) creates int(-1) uint.CreateTruncating(TFrom(-1)) creates uint.MaxValue. This made the sign propagation!

So I made the next workaround code.

            internal static int StartMaxCount(T start, T max)
            {
                int count = int.CreateTruncating(max - start);
                if (count < 0)
                    count = int.CreateTruncating(max) - int.CreateTruncating(start);
                return count;
            }

API Proposal

namespace System.Numerics
{
    public interface INumberBase<TSelf> where TSelf : INumberBase<TSelf>?
    {
        static TSelf CreateTruncating<TOther>(TOther value, bool signPropagation) where TOther : INumberBase<TOther>;

        protected static bool TryConvertFromTruncating<TOther>(TOther value, bool signPropagation, [MaybeNullWhen(false)] out TSelf result) where TOther : INumberBase<TOther>;

        protected static bool TryConvertToTruncating<TOther>(TSelf value, bool signPropagation, [MaybeNullWhen(false)] out TOther result)  where TOther : INumberBase<TOther>;
    }
}

API Usage

// Range where TFrom is sbyte
TFrom start = -128;
TFrom max = 127;

// Calculate the correct length
int length = int.CreateTruncating(max - start, false);

See also https://github.com/dotnet/runtime/blob/e689e2a3d8dfedfbefb17fb9ad559401c208265d/src/libraries/System.Linq/src/System/Linq/Range.cs#L30-L49

Alternative Designs

No response

Risks

No response

colejohnson66 commented 2 months ago

If you need a delta between two numbers, use INumberBase<TSelf>.Abs

AlexRadch commented 2 months ago

If you need a delta between two numbers, use INumberBase<TSelf>.Abs

I need to get TTo(255) from TFrom(-1). Abs(-1) is equal to 1 but not 255. It isn't easy to convert TFrom(-1) to positive TTo(255).

tannergooding commented 2 months ago

When TFrom is wider than int, calculate the length simply as length = int.CreateTruncating(end - start), because the difference end - start is always positive.

This isn't valid given an arbitrary TFrom. It is also dependent on other factors, such as knowing end >= start and both being positive. Say, for example, TFrom is long and start == 0 and end == 21247483648, then end - start == 2147483648 and int.CreateTruncating(end - start) is -2,147,483,648

Given TFrom and end > start where both are positive, then end - start always produces a positive TFrom. Such a result may or may not fit into int, depending on if the type can represent values greater than int.MaxValue or not.

If all values are within range of int.MaxValue, then it can be represented. Otherwise it can truncate and lose information, potentially even becoming negative. You would use CreateSaturating to ensure it normalizes to int.MaxValue or CreateChecked to ensure it throws for out of range values if that was an important semantic and you needed to produce an int result.

tannergooding commented 2 months ago

Can you provide a more concrete example and explicitly lay out the constraints/assumptions of end and start?

What you'd want to do if start/end can be negative or if end < start can be true is different than if they are restricted to be positive or if you know that end >= start must always be true.

It also differs based on whether TFrom can be any type, whether it can only be types smaller or larger than int

AlexRadch commented 2 months ago

It also differs based on whether TFrom can be any type, whether it can only be types smaller or larger than int

@tannergooding

Yes TFrom can be any type, smaller or larger than int. When TFrom is larger int — all is okey. When TFrom is smaller int and does not have sign (byte or ushort) — all is okey too.

When TFrom is smaller int and and have sign (sbyte or short), then end - start can be negative number when start < end.

For example when TFrom is sbyte, start = -128; end = 127 then len = 127 + 128 = -1; So I need to convert sbyte(-1) to int(255).

tannergooding commented 2 months ago

When TFrom is larger int — all is okey.

This doesn't sound correct, as you could have a negative result produced for the scenario I gave, such as start = 0, end = 2_147_483_648 (which gives int.CreateTruncating(2_147_483_648), which is -2_147_483_648) or start = 0, end = 4_294_967_295 (which gives int.CreateTruncating(4_294_967_295), which is-1`).

The only case it would be safe is if you knew that end - start must produce a value within the range [0, int.MaxValue] and therefore such cases are impossible to encounter (in which case you should likely have some validation or assertion that is true to avoid bugs).

For example when TFrom is sbyte, start = -128; end = 127 then len = 127 + 128 = -1; So I need to convert sbyte(-1) to int(255).

When you have negatives covering the full range like this, there isn't really any fix you can do. You have lost information the moment you did 127 - (-128) and it became -1, when it should've been 255.

You can't use the Read/Write APIs like you suggested in the top post because there isn't any guarantee that TFrom wrote the value in a bit pattern that translates exactly to the equivalent unsigned bit pattern. For example, it would be legal to define some MyCustomType that was identical to sbyte in every way, but where it always serialized using 4-bytes in WriteLittleEndian, in which case it would write 0xFFFF_FFFF for -1, not 0xFF, and thus would become -1 when read as an int, not 255.

For types with a representable range smaller than int, you could upcast to int to ensure that it is always correct. But you would inversely need to upcast to say long if TFrom was int itself because int.MaxValue - int.MinValue is similarly unrepresentable by int. So the only real solution is to take another generic representing the result type and to recommend that the representable range be at least twice that of the input type.

AlexRadch commented 2 months ago

The only case it would be safe is if you knew that end - start must produce a value within the range [0, int.MaxValue] and therefore such cases are impossible to encounter (in which case you should likely have some validation or assertion that is true to avoid bugs).

Yes end - start is in the range [0, int.MaxValue]. I mentioned this.

For types with a representable range smaller than int, you could upcast to int to ensure that it is always correct. But you would inversely need to upcast to say long if TFrom was int itself because int.MaxValue - int.MinValue is similarly unrepresentable by int. So the only real solution is to take another generic representing the result type and to recommend that the representable range be at least twice that of the input type.

I don't know in advance when TFrom is greater or less than TTo. And the available methods do not allow you to know this at runtime.

You can't use the Read/Write APIs like you suggested in the top post because there isn't any guarantee that TFrom wrote the value in a bit pattern that translates exactly to the equivalent unsigned bit pattern. For example, it would be legal to define some MyCustomType that was identical to sbyte in every way, but where it always serialized using 4-bytes in WriteLittleEndian, in which case it would write 0xFFFF_FFFF for -1, not 0xFF, and thus would become -1 when read as an int, not 255.

It works for sbytes, and short. But it can not work for custom types.

AlexRadch commented 2 months ago

You can't use the Read/Write APIs like you suggested in the top post because there isn't any guarantee that TFrom wrote the value in a bit pattern that translates exactly to the equivalent unsigned bit pattern. For example, it would be legal to define some MyCustomType that was identical to sbyte in every way, but where it always serialized using 4-bytes in WriteLittleEndian, in which case it would write 0xFFFF_FFFF for -1, not 0xFF, and thus would become -1 when read as an int, not 255.

I propose to expand the API for this case so that it always works.

tannergooding commented 2 months ago

It works for sbytes, and short. But it can not work for custom types.

Which means its not an API we can provide in box.

I propose to expand the API for this case so that it always works.

This would be a breaking change and is not something we're interested in doing.

I don't know in advance when TFrom is greater or less than TTo. And the available methods do not allow you to know this at runtime.

If you can guarantee that end - start is in the range 0, int.MaxValue always, then a simple way to handle it is something like:

length = int.CreateTruncating(end - start);

if (length < 0)
{
    length = int.CreateTruncating(end) - int.CreateTruncating(start);
}
AlexRadch commented 2 months ago

I propose to expand the API for this case so that it always works.

This would be a breaking change and is not something we're interested in doing.

No breaking change in my suggestion.

tannergooding commented 2 months ago

The breaking change is that we cannot provide such a DIM without requiring some behavioral breaking change due to the fact that the output of WriteLittleEndian for a signed value is not guaranteed to represent the unsigned two's complement value of the same number of bits as the input type when interpreted as unsigned .

Such as for MyCustomType being 8-bits but opting to serialize using 32-bits.

There's no API we could provide here because there is no way to do the conversion and have it always behave as you're intending.

AlexRadch commented 2 months ago

@tannergooding I didn't look at how CreateChecked, CreateChecked, and CreateTruncating are implemented. I somehow doubt that they are implemented through writing and reading into the buffer. Perhaps I'm wrong.

tannergooding commented 2 months ago

There are two protected abstract methods TryConvertFrom* and TryConvertTo* for each of the Create* methods. These are used in a specific pattern so that conversions can work regardless of how the layering exists (first preferring a conversion defined by TSelf and then the conversion defined by TOther if none exists).

This ensures that some concrete type is always defining the correct conversion behavior, because it at least understands its own definition and that is enough to ensure correct deterministic behavior in most scenarios.

The number types were designed to work by understanding the exact represented value of a T, regardless of what two's complement representation it serialized to. They were not designed to allow reinterpreting the raw bits to a different type as that's something that only becomes sensible if you know the concrete type.

AlexRadch commented 2 months ago

@tannergooding As far as I understood your explanation, the proposed API can be implemented. I'm not suggesting using serialization.

tannergooding commented 2 months ago

You're going to have to give the code you believe will work here.

I do not see a way that a CreateTruncatingWithoutSign API could not be provided because, we cannot provide a valid DIM that ensures correct behavior for any potential type.

AlexRadch commented 1 month ago

You're going to have to give the code you believe will work here.

@tannergooding I wrote an example of implementing the new API, for the sbyte and int types. #108037 It is not optimized for production but shows the concept. I also wrote a test that uses the new API.

            Assert.Equal(unchecked((int)0xFFFFFF80), NumberBaseHelper<int>.CreateTruncating<sbyte>(unchecked((sbyte)0x80), true));
            Assert.Equal(unchecked((int)0xFFFFFFFF), NumberBaseHelper<int>.CreateTruncating<sbyte>(unchecked((sbyte)0xFF), true));

            Assert.Equal(unchecked((int)0x80), NumberBaseHelper<int>.CreateTruncating<sbyte>(unchecked((sbyte)0x80), false));
            Assert.Equal(unchecked((int)0xFF), NumberBaseHelper<int>.CreateTruncating<sbyte>(unchecked((sbyte)0xFF), false));

https://github.com/dotnet/runtime/blob/47c68c6974c3639add6c67cd2b54ecf9305480a7/src/libraries/System.Runtime/tests/System.Runtime.Tests/System/Int32Tests.GenericMath.cs#L1980-L1999

tannergooding commented 1 month ago

@AlexRadch, I think you misunderstood.

I meant a generic DIM implementation, not a per type specialized implementation. This would require a DIM and therefore a valid default implementation would need to be provided, which I don't believe is possible here.

AlexRadch commented 1 month ago

I meant a generic DIM implementation, not a per type specialized implementation. This would require a DIM and therefore a valid default implementation would need to be provided, which I don't believe is possible here.

I don't understand why we need to make a generic DIM implementation of the new TryConvertToTruncating, TryConvertFromTruncating methods that work for all types?

tannergooding commented 1 month ago

You cannot expose new methods on existing interfaces without giving them a default implementation, that is a breaking change otherwise.

The TryConvertTo/From* APIs are protected helpers used by the public Create* to help give a deterministic order to the create APIs to help ensure that we don't get into infinite cycles, ambiguities, or all the other problems that can come about for near arbitrary conversions between types.

It is likewise not sufficient to introduce APIs which fail for all types by default. This is a pit of failure for users.

The behavior you're wanting here rather seems something that is specific to two's complement representation and bit-blitting of data between types, not something that is truly general purpose and works for all numeric types. I do not think it is a good fit for the generic APIs.

AlexRadch commented 1 month ago

I wrote a default implementation, otherwise nothing would have compiled for me.

https://github.com/dotnet/runtime/blob/47c68c6974c3639add6c67cd2b54ecf9305480a7/src/libraries/System.Private.CoreLib/src/System/Numerics/INumberBase.cs#L411-L421

https://github.com/dotnet/runtime/blob/47c68c6974c3639add6c67cd2b54ecf9305480a7/src/libraries/System.Private.CoreLib/src/System/Numerics/INumberBase.cs#L460-L470

tannergooding commented 1 month ago

That implementation fails for common inputs and will result in the public TOther CreateTruncating<TOther>(TSelf value, bool signPropagation) API throwing NotSupportedException.

For example, uint.CreateTruncating<int>(-1, false) would first try TryConvertFromTruncating and would see that signPropagation is false and value is negative, so it would return false. It would then try TryConvertToTruncating and see the same. Thus, the default implementation fails for one of the "common" expected use-cases; rather than being a correct but likely slower implementation.

The expectation is that by default:

This is what I believe cannot be trivially achieved through a DIM and which is largely relying on very particular semantics of two's complement fixed-width primitive integer types, not something which is more general-purpose/generic in nature and which is a good fit for arbitrary number types.

AlexRadch commented 1 month ago

For base types, the new TryConvert methods will be implemented by the base types themselves. For the sbyte and int types I gave an example of such an implementation.

After all, even now there is no correct general slow implementation of the old TryConvert methods — base types implement them.

I don't understand why the old TryConvert methods should not have the correct common slow implementation for all types. Still, the new TryConvert methods should have the correct common slow implementation for all types.

Where does this demand come from?

tannergooding commented 1 month ago

After all, even now there is no correct general slow implementation of the old TryConvert methods — base types implement them.

These were introduced day 1 and do not have DIMs, it is required for all types to implement them so there is no risk of existing types not supporting such an API.

Now that the types have shipped, new members must have DIMs to avoid being a binary breaking change. That DIM needs to be sensible and function as users would roughly expect across all types. A given type can then explicitly provide their own more efficient implementation, but that should namely be for performance or precision not for general correctness.

AlexRadch commented 1 month ago

Ok. I think I can use INumberBase<TSelf>.Radix property and make default slow implementation. I will try.

dotnet-policy-service[bot] commented 1 month ago

This issue has been marked needs-author-action and may be missing some important information.

dotnet-policy-service[bot] commented 3 weeks ago

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

dotnet-policy-service[bot] commented 1 week ago

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.