dotnet / runtime


`System.Double.IsRealNumber(System.Double.PositiveInfinity)` is true #103930

Closed (Smaug123 closed this issue 3 weeks ago)

Smaug123 commented 4 months ago

Description

This seems so deeply unintuitive that I can only imagine it to be a bug.

To be concrete, this function is documented as follows: "Determines if a value represents a real number." I am not aware of any formulations of the reals in which $\infty$ is a real number. (Certainly "Dedekind cuts of the rationals" and "Cauchy completions of rational sequences", as well as Tarski's axiomatization, all do not admit infinite reals.)

Reproduction Steps

$ dotnet fsi
> System.Double.IsRealNumber(System.Double.PositiveInfinity);;

Expected behavior

$ dotnet fsi
> System.Double.IsRealNumber(System.Double.PositiveInfinity);;
val it: bool = false

Actual behavior

$ dotnet fsi
> System.Double.IsRealNumber(System.Double.PositiveInfinity);;
val it: bool = true

Regression?

It's always been this way, ever since it was introduced in https://github.com/dotnet/runtime/pull/69651 .

Known Workarounds

I believe IsFinite is a correct implementation of IsRealNumber.
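To make the distinction concrete, here is a minimal Python sketch (a Python float is an IEEE 754 binary64 value, like System.Double). Both helper names are hypothetical. The first only mirrors the behaviour reported in this issue for doubles (true for everything except NaN) and is not the .NET source; the second stands in for IsFinite.

```python
import math

def is_not_nan(x: float) -> bool:
    """Hypothetical stand-in for the reported Double.IsRealNumber behaviour."""
    return not math.isnan(x)

def is_finite_real(x: float) -> bool:
    """Hypothetical stand-in for Double.IsFinite: rejects NaN and both infinities."""
    return math.isfinite(x)

samples = [1.5, 0.0, math.inf, -math.inf, math.nan]
for x in samples:
    # The two predicates disagree only on the infinities.
    print(x, is_not_nan(x), is_finite_real(x))
```

Under these assumptions, the two predicates differ only on positive and negative infinity, which is exactly the case this issue is about.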

Configuration

SDK 8.0.301.

Other information

No response

dotnet-policy-service[bot] commented 4 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Smaug123 commented 4 months ago

A friend suggested the following interpretation:

> I could buy the argument that positive infinity is just a really big real number for the purposes of this API

This can't be it:

> System.Double.PositiveInfinity * 0.0;;
val it: float = nan

No real number has that property either.

tannergooding commented 4 months ago

> No real number has that property either.

Computer math in general doesn't hold the properties that you might expect to exist in "real math". You can't guarantee invariants like (a + b) + c == a + (b + c) or that (a + 1) > a and so on. Computer math is a rough approximation of real math that specifically makes allowances for needing to operate in a constrained finite domain and needing to query information programmatically to determine optimal handling or control flow.
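As an illustration of the two invariants mentioned above, here is a small Python sketch using binary64 floats. The specific values are chosen for the example and are not from the original comment.

```python
# Two identities of real arithmetic that do not hold for IEEE 754 binary64 values.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
associative = left == right      # rounding happens at different steps on each side

big = 2.0 ** 53                  # beyond this magnitude, not every integer is representable
grows = (big + 1.0) > big        # big + 1.0 rounds back to big

print(associative)  # False
print(grows)        # False
```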


This behavior is very intentional and most closely mirrors what a programmer working with a Complex number type, represented by a real part a and an imaginary part b, needs to think about and consider. In such scenarios, you have a + bi, where b being zero indicates the value is intended to be interpreted as "real", where a being zero indicates the value is intended to be interpreted as "imaginary", and where both being non-zero indicates the value is intended to be interpreted as "complex". There is then special allowance for NaN, which indicates a value cannot be represented and therefore also disqualifies the value as purely real or imaginary. This latter aspect is why float.Sqrt(-1) produces NaN: it is an indicator that the value requires special handling and additional dimensions to represent, thus needing you to compute an imaginary portion. There then exist even higher-level concepts like quaternions, which are higher-dimensioned than complex numbers and can represent additional state that a complex number could not.

Such considerations are also why 0 is considered positive and why -0 is considered negative as it's required for working with values that may round to the nearest representable value as part of normal computation and where there exists a parity of signs that matters from such a programmatic point of view. It further follows into the general considerations made by the IEEE 754 specification around handling of values such that they can be used to correctly build complex number types and maps to the same considerations that C and C++ have historically made for their floating-point and complex number support. C for example explicitly calls this out in G.5 (but also expands on the general topic in depth and the considerations required around real infinities, NaNs, signed zeros, subnormals, etc):

> For most operand types, the value of the result of a binary operator with an imaginary or complex operand is completely determined, with reference to real arithmetic, by the usual mathematical formula. For some operand types, the usual mathematical formula is problematic because of its treatment of infinities and because of undue overflow or underflow; in these cases the result satisfies certain properties (specified in G.5.1), but is not completely determined.

Smaug123 commented 4 months ago

(I am indeed familiar with the various number systems available in mathematics, and I have a working understanding of IEEE 754.)

The name IsRealNumber is extremely explicit and I claim it is in fact unambiguous. I do understand what it's meant to mean ("is fixed by the inclusion 'take the real part'"), but that name simply doesn't describe that operation!

I don't have access to the latest IEEE 754 spec, but I can access the 2008 revision, which is explicit that representations of complex numbers are unspecified by the standard. Since the standard which governs floats only mentions complex numbers once (to say they are not specified), it seems reasonable to expect that normal terminology would apply to .NET. I, a simple man, see a function called IsRealNumber and think "oh, that must tell me if this float represents a real number". If IsRealNumber intended to say HasNonrealComponent or IsExtendedRealNumber (to use IEEE 754's terminology; IEEE 754 is explicit about the number system it implements!) then surely it would have been named that, I think to myself, writing a bug that fortunately my tests catch.


Relevant parts of IEEE 754 are clearly contradicted by all implementations, which doesn't help:

> The behavior of infinity in floating-point arithmetic is derived from the limiting cases of real arithmetic with operands of arbitrarily large magnitude, when such a limit exists

This is clearly false, since 0 * inf = nan in all implementations I know of.


This could be fixed with a docstring along the lines of "This function is equivalent to not(IsNaN). To determine whether a float represents a finite real number, use IsFinite."

tannergooding commented 4 months ago

> The name IsRealNumber is extremely explicit and I claim it is in fact unambiguous

I would disagree. There exists, for example, the extended real number line which is all real numbers + positive and negative infinity, allowing them to be treated as actual numbers: https://en.wikipedia.org/wiki/Extended_real_number_line

This then follows explicitly from the IEEE 754 spec, 3.2:

> The mathematical structure underpinning the arithmetic in this standard is the extended reals, that is, the set of real numbers together with positive and negative infinity.

This is called out precisely twice in the spec and is otherwise simply discussed in terms of "real numbers" and "real arithmetic". The C language specification does much the same, as do other language and hardware specifications. That is, they all discuss things in terms of "real" and typically only mention the concept of "extended real" once, if at all.


> This is clearly false, since 0 * inf = nan in all implementations I know of.

This is quoting part of the spec without actually taking into account the entire spec, its intents, etc. The subsequent portion explicitly calls out that operations are usually exact but that there are exceptions such as: multiplication(∞, x) or multiplication(x, ∞) for finite or infinite x != 0. This is further described in 7.2 Invalid operation where it explicitly calls out multiplication between infinity and zero.

The IEEE 754 specification combines the real mathematics that applies to a broad range of domains with what makes sense for programming. It therefore has to balance +inf/-inf representing actual infinities against them representing potentially large but finite values that overflowed due to the limits of the underlying format. You see similar considerations with +0/-0 and how they are simultaneously treated as zero and as a very small non-zero result that may have rounded towards zero due to precision limitations.

This is why you end up with cases where 0/0 produces NaN but 1/0 produces +Inf, and why inf * 0 produces NaN despite most other handling being "sensible".
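The overflow, underflow, and invalid-operation cases described above can be sketched in Python with binary64 floats. The operand values are illustrative choices, not taken from the spec. The division cases are left out because Python raises ZeroDivisionError for float division by zero instead of returning an IEEE 754 result.

```python
import math

# A finite product too large for binary64 overflows to +inf.
overflowed = 1e308 * 10.0

# A finite, negative product too small for binary64 underflows to -0.0.
underflowed = -1e-200 * 1e-200
sign_of_zero = math.copysign(1.0, underflowed)   # recovers the sign carried by the zero

# inf * 0 has no well-defined limit, so it is an invalid operation and yields NaN.
product = math.inf * 0.0

print(overflowed, underflowed, sign_of_zero, product)
```

Here the infinity stands for a large finite value that overflowed and the negative zero for a tiny negative value that underflowed, which is the dual reading of +/-inf and +/-0 described above.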


> which is explicit that representations of complex numbers are unspecified by the standard

Explicit representation is different from the general behavioral requirements, which are explicitly detailed and where the spec (both 2008 and 2019) states something along the lines of:

> That is not meant to imply whether those exceptions are signaled by operations not specified by this standard such as complex arithmetic or certain transcendental functions. Those and other operations, not specified by this standard, should signal those exceptions according to the definitions below for standard operations, but that might not always be economical. Standard exceptions for nonstandard functions are language-defined.

Which is to say while it is not precisely described, there is a general spirit/intent to the behavior that is clear and has been accordingly standardized by the C language specification for 20+ years at this point.


> This could be fixed with a docstring along the lines of "This function is equivalent to not(IsNaN). To determine whether a float represents a finite real number, use IsFinite."

This is not a correct statement and would not be sufficient, particularly as it applies to System.Numerics.Complex. The remarks section of this API already tries to clarify this more precisely via:

> This function returns true for a complex number a + bi where b is zero.

As do other similar APIs (such as IsPositive) which may not fit a more typical mathematical model for explicit reasons.

The names and general terminologies used here are fairly standardized across the industry; they are not unique or specific to .NET. We are consistent with how languages speak about these concepts, with how they get exposed, and with the naming that follows most suitably from the overarching concepts that developers need to think about when working with such types. Programming in general requires many preconceived notions of mathematics to be set aside.

Python, which is extensively used by data scientists, mathematicians, and other people without a more typical development background, is a notable example that does essentially the same thing: https://numpy.org/doc/stable/reference/generated/numpy.isreal.html (numpy.isreal(math.inf+0j) returns True)
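The "a + bi where b is zero" rule can be sketched with Python's built-in complex type, without NumPy. The helper name `has_zero_imaginary_part` is hypothetical. It only mirrors the rule quoted from the remarks section and is not NumPy's or .NET's implementation.

```python
import math

def has_zero_imaginary_part(z: complex) -> bool:
    """Hypothetical predicate: treat z = a + bi as real exactly when b is zero."""
    return z.imag == 0.0

z_inf = complex(math.inf, 0.0)     # infinite real part, zero imaginary part
z_mixed = complex(1.0, 2.0)        # non-zero imaginary part
z_nan = complex(1.0, math.nan)     # NaN imaginary part never compares equal to zero

for z in (z_inf, z_mixed, z_nan):
    print(z, has_zero_imaginary_part(z))
```

Under this rule, an infinite real part does not disqualify a value, while a NaN imaginary part does.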

tannergooding commented 4 months ago

I understand and sympathize with your confusion/frustration here. It is something I'd be happy to take a documentation improvement around if accurate wording can be determined that covers the various considerations at play (integers, floating-point, and complex numbers).

It is not, however, something that is incorrect or overly problematic, and it is in line with the general industry trends (many of which have been the standard for 10-20 or more years). It is simply a case where programmers need to learn the basics of how the types they're working with behave and the various edge cases they may need to consider when working with the types more generically.

Smaug123 commented 4 months ago

(I've intensively studied the foundations of mathematics, by the way - I am familiar with a great many number systems and their constructions.)

The trouble is that this is one of the many areas of dotnet where you need an extremely comprehensive knowledge of a whole bunch of the standard library if you want to use functions safely. We've talked before about how you can't safely handle strings without reading Best Practices for Strings, for example; here, the problem is that I need to know that there is an entire inheritance structure pouring down from Complex to use perfectly ordinary floats, and it's impossible to understand what would otherwise be a perfectly standard IEEE API without that context. When calling IsRealNumber on a complex structure built from floats, it's more clear that it's doing something ambiguously specified by its name. I still claim there is only one reasonable interpretation of that name as a function float to bool, and the vital context "this is intended to be meaningful for complex numbers, and the implementation just happens to have been back-ported to Float, creating a different possible interpretation" is barely hinted at in the docs; it certainly doesn't appear on the docstring. Precision is important, and it is in general possible to do better than Numpy in terminology (numpy clearly optimises for function name length, for example, where dotnet does not!). Of course the naming ship has sailed now, but there is a reason the types are called Single and Double rather than Real: they're not reals!

My complaint about statements in the standard was definitely off topic; sorry.

I still believe my suggestion of a doc phrasing is appropriate for System.Double.

tannergooding commented 4 months ago

> I need to know that there is an entire inheritance structure pouring down from Complex to use perfectly ordinary floats

None of this is unique to the inheritance hierarchy, it applies broadly to the general handling of IEEE 754 floating-point and their spec'd design. It is why there exists NaN, +/-Inf, +/-0, subnormals, and in general a representation where precision/general distribution of values exists on an exponential curve.

> I still claim there is only one reasonable interpretation of that name as a function float to bool

You can make this claim if you'd like, but many different programming languages have disagreed with that premise. It is a fairly standard practice to do exactly what .NET has done here.

> and the implementation just happens to have been back-ported to Float

This is not the case, the value remains meaningful to IEEE 754 floating-point as well. It is specifically exposed because all values except NaN are representative of real numbers (including +/-inf, as per the spec) and NaN itself is indicative of a value that cannot be represented in the real number domain and therefore requires extra handling (typically representing a complex number).

> reason the types are called Single and Double rather than Real: they're not reals!

This is more a quirk of the naming decided on 20 years ago, and the names themselves are not great. Many programming languages do in fact call them reals; some use the name r32/64 or real32/64. Some use the name exclusively as f32/64 or float32/64, others use the name single, double, others use float in there, etc. There is a broad range of naming used across the industry, and almost all major languages consistently use the terminology real to describe what these types support (including the C# language specification, which defines these + decimal as real types, real literals, and explicitly documents that the latter cannot be used to define infinity today).

> I still believe my suggestion of a doc phrasing is appropriate for System.Double.

The documentation for double.IsRealNumber needs to remain overall consistent with the documentation for INumberBase<T>.IsRealNumber and needs to remain robust to the general considerations necessary for all number types. It can potentially have additional remarks or minor wording differences if that is extremely relevant, but the ideal is that they are consistent here.

Smaug123 commented 4 months ago

I do stand corrected on my claim that "there is only one reasonable interpretation": I did a straw poll (with answers public as they were made) at my company in as non-leading terms as I could manage ("Straw poll: without looking, what is System.Double.IsRealNumber(System.Double.PositiveInfinity)?" with two options, True and False), and answers are 50/50 split (five votes for each on that poll, with several people additionally expressing uncertainty; people I asked outside the poll are 2/1 in favour of "false"). It's clear that I was wrong and the answer is not obvious. It also seems clear that, since this is not addressed at all in the documentation (and preferably the docstring), it really should be.

tannergooding commented 4 months ago

Doc contributions are welcome; just keep in mind you'll want to update Complex, Double, Half, Single, NFloat, and INumberBase<T> to be consistent (with the considerations I gave above as to ensuring it's well understood across all these types).

jeffhandley commented 3 months ago

I appreciate the discussion that led to the conclusions here that we should update documentation to cite the edge cases for this API. I've marked this issue as https://github.com/dotnet/runtime/labels/help%20wanted.

Changing the API documentation would involve a pull request at dotnet/dotnet-api-docs. If there are conceptual documentation pages to update, that would be through a pull request at dotnet/docs.