chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.8k stars 421 forks source link

Math module - rename INFINITY, NAN #19016

Closed lydia-duncan closed 1 year ago

lydia-duncan commented 2 years ago

We are trying not to use all caps for naming. This issue proposes renaming both of these to all lower case versions, e.g. infinity and nan.

lydia-duncan commented 2 years ago

We discuss acronyms in camelCase over on #6698. The adjustment for NAN should take the result of that into account

mppf commented 2 years ago

inf or infinity ?

damianmoz commented 2 years ago

I was about to ask the same question as Michael.

Please remember that NAN is an abbreviation for Not-a-Number. So if you want to change it, make it either NaN or nan. If making it nan means your style guide incorrectly mandates isNan, then it should not be the latter but the former, i.e. NaN. See also my comments in #19019 which should have been part of this same issue (as should #19017).

lydia-duncan commented 2 years ago

inf or infinity ?

My proposal was infinity, but I think there's a good argument it should be consistent with whatever we decide for #19017

bradcray commented 2 years ago

I like inf or infinity. I'm finding I prefer NaN over nan just because it makes my eye recognize it more ("nan" reads like it might be a word or name to me). It would be a deviation against our style, and I'm not dead-set against nan, but that's my gut reaction.

This is a case where I'd be very curious to know what other modern languages are choosing.

lydia-duncan commented 2 years ago

So I think there's not necessarily a strong precedent and we are free to choose what we like, to a reasonable extent.

bradcray commented 2 years ago

On my Chapel deserted island, I'd probably choose NaN and inf, but if a group more numerous or experienced than I advocated for nan and/or infinity instead of either of those, I wouldn't fight them. Mostly, I'd like to avoid NAN, INF[INITY], and Inf[inity] which feel too capitalized for Chapel constants/params to me.

I'm curious what @damianmoz would do on his desert island under those constraints (and/or what he has done in his personal modules).

Something I was curious about and happy to verify is that even symbols from an auto-used module can be renamed on use/import which means that users who didn't like our choices could deviate from them.

Also interesting: Even though our symbol is called INFINITY today, it prints out as inf, which suggests to me that that's perhaps a natural name for it. NaN prints out as nan which suggests that I should either back off of NaN or we should change the way it's printed (I have no idea how strong a precedent there is for printing it one way or the other; ditto for inf, I guess).

lydia-duncan commented 2 years ago

Something I was curious about and happy to verify is that even symbols from an auto-used module can be renamed on use/import which means that users who didn't like our choices could deviate from them.

But that only enables the new name, right? It wouldn't prevent anyone in the same scope from using the old name accidentally?

Also interesting: Even though our symbol is called INFINITY today, it prints out as inf, which suggests to me that that's perhaps a natural name for it. NaN prints out as nan which suggests that I should either back off of NaN or we should change the way it's printed (I have no idea how strong a precedent there is for printing it one way or the other; ditto for inf, I guess).

Huh. I think the output when printing one of the symbols should match its naming in the modules. I wonder why it doesn't match?

bradcray commented 2 years ago

But that only enables the new name, right? It wouldn't prevent anyone in the same scope from using the old name accidentally?

I think that's true, but one could potentially do an import Mod.{sym as _}; to block it if they wanted to not only enable their preferred behavior, but disable the language's?

I wonder why it doesn't match?

I suspect we were following C's lead in both regards: https://tio.run/##S9ZNT07@/185My85pzQlVcGmuCQlM18vw44LIZSbWJIBEuEqy89MUchNzMzT0FSo5lJQKCjKzCtJ01BSTYvJU9JR8HP007TGFPb0c/P08wyJBMrV/v8PAA

damianmoz commented 2 years ago

If you use NaN or NAN into the text of a document which talks about this topic both are readable, with NaN being slightly more readable because it looks a bit like a number such as 7.7. I personally use an expression like x != to test whether a number is a NaN, or 99% of the time explicitly write

mynan = (x - x) / (x - x);

to generate my own NaN and trigger an IEEE 754 except at same time. You would only use it in an assignment

mynan = NAN;

if you wanted to be sure that there was no exception raised. So I have little experience with using it in code. Getting a tiny bit of feedback on the use of isNaN(), isnan(), isNAN(), isNan() from others and, the last two were less readable that the first two, and the first was more readable than the second, even though people are happy enough with the second because they are used to it.

I'd be happy enough for a write to show "nan" or "NaN". No time to get other users' opinion on this. Note that you need to be able to read both "nan", "NaN" or "NAN" in an input stream.

If you use INF in place of INFINITY in the text within a document, or in formula within a program, it is far more readable than the document/program with the current name which has too many capitals, even though that was how the 1985 standard referred to that constant. The name INF looks no bigger than a number such as 8.8. The variation Inf is less readable. The name inf does not jump out at you as much. That said, I currently often assign the constant INFINITY to a local param inf of the correct type in my own code so that choice would mean lots of editing on my part. But on that basis, I am obviously happen enough to use inf as a name for the constant.

So I can live with NaN and inf although I would prefer NaN and INF.

The last thing I want to see is isNan() but then again, I avoid letting camel-case mess with acronyms. If you follow that rule, the acronyms one's eyes expect to see in a program have disappeared, pretty well destroying readability in that context.

lydia-duncan commented 1 year ago

Information in preparation for an ad-hoc subteam discussion

1. What is the API being presented?

/* Returns a value for which :proc:`isinf` will return `true`. */
inline proc INFINITY param : real(64)
/* Returns a value for which :proc:`isnan` will return `true`. */
inline proc NAN param : real(64)

Under the covers, these functions return symbols created by the compiler, enabling them to be represented as reals and compile-time constants.

Brad also pointed out in the thread that they print as inf and nan, despite that not being how they are named.

Note: The standard module style guide says:

Parentheses-less functions that aren’t methods should be avoided

But I think it's okay for these symbols

How is it intended to be used?

Represents the concept of Infinity and when the result of a computation would be Not a Number. (I think this should be pretty straight forward)

How is it being used in Arkouda and CHAMPS?

In Arkouda:

In CHAMPS:

2. What's the history of the feature, if it already exists?


They were added in 2011, predating the extern keyword. At the time, they were extern constants that matched the C macro of the same name.

In 2012, they were converted from extern constants into argument-less functions, supported by the runtime. This support was contributed by University of Maryland students, as part of adding LLVM support.

Documentation comments for them were added in 2015. They were also sorted alphabetically then, with the rest of the module.

In 2019, the backing support for them was moved from the runtime to the compiler, allowing them (and other reals) to be param and computed at compile time.

Were moved to AutoMath.chpl when the Math module was split in two so that they could continue to be included by default in all programs.

3. What's the precedent in other languages, if they support it?

a. Python

Python uses the constants inf and nan (https://docs.python.org/3/library/math.html#math.inf, https://docs.python.org/3/library/math.html#math.nan).

b. C/C++

C defines them as macros (https://en.cppreference.com/w/c/numeric/math/INFINITY, https://en.cppreference.com/w/c/numeric/math/NAN). These macros match our current capitalization scheme and seem to be where we got it from.

Additionally, C++ defines three nan function overloads (nan, nanf and nanl), which take a const char* argument and convert it into an equivalent “quiet NaN value”, when possible (https://en.cppreference.com/w/cpp/numeric/math/nan). These are all completely lowercase.

c. Rust

Rust defines constants NAN (https://doc.rust-lang.org/std/primitive.f64.html#associatedconstant.NAN), INFINITY (https://doc.rust-lang.org/std/primitive.f64.html#associatedconstant.INFINITY) and NEG_INFINITY (https://doc.rust-lang.org/std/primitive.f64.html#associatedconstant.NEG_INFINITY). The f32 module also has its own version of each that matches that size of float.

d. Swift

Swift defines nan (quiet NaN, https://developer.apple.com/documentation/swift/double/nan?changes=la), signalingNaN (what it says on the tin, https://developer.apple.com/documentation/swift/double/signalingnan?changes=la), and infinity (https://developer.apple.com/documentation/swift/double/infinity?changes=la).

There’s an entry for negativeInfinity (https://developer.apple.com/documentation/swift/floatingpointclassification/negativeinfinity?changes=la) but it looks to be in beta.

e. Julia

Julia provides NaN (https://docs.julialang.org/en/v1/base/numbers/#Base.NaN), with explicit versions for 64, 32, and 16 bits. Julia also provides Inf (https://docs.julialang.org/en/v1/base/numbers/#Base.Inf), with explicit versions for 64, 32 and 16 bits.

Though it calls out that Inf is positive infinity, I don’t think they intend to provide a specific negative infinity.

f. Go

Go provides NaN and Inf as functions (https://pkg.go.dev/math#NaN, https://pkg.go.dev/math#Inf). The Inf function takes a sign argument, indicating whether positive or negative infinity should be used (this argument is an int).


4. Are there known Github issues with the feature?


5. Are there features it is related to? What impact would changing or adding this feature have on those other features?


They are strongly related to isfinite, isinf and isnan, functions provided by the AutoMath module to check against them. The current naming scheme doesn’t currently match the capitalization or truncation of those function names, so changes to INFINITY and NAN don’t necessarily require changes to these other features. However, we’re intending to consider them at the same time.

6. How do you propose to solve the problem?


For NAN:

A. Leave it as is

B. Rename to nan

C. Rename to NaN

For INFINITY:

A. Leave it as is

B. Rename to infinity

C. Rename to inf

D. Rename to Inf

E. Rename to Infinity

In terms of how they print, we can:

A. Leave it as is

B. Ensure they match the naming scheme in the module.

The amount of work this will be depends on if we change their names.

I would:

But I’m open to the other options (aside from leaving INFINITY as is). It would be nice to match what we do with isinf and isnan in some way.

damianmoz commented 1 year ago

Very impressed. Very thorough. How much time is allowed for user feedback from Github before internal decisions.

The following agrees with your choice of NaN. The solution below for the infinite case is more consistent. It disagrees with the use of infinity but definitely agrees that INFINITY should go.

What follows below covers more ground including what the IEEE 754 suggests and C does not cover.

I give you some simpler words. They are less complete.They are my bad but condensed summary from discussions from quite a while ago between a few of us. Mathematicians, and Engineers and Geophysicists who use Mathematics. Quiet a small sample population so not the best statistically. Note this was after the 2019 IEEEE 754 standard was released.

Premise: The names in a mathematical library API have to

Let's use the IEEE 754 standard as the main reference to keep it simple.

We probably have to think about the C standard too. But maybe not too much.

How do people write a Not-a-Number? Answer NaN.

How does the IEEE 754 standard write it? Answer NaN.

What does C/C++ and Chapel write a NaN? Answer NAN.

Comment: That is due to very dated C proprocessor constant name rules. Forgot it!

What do we do? Drop NAN and use what people use day-to-day, i.e. NaN!

Are users happy? Yes. Readable? Definitely. Consistent? Yes. Porting - Easy!

Done.

We currently test for IEEE 754 classification in C/C++ using isnan(), isinf() and occasionally, isfinite(). The new standard says we should also provide options to test for whether a finite number is normal, subnormal or zero, and whether a Not-a-Number is quiet or signaling. Those are all adjectives by the way.

Is isnan() consistent with NaN? No.

How do we fix it? Leverage our choice of NaN - i.e. change the classification routine it to isNaN().

Are users happy? Yes. Readable? Definitely. Consistent? Yes. Porting - Easy!

Done.

Note that the constant NaN is a quiet NaN. Will that come back to bite us? Maybe. An unqualified NaN (as written) infers both Quiet and Signaling NaNs. And the isNaN() macro (as did isnan() before it), returns true for both Quiet and Signaling _NaN_s. Hmmm.

Should we introduce a qNaN?

Would users be happy? Not really - Too Radical! Readable? Yes. Consistent? Certainly not common! Give up!

Moving on ...

How do we make the other classifications look like this isNaN()?

Answer:

Are the names too long? Yes but not so as to affect readability. Besides, the names are direct from the standard and are a single English word. Those names will not come up too often in the code so any excessively long name is not impactful. And those names will also mean that comments will not be needed in our code.

Are users happy? Yes. Readable? Definitely. Consistent? Yes.

Porting - not super relevant due to the limited use of these operations.

Done ... Those names are done and dusted.

What about this conflicting pairing of INFINITY and isinf(). They have been around for 30+ years. But they use inf and INFINITY, different identifiers to refer to the same thing. So it makes tracking the use of the same concept or idea in a program quite messy.

Some of us have used INF and inf for years in manuscripts and as abbreviations in our programs because INFINITY is too overpowering in code that it drowns out other things in an expression

What about infinity? No

What about INF or inf? Is the name too short? No (otherwise NaN would be also). Readable? Yes.

Consistent? Either pairing of isINF()/INF or isinf()/inf is. But which one?

Do we have consensus? No.

,,, after leaving the problem for a while.

OK - what about isInf()/Inf?

Hmmm. Its different. Hmmm. Not by much. Hmmm. I can live with it. Hmmm. Looks OK.

Are users happy? Largely. Readable? Definitely. Consistent? Definitely.

Porting - this is highly relevant due to lots of use. Is it? Looks acceptable.

Done!

That was not done in a day. Especially that last one.

So, looks ike the constants in this discussion are both 3 letters

**Inf**
**NaN**

and the classification routines are

**isInf**()
**isNaN**()
**isFinite**()
**isNormal**()
**isSubnormal**()
**isZero**()
**isQuiet**()
**isSignaling**()

Users happy? Yes. Readable? Yes. Consistent? Yes/ Porting easy? Most likely.

Done.

... Later (oops - we forgot)

How do we print them? Do we use inf and nan like before or Inf and NaN.

Maybe the latter. But keeping the former makes an old program's results with that from a new program a lot easier. Not 100% consensus. Keeping the former means no change for the compiler team. Can we set a compile flag to ask for one or the other if there is a real penchant to match the output with the symbol?

... Much much Later - The C standard has the SNAN constant?

What do we do here? For consistency, because it is written in a manuscript as sNaN, it really should have the name sNaN here. Unresolved.

Too hard. Let's come back later

lydia-duncan commented 1 year ago

Thanks, Damian! The internal discussion involving this topic will start on the 25th, so having your thoughts now is quite appreciated. We are currently giving ourselves around 2 weeks of discussion in a smaller subgroup, and once that group has agreement we post it to the larger team internally. When you see post discussion summaries go up, that's summaries from the smaller subgroup, so it's not generally too late to comment if something looks off. We're going to try to have discussion on this topic wrap up by August 8th, but if it's too contentious it might take longer.

I think there's definitely room for adding more of this functionality down the road. We won't have time before the anticipated Sept release (which we're aiming to be our first 2.0 candidate release) to add them, but that doesn't mean they will be forgotten - if you want to open an issue on them, that'd be great, but if not, I'll probably do so after discussion on these is finalized.

bradcray commented 1 year ago

Damian, thanks for the extensive feedback and thoughts. With a quick read of your comment, most of it seemed appealing to me except for one bit: I'd go with inf and isInf even though that represents a capitalization mismatch between the constant and its checking routine. My reasoning is that for an all lower-case constant like pi or e or my own constant myConst, I'd still write the routine as isPi() or isE() or isMyConst() (see also isInt(), isType(), isParam()`, etc.) where the change in capitalization is due to Chapel's standard library naming conventions for multi-word methods rather than due to a desire to make them match. And we also generally prefer lowercase is generally better for scalar constants.

It could be argued that this should make NaN get spelled nan, but I'm not arguing that, being very comfortable with making an exception for NaN given its dominance as an acronym and this being its preferred spelling (as I understand it).

[That said, if push came to shove and someone mandated more consistency, I'd rather re-spell NaN as nan than to re-spell pi, e, etc. as Pi, E, etc.]

damianmoz commented 1 year ago

Brad - It was not so much feedback as a summary of our somewhat-relevant-to-the-issue internal discussions from a while ago over an extended period including some casual phone conversations from the car.

To us inf and Inf are different. For decades, my brain has been hard-wired to think that lower casepi' is a greek letter and represents Archimedes constant, PI is an upper case greek letter used to represent an N'ary Product, and Pi looks like a typo. SImilarly, lower case e is Euler's number but upper case E is the modulus (= measure) of Elasticity of a material (or Elastic modulus or Young's modulus although the concept was actually developed conceptually by - guess who - that smart bunny Leonard Euler - nearly 300 years ago). Lots of mathematical constants in our formulae are upper case. They have been that way for many (often hundreds) of years, coming mainly from the physical world. Upper case K is a material's bulk modulus but lower case k is normally an integer. Hopefully, our code looks like our formulae. It should. It makes it infinitely easier for someone familiar with the topic but unfamiliar with the code, to debug and understand (and even change).

Using Inf for infinite everywhere is all about consistency - priority #3. Everywhere I see Inf, I think the same thing. If two words are spelt differently, they are different (but maybe related). Besides, in our discussions, Inf it was the only thing on which I could get consensus between everybody under our naming criteria. Chapel's naming guidelines are prioirity number 9 in our list of rules. When did those guidelines become a standard?

Luckily, I never need to use pi in some sort of is comparison and if I want to use pi I can just use it as is like in

proc sincpi(x : real(?w)) // normalized cardinal sine
{
   const pix = (pi * x):real(w);

   return sin(pix) / pix;
}

Lydia - I was not wanting to add functionality. I (largely) cut-and-pasted our notes. I explicitly included the stuff on those extra categories to show how the isX....() naming convention of isInf(), isNaN() and isFinite() seamlessly matches what is needed when you do need to expand those classification categories to handle Normal, SubNormal, and Zero numbers, along with Quiet and Signaling Not-a-Numbers. The recommendation to provide those extra categories has been in the IEEE 754 standard for least 15 years although only isnormal() is in the C standard library. Mind you, isZero() is trivial and you can write your own isSubnormal() in terms of isZero(), isNormal() and isFinite() without raising an exception so maybe those extra categories are easy to include.

damianmoz commented 1 year ago

Apologies for that awful formatting a minute or so ago. I terminated a display with two back quotes, not three. Silly me.

damianmoz commented 1 year ago

To answer my own question

How do we print them? Do we use inf and nan like before or Inf and NaN.

I looked further in my notes directory and there is the answer...

You go into 'frontend/lib/immediates/num.cpp and maybe runtime/src/chplcast.c, stick the strings "nan", "-nan", "inf" and "-inf" into an accessible structure of some kind and then provide a suitably atomic operation to change them. I did a quick hack late last year and it worked. My first change was to allow a user selectable minor change of a 3 letter replacement so that "inf" -> "Inf", and , "nan" -> "NaN", and the appropriate signed constants followed suit. I played with some other stuff. It also work for "INF" and "NAN" as well as "bob" and "tom" although the letter is not very useful.

My testing was less than adequate and my interface crude. Maybe the answer is more complicated but I doubt it.

I did notice that there is no signed "nan" in chplcast.c unlike in num.cpp. Is that intentional?

I had in those notes at the time to create an issue about allowing writef to have something like "%+i" but for real(w) numbers to force the sign to print even for non-negative numbers. This allows you to see +0.0 and +inf. I cannot find any mention of this as a new issue. Am I searching for the wrong string? Or did I just forget? Probably the latter.

damianmoz commented 1 year ago

Found it. There was no need to submit an issue to address the issue. There was an undocumented feature. "%+6dr" works as expected. Maybe a need to submit an issue to add it to the documentation.

Sorry for the noise.

lydia-duncan commented 1 year ago

It's definitely valuable to know that users feel that strongly about the capitalization of these symbols.

You mentioned different meanings for various constants based on capitalization, but didn't mention what inf would mean to you as opposed to the capitalized form. Does it bother you to use that form in Python, where everything is lowercased?

lydia-duncan commented 1 year ago

In our discussion today, we decided to rename to nan and inf. We are happy that this also means the way they are printed will match how they are named without additional adjustment (but would have been willing to adjust the way they are printed).

This was decided by looking at the combinations of these names and the names for their related functions. I will post the decisions for those functions on their issues.

We are willing to meet again if new counterpoints are presented.

Points from the discussion: