Closed twesterhout closed 3 months ago
Hi Tom — Thanks for filing this issue. I know we did some work here to try and align more with C/C++ a few years back, but I can't recall whether the behavior you're seeing indicates a case we missed or an intention that I'm forgetting. Let me tag @mppf on this, who led that effort in hopes that he remembers better than I do.
I fully support the underlying rule that a real(w) expression will not be influenced by any integers therein, i.e. an expression with real(32) and integers of any precision stays as real(32). Otherwise, lots of things break, badly, and many of those breakages can take a long time to track down when a language does not follow this rule.
However, Tom's case in point is what happens when the expression is purely integral.
My paraphrasing of the default rule (which I fully support) is that an integer which is (or can be) represented in an int(w) maps to real(w), as Chapel does currently. However, I fully support Tom's suggestion that it would be extremely useful if the compiler issued a warning of possible loss of precision (unless that warning is suppressed). I think I have suggested this in the past but put a non-urgent tag on it. I did have a look at what would be required with a view to changing it myself, until I realized that it was outside my skill set.
One could also have an option that by default promotes any integral expression to real(64). While it would be an option I would never use, it would help in your case Tom. That said, I would consider such use extremely dangerous.
An even more useful option would be one that raises an error when any purely integral expression is passed to a function whose argument is real(w). I would find that option extremely useful and it would catch lots of errors.
Our programming practice is that we mandate that any integral expression be explicitly cast to the type it needs to be when being assigned to a variable of type real(w) or being passed to an argument of type real(w). So the error which Tom mentions never arises in our day-to-day work unless we have been less than careful. But currently, we have to identify such errors manually by code-reviews.
My paraphrasing of the default rule (which I fully support) is that an integer which is (or can be) represented in an int(w) maps to real(w) as Chapel does currently.
Oh yeah, thanks for the comment Damian. I think you're correct that this was a guiding principle in the recent work I was referring to—that if I'm computing with 32-bit values (because I'm on a 32-bit platform, say), we should preserve that width rather than inflating things to a wider precision for the user and requiring them to downcast to stay at the specific size. Of course, we don't have a real(8) or real(16) currently, so if int(32) goes to real(32), it makes sense to me that int(16) also goes to real(32) rather than real(64). @twesterhout, do you buy that argument?
I'd be on-board with a warning to warn users of cases like this, and also remain on-board with Damian's broader desire for a way to opt out of implicit conversions—though I usually propose doing that with a pseudo-module like use NoImplicitConversions; rather than a compiler flag, so that the assumption is visible in the text of the source code rather than relying on a reader to know what the command line is expected to be (and the module also permits the behavior to be scoped rather than being broadly applied to an entire program or file).
When Chapel starts to get more heavily used for AI/ML, and it seems ideally placed/suited for it, you might want to preserve the int(16)/real(16) conversions in many situations where the underlying hardware supports it. But there are other situations where you want to upscale. I am still trying to wrap my brain around IEEE P3109's 8-bit mini-float document so I cannot begin to comment intelligently on the real(8) stuff.
Yes, I would like to opt out of implicit conversions during development and debugging. But it is not essential. I am happy to have implicit conversions in my code as they often greatly enhance readability, especially where source code is the ultimate documentation for an algorithm. In that context, I would like to have the compiler tell me where it sees those conversions, because they can be a source of pain and frustration (when I sadly get them wrong).
I like the idea of a pseudo module but I would like a way to disable it on the command line without having to resort to commenting it out in the source code which might default to read-only in a master copy, especially during testing.
Please notice the edit in the first paragraph.
Yeah, I think the comments so far match my understanding. I agree that a warning in this case would be welcome.
Oddly enough, thinking of adding real(16): I think that introduces a language stability problem for cases like this, because, say, sin(myInt16) would result in a real(16) rather than the real(32) we get today. As a result, we might want to make these cases also generate a --warn-unstable warning. (Or we could see if it occurs rarely enough that an error is suitable.)
Hello Brad, Damian, and Michael. Sorry for the delay -- busy day. The arguments that Damian is making totally make sense to me. I do think that there are two slightly separate issues that we're discussing. The first issue is whether and how integral types should be promoted to real or complex types if they appear in an expression containing real/complex arguments. The second is whether and how integral types should be promoted to real or complex types if they are passed to mathematical functions whose natural domain is ℝ or ℂ.
I don't have much to comment on the first issue -- I think it's been covered by Damian and Michael -- except to say that there, we're not talking about a loss of precision (e.g., uint(16) fits nicely into a real(32), so it's safe; uint(16) to real(16) and uint(32) to real(32) would be a different story though...).
In the second issue though (i.e., sqrt(x:uint(16))), uint(16) is the only type that appears in the expression, so I think the choice to convert it to real(32) instead of real(64) is less clear. For instance, C & Julia convert to real(64), but NumPy converts to real(32). There's also the standard "use real(64) unless you know what you're doing" advice (because lower precision floats are much less forgiving and require more thought on the programmer's side), so it would seem that converting to real(64) is more beginner friendly.
Of course, a drastically different approach is to forbid any implicit integral to real conversions like Haskell does (which actually matches the hardware -- at least that's what I know about x86-64), but even though I personally quite like it, it is probably not the best approach for Chapel :smiley:
In any case, a compiler warning would be very highly appreciated. And potentially a warning in the documentation? (E.g., in the form of a "notable differences from X" page or at the beginning of the Math module)
I could comment here on your paragraph starting with "In the second issue though". However, the full answer is a bit out of scope and is related to decisions made before the introduction of the IEEE floating point standard. I can reply by email on this if you want the gory/sad details.
You did mention that
There's also the standard "use real(64) unless you know what you're doing" advice (because lower precision floats are much less forgiving and require more thought on the programmer's side), so it would seem that converting to real(64) is more beginner friendly.
Chapel has you covered there if you stick to using an int type and a real type, because it uses 64 bits for such types. So it is already beginner friendly. And because 64-bit integral types convert to 64-bit floating point types, Chapel is also staying consistent with the rules for lesser precision integral types.
From the sounds of it, the compiler warning you are requesting looks like it is coming sooner rather than later. I can/will draft up some words (in the next few weeks) that might be suitable for inclusion at the start of the Math module. After Easter. They might/will need some iteration to get them sounding like they fit (and are not too waffly).
You note that "a drastically different approach is to forbid any implicit integral to real conversions like Haskell does". As Brad noted, I too have suggested such a feature, although unlike Haskell, I think it should be optional. I believe there is a consensus on having this in the future. So this should be able to catch such problems. What its priority is, I will leave to others far more au fait with the human resources available.
Just for the record, the approach of converting things to double precision by default can also cause just as many, if not more and worse errors in C/C++ (and Chapel up until reasonably recently, well, at least pre-COVID). Of more concern is that this approach often hides the real problem and just postpones (and makes it harder) finding the offending code. I am dealing with just such a problem right now in a Smoothed Particle Hydrodynamics application - not fun.
Skipping back to @twesterhout's last comment:

For instance, C & Julia convert to real(64)

Tom, how are you evaluating C's conversions here? For example, a case like the one in the OP doesn't really have an equivalent in C since it doesn't support overloading and has distinct sqrt() vs. sqrtf() routines for double vs. float. So if the test was "What does sqrt((int16_t)2) do in C?", it's not surprising it converts to double since that's the only routine available and it's a legal conversion. (If it's not obvious, note that Chapel supports two overloads of sqrt(), one that takes real(32) and one that takes real(64).)

Setting up a similar example with overloading in C++, I wasn't sure what I'd get, but it looks like the answer is an ambiguity error—surprisingly to me, even for integers of larger fixed widths like int32_t or int64_t.
We could consider doing the same thing in Chapel, though that feels like a big change to make now, and I feel somewhat reassured if our behavior does match NumPy.
I'll also mention that, at least once recently, I've asked whether we should consider having an IntMath module that would define things like sqrt() on ints to return ints rather than promoting to floating point. On one hand, I think this would be fairly different from what other languages do; on the other hand, it's pretty similar to how operators like / work—preserving type, even if it results in a less precise answer.
Should we fork off the "Have Chapel warn about implicit conversions from int(w) to real(w2) where w < w2" into its own issue, or just have this issue's action item be to do just that?
Chapel's behaviour matches Fortran (as well as numpy) and also, as you say, sqrtf((int16_t) 2). Chapel's behaviour for such scenarios is more rigorous than C's. It is very well defined.
I currently define (and I am just racing out the door so I might have a typo)
inline proc sqrt(param x : int(?w), type T = real(64)) param
{
....
}
which allows me to write things like sqrt(2) and know I get a far more readable equivalent to Chapel's sqrt_2 for the same zero run-time overhead and without cluttering up the name space with all these extra symbols. So, what you are proposing with IntMath might break what I do.
Tom, I will send you a fix to take the "surprising" out of the behaviour. But later today (your time). It will give you (almost) the behaviour of Haskell, which you said you are happy with.
Tom. Stick
inline proc sqrt(param x : int(?w), type T = real(64)) param
{
param A002193 = 1.4142135623730950488016887242096980785696718753;
if x == 0 || x == 1 then
return x:T;
else if x != 2 then
compilerError("Only sqrt(0) or sqrt(1) or sqrt(2) is supported");
return A002193:T;
}
inline proc sqrt(x : int(?w), type T = real(64))
{
return sqrt(x:T);
}
somewhere your program can find it, e.g. in some master include file of your own, or maybe AutoMath.chpl.
Once you are compiling with the above being pulled in appropriately, the only integral expressions fed to sqrt() that the Chapel compiler will now accept are either a) a compile-time integral expression which evaluates to 0, 1, or 2 (anything else is rejected), or b) a run-time integral expression which by default is cast to a real(64) expression unless you tell it otherwise.
If you want a compile-time positive integral expression which exceeds 2, then cast it yourself to stop the compiler error:
sqrt(493:real(64));
If you are wondering where that funny variable name came from, see OEIS.org
@bradcray or @mppf will probably have wiser words on how best to incorporate these two procs into your own program.
@damianmoz thanks for the explanations and the workaround!
@bradcray C99's math.h defines some type-generic macros: https://en.cppreference.com/w/c/numeric/math/sqrt. Bullet point 4 there says:
If arg has type long double, sqrtl is called. Otherwise, if arg has integer type or the type double, sqrt is called. Otherwise, sqrtf is called. If arg is complex or imaginary, then the macro invokes the corresponding complex function (csqrtf, csqrt, csqrtl).
Written in Chapel-ese, those C rules preclude type promotion of an int(w) where w < 64 to anything other than a real(64) when used with C's type-generic macros. Chapel's rules are simpler, are concurrently both more flexible and more rigorous, and are more orthogonal. And as you can see from what I sent you, those same rules can also be used to provide the C++ functionality (or that of C's tgmath.h) without affecting those of us who deliberately do not want such type promotion. The procs I sent you were more than a workaround: they are an enhancement built on top of Chapel, using the powerful features of Chapel, that still obeys all the Chapel type-mixing rules.
I'm looking at adding a warning. There are two ways I could implement the warning:

(2) warn only when there are overloads with different real / complex widths (e.g. with proc sqrt(x: real(32)) and proc sqrt(x: real(64))), but otherwise apply the same rule as above (not warning for the same width or if it converts to the default width).

At first, I was thinking that (2) would be better (by making the warning less likely to appear in cases where it doesn't matter), but it seems that, if we were to add proc sqrt overloads for real(8) and real(16) (supposing that those are implemented; today they are not), then @twesterhout would still want to see a warning for something like sqrt(myInt16) rather than ending up with a real(16) result.

Any thoughts on which of these (or some other rule) would be better?
Something to ponder deeply over the weekend. Thanks for that.
I know I do not have enough experience in working with real(16).
For now I would go with the simplest one to implement, and then Tom and others can see how well it works. If it needs refinement, then so be it. I am sure it will. I think it will only get tested thoroughly when it is used on hardware which implements IEEE 754 binary16 and a version of Chapel that knows how to exploit it.
@twesterhout:
@bradcray C99's math.h defines some type-generic macros: https://en.cppreference.com/w/c/numeric/math/sqrt.
Thanks for pointing that out. This still doesn't quite feel like the same thing to me since it isn't choosing between a float vs. double overload of the routine. It also seems likely it was made to work this way due to the need for backward compatibility since sqrt(my32BitFloat)
would have called sqrt(double arg)
in traditional C prior to this macro's addition, and that behavior presumably had to be retained after the macro was added (whereas sqrt(myLongDouble)
would never have resolved to sqrt(double arg)
so could be added without breaking existing code).
@mppf: Either of the two warnings seem reasonable and helpful to me. I'll be curious for Tom's thoughts as well as the impact (of either) on existing code in our testing system.
I've run testing on a prototype implementation of (1) and see 63 failures.
One case comes up in the record primer & looks like this:
record Color {
var red: uint(8);
var green: uint(8);
var blue: uint(8);
}
proc Color.luminance() {
return 0.2126*red + 0.7152*green + 0.0722*blue; // warns here
}
But perhaps there is a variation of (2) that would not warn here, if that is important to us. (Perhaps we could limit the warning to cases where the overloads under consideration have differing floating point return types.)
@mppf thanks for working on this!
I'm looking into your warning suggestions, and would warning whenever a conversion occurs from real(w1) to real(w2) with w2 > w1 not just cover all the cases (with a similar rule for complex(w1)->complex(w2) and for real(w1)->complex(w2))? This would also avoid warning in the above example with Color.
@twesterhout - I might be missing something, but I thought the problematic case we are trying to warn for is when implicitly converting a small integer (e.g. int(16)) into real(32) (today) or real(16) etc. (in the future).
@mppf right, but after reading and thinking about Damian's comments, I'm starting to agree that converting int(16) to real(32) or real(16) isn't bad by itself. What's causing precision loss is when that real(32) number is used in combination with real(64) numbers coming from other sources. E.g. sqrt(x:int(16)) * sin(y:real(64)) is most likely a programmer's error whereas sqrt(x:int(16)) + 2:real(32) seems fine. Does that make sense?
Gotcha. One nice thing about that approach is that it focuses on the floating point part.

Some concerns about it:

* It would not warn for var x = sin(myInt16) + cos(myOtherInt16) (since there is no real(64) in sight). Sure, x might eventually be used in a way that combines with a real(64), but that might not happen if the code relies heavily on type inference.
* sin(myInt16) is unstable (since it is currently real(32) but in the future it will be real(16)).
* Lastly, in your code example sqrt(x:int(16)) * sin(y:real(64)), the programmer's assumption might have been that sqrt(x:int(16)) would result in something of type real(64); and that's where the error is. Since the warning would complain about the * call, it might not be as helpful as a warning focused on the implicit conversion from a small integer to a floating point type.

Nevertheless, I tried implementing it real quick and running testing with it. I saw 180 failures. So, whether it is good or bad, the pattern of converting real(32) into real(64) occurs more in our tests than the pattern of converting small integers to real(32).
Very cool!
Lastly, in your code example sqrt(x:int(16)) * sin(y:real(64)), the programmer's assumption might have been that sqrt(x:int(16)) would result in something of type real(64); and that's where the error is. Since the warning would complain about the * call, it might not be as helpful as a warning focused on the implicit conversion from a small integer to a floating point type.
In my opinion this is the biggest drawback, but to be honest, having a warning outweighs the inconvenience of compile-time debugging for me. E.g., I'd much rather spend a few minutes recompiling my code a few times and let the compiler check my casts than spend those minutes carefully reading the code and making sure I didn't miss anything. But that's subjective.
@twesterhout : Thinking outside the box for a moment (motivated by feeling nervous about warnings that users don't care about and eventually just start ignoring), is there a world in which tools other than warnings would be a more satisfying way to catch these surprises/misassumptions about what's going on? E.g., a VSCode extension that would tell you the types of the expressions you're typing, such that you'd see real(32) rather than real(64) on sqrt(myInt32)? Or a mode in which the code was spit back out in a strongly typed format?
Some of us use vi, make, and the compiler, and type shell commands all day.
I have never used Visual Studio in my life. Maybe that explains lots of things! :)
When I write C code, it has to pass through a lint tool before I am happy with it although I sometimes allow the lint tool to relax some options.
@bradcray That's an interesting approach. I don't use VSCode, so an extension wouldn't help much... but I think the features that you're describing, such as hover to see the types, are often part of LSPs. So if Chapel's LSP gained support for these features (or perhaps it already has and I just missed it) I'd definitely use them (e.g., I routinely use the "hover" feature in Haskell to see the types because of its extensive type inference, and from time to time in C++ as well when using the "auto everything" approach).
Your second idea isn't covered by the LSP protocol though (I think), and being able to inspect the IR (I didn't know Chapel had an IR beyond the AST, cool!) of a given function sounds like a great feature. Especially if casts, remote accesses, task spawns, whether a loop is vectorised, etc. become explicit there, I could imagine dumping the IR to a file and then grepping through it to analyze the code (a similar workflow is possible with the C backend now, but then it's C with all its quirks, and personally I found the C code generated by chpl a bit hard to read, though that's likely due to lack of practice with it).
(motivated by feeling nervous about warnings that users don't care about and eventually just start ignoring)
It's not immediately obvious to me whether an IR inspection utility would be more popular among the users than a warning (especially if it's enabled by default :smiling_imp:). To me personally, a warning would yield more value in the short term, but I'd totally understand if you chose a different route.
I think a warning is appropriate here.
(I think the LSP efforts are great and very helpful, and we're working towards including a prototype in the upcoming release. I'm looking forward to hearing your thoughts on it. However, IMO the additional information from the LSP isn't enough to reliably prevent the problem Tom originally ran into here)
Revisiting the example from the records primer, here is a slightly more self-contained reproducer:
proc foo() {
var red: uint(8) = 1;
var green: uint(8) = 2;
var blue: uint(8) = 3;
return 0.2126*red + 0.7152*green + 0.0722*blue;
}
proc main() {
var x = foo();
writeln("foo() = ", x, " : ", x.type:string);
}
In current releases of Chapel (1.27 and newer) this prints:
foo() = 1.8596 : real(32)
But, in older versions of Chapel (1.26 and older) it printed:
foo() = 1.8596 : real(64)
At the same time, programmers coming from other languages might well expect it to result in a real(64) (because that is the type of the floating point constants). As https://github.com/chapel-lang/chapel/issues/24496#issuecomment-1971985249 noted, C and Julia have different behavior. Additionally, my prototype with option (1) showed a relatively small number of tests that trigger the warning. As a result, my expectation is that this warning won't be too annoying & it'll instead help people who run into this behavior in the future.
I've created PR #24529 to add a warning for code like sqrt(2:uint(16)). While there, I added a bunch of other off-by-default warnings that can be used to opt in to further warnings about implicit numeric conversions.
@twesterhout : Michael has now merged his PR which adds a number of new opt-in numeric conversion warning modes. We'll definitely be curious how you find them, and particularly whether you feel this issue can be closed as a result of them.
@twesterhout : Not hearing back from you, and being under the impression that we've decided that the OP behavior is OK, I'm going to close this. Please let me know if there's something more that should be done, either to warrant re-opening this or forking an item off into an issue of its own.
Summary of Problem
It seems that invoking sqrt with integral arguments doesn't consistently convert them to real(64). For instance, results in

Even though one can say that real(32) is sufficient to store a uint(16) value, and that's why sqrt is computed for real(32), I'd like to argue that this behavior can introduce very difficult to track bugs. In C, integral types are promoted to double before passing them to math functions (https://en.cppreference.com/w/c/numeric/math/sqrt; Godbolt test). Moreover, C compilers have warnings such as -Wdouble-promotion that allow detecting cases when, e.g., a float is multiplied by a double. Chapel doesn't seem to warn about such cases and happily computes sqrt for real(32) and then promotes the result to real(64) when multiplied by real(64), and such errors may be very difficult to track in a bigger program.

Would it be possible for Chapel to follow C & C++ here? Or could calls like sqrt(x:uint(16)) trigger an error or at least a warning?

Thanks!