chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

Complex square root of an imaginary type #24043

Closed damianmoz closed 9 months ago

damianmoz commented 9 months ago

Sorry. Hit the wrong key.

damianmoz commented 9 months ago

Using this simple function as an example for a more general issue and presuming the declaration

proc cmplx(x : real(?w), y : real(w))
{
    var t : complex(w + w);

    t.re = x; t.im = y;
    return t;
}

what is the difference between the non-generic declaration:

inline proc sqrt(x : complex(64))
{
    const re = x.re;
    const im = x.im;
    const half = 0.5:real(32);
    const t = sqrt(re * re + im * im);
    const u = sqrt((t + re) * half);
    const v = sqrt((t - re) * half);

    return cmplx(u, if x.im < 0 then -v else v);
}
inline proc sqrt(x : complex(128))
{
    const re = x.re;
    const im = x.im;
    const half = 0.5:real(64);
    const t = sqrt(re * re + im * im);
    const u = sqrt((t + re) * half);
    const v = sqrt((t - re) * half);

    return cmplx(u, if x.im < 0 then -v else v);
}

and the generic version

inline proc sqrt(x : complex(?w))
{
    const re = x.re;
    const im = x.im;
    const half = 0.5:real(w / 2);
    const t = sqrt(re * re + im * im);
    const u = sqrt((t + re) * half);
    const v = sqrt((t - re) * half);

    return cmplx(u, if x.im < 0 then -v else v);
}

Is one better than (or preferred over) the other. Certainly, the generic one has problems if fed an imaginary datum

csqrt.chpl:84: In function 'main':
csqrt.chpl:89: warning: deprecated use of implicit conversion when passing to a generic formal
csqrt.chpl:89: note: actual with type 'imag(32)'
csqrt.chpl:22: note: is passed to formal with type 'complex(?w)'
note: consider adding a cast to 'complex(64)' or an overload to handle 'imag(32)

This is not a problem as it can be easily handled by Chapel by providing a generic version of that routine with an imag(w) parameter.

proc sqrt(x : imag(?w))
{
    const r = x:real(w);
    const t = sqrt(abs(r) / 2);

    return cmplx(t, if r < 0 then -t else t);
|

Ignoring the fact that in this example, the explicit imaginary example is simpler mathematically, which approach is recommended, the fully generic one or the explicit use of all different variants.

bradcray commented 9 months ago

Hi Damian —

I think you've identified the main difference, which is that the generic procedure is not an option for implicit conversions and only accepts complex values. Whereas the explicit, non-generic stamping out of the overloads permits Chapel's implicit conversions between numeric types. So one reason a procedure author might choose between them is if they want one behavior or the other. E.g., if I wanted to write a numBitsInMantissa() routine accepting reals, I may not want the coercion to take place and permit calls on ints, so might prefer the generic version. Whereas if I had a sqrt() routine like this, I might not care if the coercion took place, so might write the explicit overloads. [Of course, as you point out in your example, better would probably be to have explicit overloads for reals and imaginaries as well to save some computation in which case this point matters less because those would be better matches than the ones requiring explicit overloads].

Another more minor differences are that the generic version will only be instantiated for the signatures that are required and utilized, whereas the concrete versions will exist whether or not they're used. That said, dead code elimination should eliminate this difference by the time code is generated, so this should be no big deal.

And lastly, there's the question of whether one or the other is easier to maintain from a software engineering perspective. Is it clearer to have two separate function bodies, or are they so similar that it'd be better for them to share a single implementation?

Overall, I'd say that the differences here are minor, and intended to be minor, where the implicit conversion difference is the biggest one.

I want to note that there is also a difference today in how the production compiler handles generic arguments with size queries like your ?w here that we're moving to work away from because we believe it's neither intuitive nor principled. For a single-argument routine like this, that doesn't really matter. For a multi-argument routine that uses w in subsequent arguments, there is a planned change in behavior going forward, and we have implemented warnings to help alert users to such cases in the meantime. I believe you've seen such examples in previous discussions, so I won't recap it here and now, but can do that if it'd be helpful.

damianmoz commented 9 months ago

Thanks for that excellent explanation. I will use that going forward in what we are doing (which is a slight variation/specialization/simplification on what the title says). I kept to the elementary function as noted in the title in this discussion to ensure generality and avoid any need for extra explanation on my part of the underlying problem.

With most modern hardware having an instruction for square root or at least an approximation for its reciprocal, I would like to see Chapel's square root of a real(w) guaranteed to be implemented as (effectively) in line assembler, as has just been done superbly for an fma. I would also like to see the availability of rsqrt (or some approximation) on a similar basis. And because highly optimal versions of the square root of a complex(w) number exist which are truly generic for which Chapel is ideally suited (and better suited than C/C++), that means you could have a generic Chapel version of that routine. Yes, this requires a generic version for an imag(w). But as can be seen from the above, that scenario is simpler and faster than worrying about doing an implicit conversion of the imag(w) number to a complex(w) number and then doing a complex square root on that conversion.

When, ... well before the next IEEE 2029 release so purely in the low priority basket.

Certainly the square root of a complex (or purely imaginary) number is a well understood and excellent example of generic programming that can be written clearly and simply in Chapel. Note that a production version of the code would be more sophisticated that what I have shown above for a complex(w) number. Interestingly, the version for an Imag(w) shown above is as sophisticated as it ever needs to be which shows both the power of Chapel and the wise choice in the first place of have a floating point type which is purely imaginary.

damianmoz commented 9 months ago

I might close it down at this point. I wonder whether this issue should really have gone into Discourse. That said, the next iteration of the more general topic to which I alluded in the prior comment definitely needs to stay in Github (or whatever is being used at that time for this purpose).

Thanks again.

bradcray commented 9 months ago

Thanks Damian. I think you're right that this might've been better on Discourse, being more of a Q&A than an action item, but no worries (and I appreciate you closing it).

I would like to encourage you to open a new issue (or maybe issues?) for the requests you made in your second paragraph above about optimized versions of the routines that take advantage of modern assembly (when available). I would have hoped that the C library implementations we rely upon would effectively do this without extra overhead if we (and C) are inlining everything sufficiently well. But I'm imagining you have experience showing that that's not the case? (either in Chapel or in C itself?).

damianmoz commented 9 months ago

In this case, I started it in Github because I thought I might be suggesting a change to what was in AutoMath for the square root of an imag(w). While you could put my suggestion in now as it stands, I would leave it for now. Also, the last time I asked a question in Discourse, it instantly because a Github issue.

I would have hoped that the C library implementations we rely upon would effectively
do this without extra overhead if we (and C) are inlining everything sufficiently well. But
I'm imagining you have experience showing that that's not the case? (either in Chapel
or in C itself?).

Your imaginings are correct. In C/C++ - Yes. And I am not being able to always understand why the compiler, clang more often than gcc, cannot do so. Those compilers have improved doing this over time. But it is still unpredictable enough that in C/C++, I sometimes have to resort to using my own assembler along the lines of

static inline double
fastsqr(const double s)
{
    double r;

    __asm__ ("sqrtsd %1, %0" : "=x" (r) : "xm" (s));
    return r;
}

which is suboptimal for a SIMD Intel or needs a different variant for an ARM.