chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

[Feature Request]: Support `complex` numbers in GPU kernels #26054

Open jabraham17 opened 6 days ago

jabraham17 commented 6 days ago

We currently have limited-to-no support for using complex numbers in a GPU kernel.

I tried a number of simple cases with complex numbers to see where our support stands. All tests were run with CHPL_GPU=nvidia, except where otherwise noted.

What Works

Using existing complex numbers

use Math;

on here.gpus[0] {
  var Arr: [1..10] complex;
  @assertOnGpu
  foreach i in 1..10 {
    Arr[i] = 1+10i;
  }
  writeln(Arr);
}

This program actually does work! The immediate (the literal 1+10i) can also be hoisted out of the kernel and it will still work.

What Doesn't Work

Creating complex numbers from ints/reals

use Math;

on here.gpus[0] {
  var Arr: [1..10] complex;
  @assertOnGpu
  foreach i in 1..10 {
    Arr[i] = i:complex;
  }
  writeln(Arr);
}

This code fails because the cast to complex calls _chpl_complex128, an extern function that is not marked as GPU eligible. As a result, assertOnGpu fails with "ChapelTuple.chpl:295: note: function calls out to extern function (_chpl_complex128), which is not marked as GPU eligible".
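
Until the cast is supported, one possible stopgap is to keep complex values out of the kernel entirely. The sketch below is an assumption, not something confirmed in this issue: it stores the real and imaginary parts in two real arrays on the GPU, copies them back, and builds the complex values on the host. The names Re, Im, HostRe, and HostIm are hypothetical.

```chapel
// Host-side arrays that receive the kernel's results.
var HostRe, HostIm: [1..10] real;

on here.gpus[0] {
  // Assumed workaround: hold the two components in separate real arrays,
  // so the kernel never touches the complex type.
  var Re, Im: [1..10] real;
  @assertOnGpu
  foreach i in 1..10 {
    Re[i] = i:real;  // real part
    Im[i] = 0.0;     // imaginary part
  }
  // Copy the results back to host memory.
  HostRe = Re;
  HostIm = Im;
}

// Assemble the complex values on the CPU, outside any GPU kernel.
var Arr: [1..10] complex;
for i in 1..10 do
  Arr[i] = HostRe[i] + HostIm[i] * 1.0i;
writeln(Arr);
```

This costs an extra array and a host-side pass, so it is a stopgap rather than a substitute for real complex support in kernels.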

Complex math

use Math;

on here.gpus[0] {
  var Arr: [1..10] complex;
  const c = -1-2i;
  @assertOnGpu
  foreach i in 1..10 {
    Arr[i] = 1+10i + c;
  }
  writeln(Arr);
}

This program results in an internal error in the Chapel compiler during GPU compilation in the backend.

Calling math functions

use Math;

on here.gpus[0] {
  var Arr: [1..10] complex;
  const c = 2+2i;
  @assertOnGpu
  foreach i in 1..10 {
    Arr[i] = sin(c);
  }
  writeln(Arr);
}

With CHPL_GPU=nvidia, this program fails at codegen time when ptxas is invoked, with the error "ptxas fatal : Unresolved extern function 'csin'".

I tried the same code with CHPL_GPU=amd; there it fails with "lld: error: undefined hidden symbol: csin".

Similar errors occur with other Math functions.
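
For reference, the value the missing csin symbol would produce can be written using real-valued functions only. This is the standard identity for the complex sine, shown here as background rather than as a fix proposed in this issue; a and b denote the real and imaginary parts of the argument.

```latex
\sin(a + bi) = \sin(a)\cosh(b) + i\,\cos(a)\sinh(b)

% For the constant used above, c = 2 + 2i:
\sin(2 + 2i) = \sin(2)\cosh(2) + i\,\cos(2)\sinh(2) \approx 3.4210 - 1.5093\,i
```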

damianmoz commented 5 days ago

Rather than

Arr[i] = i:complex

What happens if you have

inline proc cmplx(i : int(?w))
{
    // The complex type is twice as wide as its integer argument.
    var t : complex(w+w);

    t.re = i:real(w);
    return t;
}
.....
Arr[i] = cmplx(i);

jabraham17 commented 5 days ago

> What happens if you have
>
> inline proc cmplx(i : int(?w))
> {
>    var t : complex(w+w);
>
>    t.re = i:real(w);
>    return t;
> }
> .....
> Arr[i] = cmplx(i);

That has similar issues to the complex math case in the original post: it hits an internal error during GPU codegen. In the + case, the compiler could not find complexAdd128; for t.re = ..., it's because the compiler cannot find complex128GetRealRef.

damianmoz commented 5 days ago

Got it. So much for my suggestion.