ICRAR / leap-accelerate

Low-frequency Excision of the Atmosphere in Parallel
GNU General Public License v2.0

Cuda Calibrate Tests Passing #42

Closed calgray closed 4 years ago

calgray commented 4 years ago

This changeset gets the CUDA LEAP code producing results identical to the single-threaded implementation to about 6 decimal places.
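For illustration, agreement "to about 6 decimal places" could be checked with a tolerance comparison along these lines. This is only a sketch: `approx_equal` and its default tolerances are hypothetical and are not taken from the leap-accelerate test suite.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of leap-accelerate): returns true when two
// result arrays agree element-wise to within abs_tol + rel_tol * magnitude.
bool approx_equal(const std::vector<std::complex<double>>& expected,
                  const std::vector<std::complex<double>>& actual,
                  double abs_tol = 1e-6,
                  double rel_tol = 1e-6)
{
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const double diff = std::abs(expected[i] - actual[i]);
        const double scale = std::max(std::abs(expected[i]), std::abs(actual[i]));
        // Written as a negated <= so that a NaN difference is rejected too.
        if (!(diff <= abs_tol + rel_tol * scale)) {
            return false;
        }
    }
    return true;
}
```

Combining an absolute and a relative term keeps the check meaningful both for values near zero and for large magnitudes, which a fixed number of decimal places alone would not.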

One optimization made here is to load the visibilities and UVWs once into a single integration object so that one CUDA call is made per direction. This may not be possible with too many baselines and channels, as the GPU would run out of memory. However, there is a task to report the compute and memory footprint for the beta release, after which the batching of baselines can be restored.
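As a rough sketch of the memory concern, the footprint of a single integration can be estimated from its dimensions before deciding whether batching is needed. Everything here is an assumption made for illustration: the `IntegrationShape` struct, the data layout (one `std::complex<double>` per visibility, three `double` values per UVW) and the function names are hypothetical and do not reflect the actual leap-accelerate types.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical description of one integration's dimensions.
struct IntegrationShape {
    std::size_t baselines;
    std::size_t channels;
    std::size_t timesteps;
    std::size_t polarizations;
};

// Assumed layout: one complex double per
// (baseline, channel, timestep, polarization) sample.
std::size_t visibility_bytes(const IntegrationShape& s)
{
    return s.baselines * s.channels * s.timesteps * s.polarizations
         * sizeof(std::complex<double>);
}

// Assumed layout: one (u, v, w) triple of doubles per baseline per timestep.
std::size_t uvw_bytes(const IntegrationShape& s)
{
    return s.baselines * s.timesteps * 3 * sizeof(double);
}

// True when the whole integration fits in the given amount of free device
// memory; otherwise the baselines would need to be split into batches.
bool fits_on_device(const IntegrationShape& s, std::size_t free_device_bytes)
{
    return visibility_bytes(s) + uvw_bytes(s) <= free_device_bytes;
}
```

In CUDA code the free-memory figure could come from `cudaMemGetInfo`. The estimate ignores intermediate buffers, so it is a lower bound rather than an exact footprint.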

This also adds a bugfix where only the first direction in the casa implementation was being calculated correctly.

rtobar commented 4 years ago

The still-broken build seems to be caused by the problem described here (ultimately an issue in gcc 5.5). That job already uses g++-6, so you could try using gcc-6 as well, both as the CMAKE_C_COMPILER and as the host compiler used by nvcc.