Support for double-precision calculations

evanmiller commented 7 years ago

(Following up on the discussion in #4, cc @BobBane)

I avoided double-precision in the past because Apple's OpenCL compiler was very buggy with doubles (crashing etc). The compiler quality has improved considerably since 2011, and my other OpenCL project (not open source, and not geo-related) uses double-precision exclusively now.

I see a few paths forward here:

Migrate everything to double-precision
Maintain separate kernels for single-precision and double-precision
Use typedefs / #defines with a single set of kernels

I'm hesitant to adopt the #define KFLOAT float approach because the algorithms themselves may need to differ between single and double-precision. I.e. I use a lot of tricks to avoid round-off in the single-precision world that wouldn't be necessary with double-precision. Then there's the various tolerance levels and iteration counts that will need to differ between the two precisions, as well as the annoyance that literals must have an "f" specifier in the single-precision world (i.e. 0.5 has to be written 0.5f).

The only real reason to maintain single-precision is for applications that (strongly) prefer speed over accuracy, or to support GPUs that lack double-precision support. With Magic Maps, the slowdown might be an acceptable trade-off, but I won't really know until I try it out. So I'm hesitant to rip out the single-precision code willy-nilly.

For now I'm leaning toward a two-kernel world, implementing (porting) double-precision versions of projections as needed. I imagine an extra argument to pl_context_init would specify the desired computation precision — which would later be passed to pl_find_kernel — and I think that a wrapper function (or several) around clSetKernelArg should mean we won't have to duplicate many host-side functions. I'm envisioning two separate folders kernel/float/ and kernel/double/, with all OpenCL functions prefixed with either plf_ or pld_ depending on the precision. Generally only one set of kernels would be compiled for a given PLContext.

To keep a unified C API, I am fine with requiring all input/output buffers to be double-precision; for my application, copying matters much less than raw computation speed, so I'm okay with sending in and reading out only double and double *. So the user would only need to think about the precision choice when initializing the context, and thus could easily switch between computation precisions without having to rework all the client code.

What do you think?

BobBane commented 7 years ago

In kernels, nothing is as simple as it seems. Given the syntactic quirks (0.5f - ouch) and semantic differences, having separate float/double kernels seems mandatory.

All I/O in double precision would certainly meet my needs, as long as it doesn't overburden yours.

evanmiller commented 7 years ago

OK then — in that case I'll start laying some groundwork in the internal API, so you'll be productive when you get a chance to tackle double-precision properly. (Nothing too drastic... if your priorities change it won't hurt my feelings :-)

evanmiller commented 7 years ago

By the way, the 0.5/0.5f dilemma can be avoided with -cl-single-precision-constant (under "Math Intrinsics Options"):

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clBuildProgram.html

That at least opens the door to unified kernels, though I still think separate ones will be preferable.

BobBane commented 7 years ago

We're still left with:

the rounding-sensitive computations (which probably won't hurt single->double, just look odd)
the number of iterations and tolerance values (compiler could handle that at compile time with case (sizeof (KFLOAT)) ... , but yuck)

Separating the kernels still seems right for now.

I am likely to start on the lat/lon extension next week - I have a crash in the field to diagnose today.

evanmiller commented 7 years ago

Ok, sounds good to me. My only ambition in the meantime is to try to get the single-precision accuracy down to 1 arc-second for all the routines; looks like everything's hitting that target except Mr. Transverse Mercator...

evanmiller / ProjCL

Support for double-precision calculations #13