Open evanmiller opened 7 years ago
In kernels, nothing is as simple as it seems. Given the syntactic quirks (0.5f - ouch) and semantic differences, having separate float/double kernels seems mandatory.
All I/O in double precision would certainly meet my needs, as long as it doesn't overburden yours.
OK then — in that case I'll start laying some groundwork in the internal API, so you'll be productive when you get a chance to tackle double-precision properly. (Nothing too drastic... if your priorities change it won't hurt my feelings :-)
By the way, the `0.5`/`0.5f` dilemma can be avoided with `-cl-single-precision-constant` (under "Math Intrinsics Options"):
https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clBuildProgram.html
That at least opens the door to unified kernels, though I still think separate ones will be preferable.
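For illustration only, here is a minimal host-side sketch of how that flag might be selected per precision. The `PLPrecision` enum and the `pl_build_options` helper are hypothetical names invented for this example; the only real pieces are the flag itself and the fact that `clBuildProgram` takes a build-options string.

```c
#include <stddef.h>

/* Hypothetical precision selector; not part of the existing API. */
typedef enum { PL_PRECISION_FLOAT, PL_PRECISION_DOUBLE } PLPrecision;

/* Returns the string that would be handed to clBuildProgram() as its
   `options` argument. For single precision, the flag makes the compiler
   treat unsuffixed constants such as 0.5 as float rather than double. */
const char *pl_build_options(PLPrecision precision) {
    if (precision == PL_PRECISION_FLOAT) {
        return "-cl-single-precision-constant";
    }
    return ""; /* double-precision build: constants keep their double type */
}
```

This only addresses the literal suffixes; it does nothing about tolerances, iteration counts, or algorithmic differences between the two precisions.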
We're still left with:
Separating the kernels still seems right for now.
I am likely to start on the lat/lon extension next week - I have a crash in the field to diagnose today.
Ok, sounds good to me. My only ambition in the meantime is to try to get the single-precision accuracy down to 1 arc-second for all the routines; looks like everything's hitting that target except Mr. Transverse Mercator...
(Following up on the discussion in #4, cc @BobBane)
I avoided double-precision in the past because Apple's OpenCL compiler was very buggy with doubles (crashing etc). The compiler quality has improved considerably since 2011, and my other OpenCL project (not open source, and not geo-related) uses double-precision exclusively now.
I see a few paths forward here:
1. Migrate everything to double-precision
2. Maintain separate kernels for single-precision and double-precision
3. Use typedefs / `#define`s with a single set of kernels
I'm hesitant to adopt the `#define KFLOAT float` approach because the algorithms themselves may need to differ between single and double-precision. For example, I use a lot of tricks to avoid round-off in the single-precision world that wouldn't be necessary with double-precision. Then there are the various tolerance levels and iteration counts that will need to differ between the two precisions, as well as the annoyance that literals must have an "f" specifier in the single-precision world (i.e. `0.5` has to be written `0.5f`).

The only real reasons to maintain single-precision are to serve applications that (strongly) prefer speed over accuracy and to support GPUs that lack double-precision support. With Magic Maps, the slowdown might be an acceptable trade-off, but I won't really know until I try it out. So I'm hesitant to rip out the single-precision code willy-nilly.
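To make the literal and tolerance concerns concrete, here is a small sketch in plain host-side C (OpenCL C follows the same C99 rule for unsuffixed literals unless a build option overrides it). The `IterSettings` struct, the function names, and the numeric values are all placeholders invented for this example, not values from the project.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical per-precision iteration settings; the numbers are
   placeholders chosen only to show that the two precisions differ. */
typedef struct {
    double tolerance;      /* stop once the update is smaller than this */
    int    max_iterations; /* hard cap so a stalled iteration terminates */
} IterSettings;

IterSettings iter_settings_float(void) {
    /* A tolerance below FLT_EPSILON cannot be resolved for values near 1. */
    IterSettings s = { 4.0 * FLT_EPSILON, 6 };
    return s;
}

IterSettings iter_settings_double(void) {
    IterSettings s = { 4.0 * DBL_EPSILON, 12 };
    return s;
}

/* sizeof does not evaluate its operand; it only reports the result type.
   With an unsuffixed literal the product is promoted to double. */
size_t product_size_plain_literal(float x) { return sizeof(x * 0.5); }

/* With the f suffix the product stays in single precision. */
size_t product_size_f_literal(float x) { return sizeof(x * 0.5f); }
```

A single `KFLOAT` typedef would cover the variable declarations, but the settings and literals above would still need a per-precision switch of their own.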
For now I'm leaning toward a two-kernel world, implementing (porting) double-precision versions of projections as needed. I imagine an extra argument to `pl_context_init` would specify the desired computation precision (which would later be passed to `pl_find_kernel`), and I think that a wrapper function (or several) around `clSetKernelArg` should mean we won't have to duplicate many host-side functions. I'm envisioning two separate folders, `kernel/float/` and `kernel/double/`, with all OpenCL functions prefixed with either `plf_` or `pld_` depending on the precision. Generally only one set of kernels would be compiled for a given `PLContext`.

To keep a unified C API, I am fine with requiring all input/output buffers to be double-precision; for my application, copying matters much less than raw computation speed, so I'm okay with sending in and reading out only `double` and `double *`. So the user would only need to think about the precision choice when initializing the context, and thus could easily switch between computation precisions without having to rework all the client code.

What do you think?
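As a rough sketch of the two-kernel idea, the helpers below show how a precision choice might map to the `plf_`/`pld_` prefixes and how a double-valued argument might be narrowed before being handed to `clSetKernelArg`. The `PLPrecision` enum, `pl_kernel_name`, and `pl_pack_scalar_arg` are hypothetical names made up for this example, and the code avoids the OpenCL headers so it can be compiled on its own.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical precision selector, e.g. chosen at context-init time. */
typedef enum { PL_PRECISION_FLOAT, PL_PRECISION_DOUBLE } PLPrecision;

/* Writes "plf_<base>" or "pld_<base>" into out.
   Returns 0 on success, -1 if the name does not fit in out_size bytes. */
int pl_kernel_name(PLPrecision precision, const char *base,
                   char *out, size_t out_size) {
    const char *prefix = (precision == PL_PRECISION_FLOAT) ? "plf_" : "pld_";
    int n = snprintf(out, out_size, "%s%s", prefix, base);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Packs a host-side double into the representation the kernel expects.
   `storage` must have room for at least sizeof(double) bytes.
   Returns the number of bytes written; real host code would forward
   that size and `storage` to clSetKernelArg(kernel, index, size, storage). */
size_t pl_pack_scalar_arg(PLPrecision precision, double value, void *storage) {
    if (precision == PL_PRECISION_FLOAT) {
        float narrowed = (float)value; /* narrowing happens in one place */
        memcpy(storage, &narrowed, sizeof narrowed);
        return sizeof narrowed;
    }
    memcpy(storage, &value, sizeof value);
    return sizeof value;
}
```

With a made-up base name such as `project_mercator`, the first helper would yield `plf_project_mercator` for a float context and `pld_project_mercator` for a double one. Because callers pass `double` either way, the client code stays the same when the context precision changes.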