JIT compilation considerations

Issues to resolve:

1) Every MPI rank reads the same input files.
2) Every MPI rank writes the build artifacts (kernel binaries and the shared libraries for udf and nek).
3) Every MPI rank reads the build artifacts back.
4) dlopen(), which is required to load udf, nek and the kernels in SERIAL, supports only files, not streams.
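To make constraint 4 concrete: dlopen() takes a path, not a buffer, so an artifact that a process holds only in memory has to be written out as a regular file before it can be loaded. The minimal Python sketch below assumes ctypes.CDLL as a stand-in for the C call (on POSIX systems it calls dlopen()). The function names and the library name `libudf.so` are hypothetical. Writing to a temporary file and renaming it into place is an assumed precaution so that a reader never picks up a half-written library when several ranks write the same artifact (issue 2).

```python
import ctypes
import os
import tempfile


def write_artifact(blob: bytes, directory: str, name: str) -> str:
    """Write an in-memory build artifact to directory/name and return its path.

    dlopen() only accepts a filesystem path, so bytes held in memory must be
    materialized as a regular file before they can be loaded.
    """
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, name)
    # Write to a temporary file in the same directory, then rename it into
    # place, so a concurrent reader never sees a partially written library.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return final_path


def load_artifact(path: str) -> ctypes.CDLL:
    """Load a shared library from a file path (ctypes.CDLL calls dlopen() on POSIX)."""
    return ctypes.CDLL(os.path.abspath(path))


# Usage (hypothetical): lib = load_artifact(write_artifact(blob, "/some/dir", "libudf.so"))
```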

Options:

1) Precompile on a login/compute node first. In a second step, launch the actual job and copy .cache to a node-local filesystem.
2) JIT compile using a node-local filesystem without any caching (increases setup time).
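Both options boil down to preparing one cache directory per node on local storage. A minimal Python sketch, under stated assumptions: `node_local_root` is some node-local scratch path, the directory name `jit-cache` and the function name are hypothetical, and `is_node_leader` stands for "one designated process per node" (for example local rank 0). Making the other ranks on the node wait until the leader is done (a node-local barrier) is outside this sketch.

```python
import os
import shutil
import tempfile
from typing import Optional


def prepare_node_local_cache(node_local_root: str,
                             precompiled_cache: Optional[str] = None,
                             is_node_leader: bool = True) -> str:
    """Return the node-local directory to use as the JIT cache.

    Option 1: if precompiled_cache (the .cache from a prior precompile step)
    is given, the node leader copies it once to node-local storage.
    Option 2: otherwise the leader creates an empty node-local directory, and
    all kernels and libraries are JIT-compiled into it during setup.
    """
    target = os.path.join(node_local_root, "jit-cache")  # hypothetical name
    if is_node_leader and not os.path.isdir(target):
        os.makedirs(node_local_root, exist_ok=True)
        # Fill a staging directory first and rename it at the end, so the
        # target path only appears once it is complete.
        staging = tempfile.mkdtemp(prefix=".staging-", dir=node_local_root)
        if precompiled_cache is not None:
            shutil.copytree(precompiled_cache, staging, dirs_exist_ok=True)
        os.rename(staging, target)
    # Non-leader ranks just return the path; they must wait for the leader.
    return target
```

Only the leader touches the shared filesystem (one read of .cache per node instead of one per rank); everything else reads node-local copies.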

I cannot think of any reasonable option if no node-local filesystem exists. In that case we have to live with the fact that all ranks read the same files, which stresses the shared filesystem.

Nek5000 / nekRS

JIT compilation considerations #330