Purpose
Increase the available precision options for all of the implemented Operators. This allows users to use 64-bit floating-point types, or to mix 32-bit and 64-bit types, without errors being raised.
Approach
Use C++ templates to compile the operator CUDA kernels for multiple input types. Using CuPy RawModule and runtime type introspection, we can then match the CuPy data types with the RawKernels compiled for those types.
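The dispatch described above can be sketched as a mapping from dtypes to template-instantiation names, which are then passed to RawModule.get_function. This is a minimal illustration only; the kernel base names and type suffixes below are hypothetical, not the actual names used in tike.

```python
import numpy as np

# Hypothetical mapping from NumPy/CuPy dtypes to the C++ type names used
# when instantiating the templated CUDA kernels.
_TYPE_NAME = {
    np.dtype('float32'): 'float',
    np.dtype('float64'): 'double',
    np.dtype('complex64'): 'complexf',
    np.dtype('complex128'): 'complexd',
}

def kernel_name(base, dtype):
    """Return the name expression of the RawKernel compiled for dtype."""
    try:
        return f'{base}<{_TYPE_NAME[np.dtype(dtype)]}>'
    except KeyError:
        raise TypeError(f'No kernel was compiled for dtype {dtype!r}')

# With CuPy, the same name expressions would be passed to
# cupy.RawModule(code=..., name_expressions=[...]) at compile time, and
# module.get_function(kernel_name('fwd', x.dtype)) would fetch the
# instantiation matching the runtime dtype of x.
```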
The default precision of most functions is now controlled by the tike.precision module. It is not yet clear whether this setting can be changed at runtime.
Pre-Merge Checklists
Submitter
[x] Write a helpfully descriptive pull request title.
[x] Organize changes into logically grouped commits with descriptive commit messages.
[ ] Document all new functions.
[ ] Click 'details' on the readthedocs check to view the updated docs.
[ ] Write tests for new functions or explain why they are not needed.
[ ] Address any complaints from pep8speaks.
Reviewer
[ ] Actually read all of the code.
[ ] Run the new code yourself; the included tests should make this easy.
[ ] Write a summary of the changes as you understand them.