Remove some Gkyl namespace stuff, enable Cuda in g2, and add NCCL wrapper

In preparation for parallelizing species across GPUs (and also decomposing space across multiple GPUs), we:

Enable CUDA in g2.
Add a wrapper for NCCL (similar to our MPI wrapper), which is used for inter-GPU communication.
Remove much of the Gkyl namespace code we implemented during the first year we attempted a GPU implementation (such as those huge templates).

The first was needed to interface with NCCL. The last made building and compiling easier (or possible at all).

Note to myself and maybe others: I think the GC could be invoked in more places where we wrap MPI and CUDA objects to be a little safer. See how NCCL communicators are created in Comm/Nccl.lua.
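
For anyone unfamiliar with the pattern, here is a minimal LuaJIT sketch of tying a native handle's lifetime to the garbage collector with ffi.gc. It is only an illustration under stated assumptions, not a copy of what Comm/Nccl.lua does: libc malloc/free stand in for creating and destroying a communicator, and newHandle and numFreed are hypothetical names.

```lua
-- Sketch only: requires LuaJIT. libc malloc/free stand in for the
-- create/destroy calls of a native object (e.g. an MPI or NCCL communicator).
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

local numFreed = 0 -- counts finalizer calls, for illustration only

-- Hypothetical wrapper: allocate a native object and attach a finalizer,
-- so the object is released when the Lua-side handle is collected.
local function newHandle()
   local p = ffi.C.malloc(ffi.sizeof("int"))
   assert(p ~= nil, "malloc failed")
   return ffi.gc(ffi.cast("int*", p), function(h)
      numFreed = numFreed + 1
      ffi.C.free(h)
   end)
end

local h = newHandle()
h[0] = 42          -- use the handle like any other cdata pointer
h = nil            -- drop the only reference
collectgarbage()   -- force full collections so the finalizer gets a chance to run
collectgarbage()
```

With a wrapper like this, callers do not have to remember to free the object by hand. The trade-off is that the release happens whenever the collector runs, so anything that must be destroyed at a specific point (for example before MPI is finalized) still needs an explicit call.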

ammarhakim / gkyl

Remove some Gkyl namespace stuff, enable Cuda in g2, and add NCCL wrapper #104