This PR includes numerous (too many?) changes, mainly because they are somewhat interdependent. I can try to dice it up into several PRs if necessary.
Replace all instances of "RuntimeException" with either GrCUDAException or GrCUDAInternalException, so that all thrown exceptions are proper TruffleException instances.
Export the namespaces into the polyglot namespace (eval("grcuda","") will obtain the root namespace). This allows TAB completion in, e.g., Python.
Add a "mockup" mode that doesn't load CUDA libraries and that can be used for testing basic DeviceArray access functionality.
A skeleton for cuBLAS support (similar to cuML).
A function to map from an existing array to a DeviceArray (MapDeviceArrayFunction). This function uses a Truffle LoopNode to allow on-stack replacement (OSR), i.e., fast execution even on the first call.
Versions of DeviceArrayFunction and MapDeviceArrayFunction that are curried with the type (TypedDeviceArrayFunction, TypedMapDeviceArrayFunction).
An initial version of the argument mapping and object shredding functionality.
Some refactorings to simplify the code in CUDARuntime.
Use the Truffle options system to allow for "--grcuda.Xyz" style options.
cuBLAS/cuML libraries are only loaded when the first function is called, and they are now enabled by default.
Remove usage of deprecated "parse" API in CUDARuntime.
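To illustrate what "curried with the type" means for the Typed* variants above, here is a minimal plain-Java sketch. It does not use Truffle or the classes from this PR: CurriedDeviceArraySketch, ElementType, FakeDeviceArray, deviceArray and typed are all hypothetical names made up for the example, and the real implementation may look quite different.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

public class CurriedDeviceArraySketch {

    // Hypothetical element types with their size in bytes.
    enum ElementType {
        INT(4), FLOAT(4), DOUBLE(8);

        final int sizeBytes;

        ElementType(int sizeBytes) {
            this.sizeBytes = sizeBytes;
        }
    }

    // Hypothetical stand-in for a device array: it only records type and length.
    static final class FakeDeviceArray {
        final ElementType type;
        final long length;

        FakeDeviceArray(ElementType type, long length) {
            this.type = type;
            this.length = length;
        }

        long sizeBytes() {
            return length * type.sizeBytes;
        }
    }

    // Uncurried form: takes the type and the length together.
    static final BiFunction<ElementType, Long, FakeDeviceArray> deviceArray =
            (type, length) -> new FakeDeviceArray(type, length);

    // Curried form: fixing the type yields a function that only needs the length.
    static Function<Long, FakeDeviceArray> typed(ElementType type) {
        return length -> deviceArray.apply(type, length);
    }

    public static void main(String[] args) {
        Function<Long, FakeDeviceArray> doubleArray = typed(ElementType.DOUBLE);
        FakeDeviceArray a = doubleArray.apply(1000L);
        System.out.println(a.type + " x " + a.length + " = " + a.sizeBytes() + " bytes");
    }
}
```

The point of the curried form is that the type can be bound once (for example under a per-type name in a namespace) and the resulting function reused with different lengths.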
I'm looking forward to comments/feedback/...!