Open MichaelHirn opened 8 years ago
Quite some time has passed since the issue was opened and a significant (usable) part has been implemented.
There are some benefits to having a separate wrapper crate for CUDA, so that part should be extracted in the future.
For CUDA we can reuse a lot of the structure already implemented for OpenCL. I also think we should go with the lower-level CUDA Driver API instead of the CUDA Runtime API, as it gives us more flexibility and is closer to the OpenCL behavior. The two APIs could be mixed, though.
Thanks to bindgen, extracting the FFI becomes way easier.
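To make the wrapper-crate idea a bit more concrete: every Driver API call returns a `CUresult` status code, much like OpenCL calls return a `cl_int`, so one thin layer on top of the generated bindings could turn those codes into a Rust `Result`. The following is only a rough sketch of that layer. `RawCuResult`, `DriverError` and `check` are hypothetical names, not anything that exists in the codebase, and the numeric values are the ones I know from `cuda.h`; they should be checked against the header the bindings are actually generated from.

```rust
use std::fmt;

/// Raw status code returned by CUDA Driver API calls (`CUresult` in cuda.h).
/// Hypothetical alias; real bindgen output would provide its own type.
pub type RawCuResult = u32;

/// Hypothetical error type covering a small subset of `CUresult` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriverError {
    InvalidValue,   // CUDA_ERROR_INVALID_VALUE   = 1
    OutOfMemory,    // CUDA_ERROR_OUT_OF_MEMORY   = 2
    NotInitialized, // CUDA_ERROR_NOT_INITIALIZED = 3
    Deinitialized,  // CUDA_ERROR_DEINITIALIZED   = 4
    NoDevice,       // CUDA_ERROR_NO_DEVICE       = 100
    InvalidDevice,  // CUDA_ERROR_INVALID_DEVICE  = 101
    InvalidContext, // CUDA_ERROR_INVALID_CONTEXT = 201
    /// Any code this sketch does not map explicitly.
    Unknown(RawCuResult),
}

impl fmt::Display for DriverError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DriverError::Unknown(code) => write!(f, "unknown CUDA driver error ({})", code),
            other => write!(f, "CUDA driver error: {:?}", other),
        }
    }
}

/// Convert a raw status code into a `Result`.
/// `0` is `CUDA_SUCCESS`; everything else becomes an error.
pub fn check(code: RawCuResult) -> Result<(), DriverError> {
    match code {
        0 => Ok(()),
        1 => Err(DriverError::InvalidValue),
        2 => Err(DriverError::OutOfMemory),
        3 => Err(DriverError::NotInitialized),
        4 => Err(DriverError::Deinitialized),
        100 => Err(DriverError::NoDevice),
        101 => Err(DriverError::InvalidDevice),
        201 => Err(DriverError::InvalidContext),
        other => Err(DriverError::Unknown(other)),
    }
}
```

A safe wrapper would then pass the return value of each raw FFI call through `check` (inside its `unsafe` block) and propagate the error with `?`, which is the same shape an OpenCL wrapper would use for its status codes.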