It would be great for this package to support multi-GPU parallelism. I devised a fairly hacky way to accomplish this and wrote it up here. That write-up targets the CUSPARSE package, but the implementation would be nearly identical for CUBLAS. I'm not certain my approach is the best one. I'd be happy to put in some work to get this added to the package, but since I've never contributed to a package before and am not sure how good my approach is, it would be helpful to discuss it a bit before putting up a pull request.
How does the linked implementation look? Any comments, thoughts, or suggestions? What's posted there is only the rudiments of an implementation: it covers a single function, and not even all of that function's functionality.
See also this discussion of much the same issue on the Julia CUSPARSE GitHub page here.