Closed kshyatt closed 9 years ago
GitHub says that there are conflicts. I just added you as a collaborator in this repo. I have not had time to work on this recently.
Wow! Thanks! I'll take a look later today. Does the general approach I took (e.g. in gemmBatched) look ok?
Yes, I think so. (However, I am not familiar with what gemmBatched does.) Thanks for contributing!
The batched functions take sets of small matrices and run the operation (matrix multiplication, in this case) on all of them at once. They're only worthwhile for small matrices, where kernel launch overhead is a large fraction of the runtime. It's somewhat similar to pmap for CUDA.
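To make the semantics concrete, here is a rough CPU-side sketch in NumPy of what a batched gemm computes. The function name and signature here are illustrative only, not the CUBLAS.jl API; the real gemmBatched does all of this on the GPU in a single kernel launch.

```python
import numpy as np

# Illustrative sketch (NOT the CUBLAS.jl API): a batched gemm computes
# C[i] = alpha * A[i] @ B[i] + beta * C[i] for every matrix in the batch
# in one call, amortizing the per-launch overhead across the batch.
def gemm_batched(alpha, As, Bs, beta, Cs):
    return [alpha * (A @ B) + beta * C for A, B, C in zip(As, Bs, Cs)]

# A batch of 100 independent 4x4 multiplications.
rng = np.random.default_rng(0)
As = [rng.standard_normal((4, 4)) for _ in range(100)]
Bs = [rng.standard_normal((4, 4)) for _ in range(100)]
Cs = [np.zeros((4, 4)) for _ in range(100)]
results = gemm_batched(1.0, As, Bs, 0.0, Cs)
```

Each entry of `results` is just an ordinary small matrix product; the point of the batched GPU routine is that launching one kernel per 4x4 multiply would cost far more than the arithmetic itself.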
Ok, I did something very stupid in my git repo. I will close this PR, make a new branch, fix everything up, and resubmit.
Or, rather, I will fix and force-push to this branch. Sorry for comment spam.
No worries!
Hmm, even now that I'm acting as part of JuliaGPU, I can't merge the PR. Does it look ok? Would you mind merging it?
I have implemented wrappers, with passing tests, for all of the batched CUDA 6.5 CUBLAS functions. Since I was branching out on my own a bit here, this PR is an RFC: I want to make sure it's consistent with the current coding style. I'm happy to do cleanups and add more tests.
In particular, this PR adds:
gemmBatched
trsmBatched
getrfBatched
getriBatched
matinvBatched
geqrfBatched
gelsBatched
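The LAPACK-style entries in the list above (getrfBatched, getriBatched, matinvBatched) follow the same batching pattern: factor or invert many small matrices in one call. A rough NumPy sketch of the semantics, as an illustration only and not the CUBLAS.jl API:

```python
import numpy as np

# Illustrative sketch of what matinvBatched (or getrfBatched followed by
# getriBatched) computes: the inverse of every small matrix in a batch.
# The real CUBLAS routines do this on the GPU in a single call.
def matinv_batched(batch):
    # batch has shape (n_matrices, k, k); np.linalg.inv broadcasts over
    # the leading dimension, inverting every matrix at once.
    return np.linalg.inv(batch)

rng = np.random.default_rng(1)
# Shift the diagonal so every matrix in the batch is well-conditioned.
batch = rng.standard_normal((50, 3, 3)) + 3.0 * np.eye(3)
invs = matinv_batched(batch)
```

Multiplying any matrix in the batch by its computed inverse should recover the identity, which is a handy sanity check when testing wrappers like these.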