JuliaAttic / CUBLAS.jl

Julia interface to CUBLAS

RFC: Batched CUBLAS functions #7

Closed: kshyatt closed this 9 years ago

kshyatt commented 9 years ago

I have implemented the wrappers, and written passing tests, for all of the batched CUBLAS functions in CUDA 6.5. I was branching out on my own here a bit, so this PR is an RFC: I want to make sure it's consistent with the current coding style. I'm happy to do cleanups or add more tests.

In particular, this PR adds:

nwh commented 9 years ago

GitHub says that there are conflicts. I just added you as a collaborator in this repo. I have not had time to work on this recently.

kshyatt commented 9 years ago

Wow! Thanks! I'll take a look later today. Does the general approach I took (e.g. in gemmBatched) look ok?

nwh commented 9 years ago

Yes, I think so (however, I am not familiar with what gemmBatched does). Thanks for contributing!

kshyatt commented 9 years ago

The batched functions take a set of small matrices and run the same operation (matrix multiplication, in this case) on all of them in a single call. This only pays off for small matrices, because for them kernel launch overhead is a large part of the runtime, and batching pays that overhead once for the whole set instead of once per matrix. It's kind of like pmap, but for CUDA.
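To make the semantics concrete, here is a minimal CPU reference sketch in plain Python. It is not CUBLAS.jl code and says nothing about how the GPU kernel is organized. It only shows what a batched GEMM computes: for every index i in the batch, C[i] = alpha * A[i] * B[i] + beta * C[i], which matches what cublas<t>gemmBatched computes when no transposes are requested. The names `gemm` and `gemm_batched` are hypothetical, transpose options are left out, and unlike CUBLAS (which overwrites each C[i] in place) this sketch returns new matrices.

```python
from typing import List

Matrix = List[List[float]]


def gemm(alpha: float, A: Matrix, B: Matrix, beta: float, C: Matrix) -> Matrix:
    """Return alpha*A*B + beta*C for one small dense matrix triple (no transposes)."""
    m, k, n = len(A), len(B), len(B[0])
    if len(A[0]) != k or len(C) != m or len(C[0]) != n:
        raise ValueError("incompatible matrix shapes")
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_batched(alpha: float, As: List[Matrix], Bs: List[Matrix],
                 beta: float, Cs: List[Matrix]) -> List[Matrix]:
    """Apply the same GEMM independently to each (A[i], B[i], C[i]) in the batch."""
    if not (len(As) == len(Bs) == len(Cs)):
        raise ValueError("batch sizes must match")
    return [gemm(alpha, A, B, beta, C) for A, B, C in zip(As, Bs, Cs)]


# Usage: a batch of two 2x2 problems sharing the same alpha and beta.
batch = gemm_batched(
    1.0,
    [[[1, 2], [3, 4]], [[0, 1], [1, 0]]],  # A[0], A[1]
    [[[1, 0], [0, 1]], [[2, 3], [4, 5]]],  # B[0], B[1]
    0.0,
    [[[0, 0], [0, 0]], [[1, 1], [1, 1]]],  # C[0], C[1] (ignored because beta = 0)
)
print(batch)  # [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 5.0], [2.0, 3.0]]]
```

The Python loop only shows what gets computed, not why it is fast. On the GPU the benefit comes from the whole batch being one library call, so the per-call launch cost is paid once rather than once per small matrix.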

kshyatt commented 9 years ago

Ok, I did something very stupid in my git repo. I will close this PR, make a new branch, fix everything up, and resubmit.

kshyatt commented 9 years ago

Or rather, I will fix it and force-push to this branch. Sorry for the comment spam.

nwh commented 9 years ago

No worries!

kshyatt commented 9 years ago

Hmm, even looking at this as a member of JuliaGPU, I can't merge the PR. Does it look ok? Would you mind merging it?