Closed LoganDark closed 1 year ago
Hey @saharNooby macos is failing again for another reason that isn't my fault, I'm starting to think github is just cursed
Finally I realized how to push into PRs... It turns out I was trying to push into your master, which obviously should not work. Pushing into clblast works.
I'll try various hacks here to get the macOS build working.
Well, that seems to have fixed it.
I think the biggest problem we have right now is that we don't seem to be able to test these libraries on CI or offer them in GitHub releases. We should probably try to do something about that.
> we don't seem to be able to test these libraries on CI or offer them in GitHub releases. We should probably try to do something about that.
I'm not sure I understand. You talking about cuBLAS and CLBlast?
OMG LOL IT FIXED THAT ISSUE FOR WHICH SANITIZER WAS ENABLED
> > we don't seem to be able to test these libraries on CI or offer them in GitHub releases. We should probably try to do something about that.
>
> I'm not sure I understand. You talking about cuBLAS and CLBlast?
Yes, currently people can't get prebuilt binaries for either of those features, and they aren't tested in CI.
> OMG LOL IT FIXED THAT ISSUE FOR WHICH SANITIZER WAS ENABLED
LOL
llama.cpp builds and provides binaries for cuBLAS and CLBlast: releases, build file
I'll add it into my backlog, seems easy enough to do.
I would really prefer to have the CLBlast build documented. The PR description looks good enough; maybe format it a little and put it into docs/CLBlast_on_Windows.md. It would be similar to docs/cuBLAS_on_Windows.md.
But I will not block this PR because of this, I can write the doc later myself.
The PR's currently blocked anyway because I have only tested the small world models with the little sequence.c and confirmed the logits output is identical, but I have not tested any other models (in particular the larger raven models) and that probably needs to work before we merge this. I have no reason to believe that it doesn't, but I need to make sure.
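For anyone who wants to repeat that check on the larger models, here is a minimal sketch of a logits comparison. It assumes the logits from each backend have already been dumped into plain Python lists of floats; the function name `logits_match` and its tolerance parameters are hypothetical and not part of rwkv.cpp. Passing zero tolerances gives the strict "identical" check described above.

```python
import math


def logits_match(reference, candidate, rel_tol=1e-5, abs_tol=1e-6):
    """Compare two logit vectors element by element.

    Hypothetical helper, not part of rwkv.cpp. Returns a tuple
    (ok, worst_index, worst_abs_diff). worst_index is -1 when the
    vectors are exactly equal everywhere.
    """
    if len(reference) != len(candidate):
        raise ValueError("logit vectors differ in length")

    ok = True
    worst_index, worst_diff = -1, 0.0
    for i, (r, c) in enumerate(zip(reference, candidate)):
        diff = abs(r - c)
        # Track the largest absolute difference for reporting.
        if diff > worst_diff:
            worst_index, worst_diff = i, diff
        # With rel_tol=0 and abs_tol=0 this reduces to exact equality.
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            ok = False
    return ok, worst_index, worst_diff
```

Calling `logits_match(cpu_logits, clblast_logits, rel_tol=0.0, abs_tol=0.0)` would demand bit-identical values; the looser defaults may be more realistic if a GPU backend accumulates in a different order.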
@Mathmagician8191 has done some testing with this i think and i'm not really capable of writing documentation on this right now (on account of dissociative identity disorder hehe) but the code seems functional at least
Most of the work was getting CMake to find it. Just enable RWKV_CLBLAST and then drop the OpenCL & CLBlast distributions into the repository root (the actual folders after unzipping, of course!!).
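As a rough sketch of what "enable RWKV_CLBLAST" could look like when scripted, the helpers below only assemble the CMake command lines. `-S`, `-B`, `-D`, `--build` and `--config` are standard CMake options; the function names, the `build` directory and the `Release` configuration are assumptions for illustration, not something this PR prescribes.

```python
def clblast_configure_command(source_dir=".", build_dir="build"):
    """Return the CMake configure command with the RWKV_CLBLAST option on.

    Hypothetical helper; directory names are assumptions.
    """
    return ["cmake", "-S", source_dir, "-B", build_dir, "-DRWKV_CLBLAST=ON"]


def clblast_build_command(build_dir="build", config="Release"):
    """Return the CMake build command for an already configured build tree."""
    return ["cmake", "--build", build_dir, "--config", config]
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root once the OpenCL and CLBlast folders are in place.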
Marked as draft due to lack of testing—I unfortunately lost my bespoke chat script at some point and so can't really do my own experimentation immediately, but I do want to put this out there and have it available for others to see and test out for themselves.
Performance seems to be almost exactly on par with CUDA in my experience. So maybe this will be getting CUDA-like performance out of Intel and AMD GPUs - exciting :D
It took me about 2 hours and 30 minutes of real time to complete this pull request :)