RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
MIT License

Implement basic CLBlast support #110

Closed LoganDark closed 1 year ago

LoganDark commented 1 year ago

Most of the work was getting CMake to find it. Just enable RWKV_CLBLAST and then drop the OpenCL & CLBlast distributions into the repository root like so:

image

the actual folders after unzipping, of course!!

image
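For readers following along, the configure step described above can be sketched as a small Python helper that assembles the CMake command line. The `RWKV_CLBLAST` option name comes from this PR. The helper name `cmake_configure_args` and the choice to configure from the repository root with `-DRWKV_CLBLAST=ON` are assumptions made for illustration, not the project's documented build procedure.

```python
import shlex


def cmake_configure_args(source_dir=".", clblast=True):
    """Build the argv for a CMake configure step (hypothetical helper)."""
    args = ["cmake", source_dir]
    if clblast:
        # RWKV_CLBLAST is the option named in this PR; -D<name>=ON is
        # standard CMake syntax for switching a boolean option on.
        args.append("-DRWKV_CLBLAST=ON")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is built by accident.
    print(shlex.join(cmake_configure_args()))
```

The helper only prints the command. Running it still requires the OpenCL and CLBlast folders to be in place as described above.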

Marked as draft due to lack of testing. I unfortunately lost my bespoke chat script at some point, so I can't really do my own experimentation right away, but I do want to put this out there so that others can see it and test it for themselves.

Performance seems to be almost exactly on par with CUDA in my experience. So maybe this will get CUDA-like performance out of Intel and AMD GPUs - exciting :D

It took me about 2 hours and 30 minutes of real time to complete this pull request :)

LoganDark commented 1 year ago

Hey @saharNooby, the macOS build is failing again for another reason that isn't my fault. I'm starting to think GitHub is just cursed.

saharNooby commented 1 year ago

Finally I realized how to push into PRs... It turns out I was trying to push into your master, which obviously should not work. Pushing into clblast works.

I'll try various hacks here to get the macOS build working.

LoganDark commented 1 year ago

Well, that seems to have fixed it.

I think the biggest problem we have right now is that we don't seem to be able to test these libraries on CI or offer them in GitHub releases. We should probably try to do something about that.

saharNooby commented 1 year ago

> we don't seem to be able to test these libraries on CI or offer them in GitHub releases. We should probably try to do something about that.

I'm not sure I understand. You talking about cuBLAS and CLBlast?

saharNooby commented 1 year ago

OMG LOL IT FIXED THAT ISSUE FOR WHICH SANITIZER WAS ENABLED

LoganDark commented 1 year ago

> > we don't seem to be able to test these libraries on CI or offer them in GitHub releases. We should probably try to do something about that.
>
> I'm not sure I understand. You talking about cuBLAS and CLBlast?

Yes, currently people can't get prebuilt binaries for either of those features, and they aren't tested in CI.

> OMG LOL IT FIXED THAT ISSUE FOR WHICH SANITIZER WAS ENABLED

LOL

saharNooby commented 1 year ago

llama.cpp builds and provides binaries for cuBLAS and CLBlast: releases, build file

I'll add it into my backlog, seems easy enough to do.

saharNooby commented 1 year ago

I would really prefer to have CLBlast build documented. PR desc looks good enough, maybe format it a little and put it into docs/CLBlast_on_Windows.md. It would be similar to docs/cuBLAS_on_Windows.md.

But I will not block this PR because of this, I can write the doc later myself.

LoganDark commented 1 year ago

The PR's currently blocked anyway. I have only tested the small world models with the little sequence.c and confirmed that the logits output is identical. I have not tested any other models (in particular the larger raven models), and that probably needs to work before we merge this. I have no reason to believe that it doesn't, but I need to make sure.
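As a rough illustration of the "logits output is identical" check mentioned above, here is a minimal Python sketch that compares two logit vectors. It assumes the logits from a CPU run and a CLBlast run have already been dumped into plain lists of floats. The function names `max_abs_diff` and `logits_match` are hypothetical, and nothing here calls into rwkv.cpp itself.

```python
import math


def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference between two logit lists."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def logits_match(reference, candidate, tolerance=0.0):
    """True when every logit agrees within `tolerance` (0.0 demands equal values)."""
    # A NaN on either side is treated as a mismatch rather than silently compared.
    if any(math.isnan(x) for x in (*reference, *candidate)):
        return False
    return max_abs_diff(reference, candidate) <= tolerance
```

With the default `tolerance=0.0` this mirrors the strict "identical" check. A small positive tolerance may be more practical when comparing across backends that round differently.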

LoganDark commented 1 year ago

@Mathmagician8191 has done some testing with this, I think. I'm not really capable of writing documentation on this right now (on account of dissociative identity disorder, hehe), but the code seems functional at least.