RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
MIT License
1.37k stars, 90 forks

Update GGML #103

Closed (LoganDark closed 1 year ago)

LoganDark commented 1 year ago

This updates GGML to the latest version, which brings Metal support (among other things) and improved CUDA support. A lot changed, including some fundamental operations, so we had to rework the memory estimation again (sorry!). The new one should be more readable, though.

Except now enabling cuBLAS produces total nonsense output, so either ggml broke something, or we are not properly transforming every operation we perform on tensors that touch the GPU.

We don't have time to fix this immediately, but we decided to open this draft PR anyway, since we reworked the memory estimation system (again) and everything runs as long as you don't enable cuBLAS.

-Emily

saharNooby commented 1 year ago

@LoganDark Will it be much work to add operators to rwkv_future_tensor (add, mul, etc.), so that we can have unified code that constructs the graph in terms of "future tensors"?

In Discord, you/Emily mentioned C++ templates, but the current approach with rwkv_future_tensor does not use templates and does not look too complicated, so I wonder if we can just extend it.

(As a side note, I just realized what a stupid kind of work we are doing, just because ggml did not separate graph building from tensor allocation... Such a simple idea, but for some reason they did not do it.)
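
To make the "future tensor" idea concrete, here is a minimal C++ sketch of describing operations first and counting the memory they would need, without allocating anything. All names (`future_tensor_sketch`, `future_ctx_sketch`) are hypothetical and are not the rwkv.cpp or ggml API. The sketch also ignores the per-tensor bookkeeping overhead that a real allocator would add.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a "future tensor": it records only the element
// size and shape, so nothing is allocated while operations are described.
struct future_tensor_sketch {
    std::size_t elem_size;  // bytes per element, e.g. 4 for FP32
    std::int64_t width;
    std::int64_t height;

    std::size_t data_bytes() const {
        return elem_size * static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    }
};

// Hypothetical planning context: accumulates how many tensors and how many
// data bytes the described operations would require.
struct future_ctx_sketch {
    std::size_t tensor_count = 0;
    std::size_t total_bytes = 0;

    future_tensor_sketch alloc(std::size_t elem_size, std::int64_t width, std::int64_t height) {
        future_tensor_sketch t{elem_size, width, height};
        tensor_count += 1;
        total_bytes += t.data_bytes();
        return t;
    }

    // Element-wise operators: the result has the same shape as the inputs,
    // so describing the op just records one more tensor of that shape.
    future_tensor_sketch add(const future_tensor_sketch & a, const future_tensor_sketch & b) {
        assert(a.width == b.width && a.height == b.height);
        return alloc(a.elem_size, a.width, a.height);
    }

    future_tensor_sketch mul(const future_tensor_sketch & a, const future_tensor_sketch & b) {
        assert(a.width == b.width && a.height == b.height);
        return alloc(a.elem_size, a.width, a.height);
    }
};
```

With operators like these, the same sequence of calls that describes a layer also yields the memory estimate: after the planning pass, `total_bytes` can be used to size a real context before any tensor data exists.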

LoganDark commented 1 year ago

> @LoganDark Will it be much work to add operators to rwkv_future_tensor (add, mul, etc.), so that we can have unified code that constructs the graph in terms of "future tensors"?

Yes, that would require creating cgraph again from scratch, or creating some other kind of graph data structure.

> In Discord, you/Emily mentioned C++ templates, but the current approach with rwkv_future_tensor does not use templates

It doesn't use templates exactly BECAUSE it does not do multiple things. If you wanted it to do multiple things, it would have to use templates. I don't want to use templates.
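
For readers wondering why "doing multiple things" pulls in templates, the following C++ sketch shows one way a single graph-building function could serve both memory estimation and real computation. Everything here (`estimate_backend_sketch`, `compute_backend_sketch`, `build_layer_sketch`) is a hypothetical illustration under simplified assumptions. It is not code from this PR, rwkv.cpp, or ggml.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend 1: tensors carry only a length, and every operation
// just adds the bytes its result would occupy to a running total.
struct estimate_backend_sketch {
    struct tensor { std::size_t n; };
    std::size_t total_bytes = 0;

    tensor make(std::size_t n) { total_bytes += n * sizeof(float); return tensor{n}; }
    tensor add(const tensor & a, const tensor & b) { assert(a.n == b.n); return make(a.n); }
    tensor mul(const tensor & a, const tensor & b) { assert(a.n == b.n); return make(a.n); }
};

// Hypothetical backend 2: tensors own real data and operations compute values.
struct compute_backend_sketch {
    using tensor = std::vector<float>;

    tensor make(std::size_t n) { return tensor(n, 1.0f); }  // filled with ones

    tensor add(const tensor & a, const tensor & b) {
        assert(a.size() == b.size());
        tensor out(a.size());
        for (std::size_t i = 0; i < a.size(); i++) out[i] = a[i] + b[i];
        return out;
    }

    tensor mul(const tensor & a, const tensor & b) {
        assert(a.size() == b.size());
        tensor out(a.size());
        for (std::size_t i = 0; i < a.size(); i++) out[i] = a[i] * b[i];
        return out;
    }
};

// The graph is written once. The template parameter decides whether running
// it estimates memory or performs the computation.
template <typename Backend>
typename Backend::tensor build_layer_sketch(Backend & be, std::size_t n) {
    auto x = be.make(n);
    auto w = be.make(n);
    return be.mul(be.add(x, w), w);  // (x + w) * w
}
```

The cost is that every function touching the graph becomes a template over the backend, which is the complexity the comment above is objecting to. A non-template alternative would need a separate graph data structure that both passes can walk.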

saharNooby commented 1 year ago

@LoganDark Hi again! Is the PR ready for review (cuBLAS working)?

LoganDark commented 1 year ago

> @LoganDark Hi again! Is the PR ready for review (cuBLAS working)?

Yes, it is. I thought that much was obvious when I marked it as non-draft, but now I feel kind of bad that it took me so long to see this comment.

saharNooby commented 1 year ago

> Yes, it is. I thought that much was obvious when I marked it as non-draft

It was not obvious to me :) But in the future, I will review non-draft PRs.