Closed: LoganDark closed this 1 year ago
@LoganDark Will it be much work to add operators to rwkv_future_tensor (add, mul, etc.), so that we can have unified code that constructs the graph in terms of "future tensors"?
In Discord, you/Emily mentioned C++ templates, but the current approach with rwkv_future_tensor does not use templates and does not look too complicated, so I wonder if we can just extend it.
(As a side note, I just realized what pointless work we do just because ggml did not separate graph building from tensor allocation... Such a simple idea, but for some reason they did not.)
> @LoganDark Will it be much work to add operators to rwkv_future_tensor (add, mul, etc.), so that we can have unified code that constructs the graph in terms of "future tensors"?
Yes, that would require creating cgraph again from scratch, or creating some other kind of graph data structure.
> In Discord, you/Emily mentioned C++ templates, but the current approach with rwkv_future_tensor does not use templates
It doesn't use templates exactly BECAUSE it does not do multiple things. If you wanted it to do multiple things, it would have to use templates. I don't want to use templates.
@LoganDark Hi again! Is the PR ready for review (cuBLAS working)?
> @LoganDark Hi again! Is the PR ready for review (cuBLAS working)?
Yes it is, I thought that much was obvious when I marked it as non-draft, but now I feel kind of bad that it took me so long to see this comment
> Yes it is, I thought that much was obvious when I marked it as non-draft
Was not obvious to me :) But from now on I will review non-draft PRs.
This updates GGML to the latest version, with Metal support and improved CUDA support. A lot changed, including some fundamental operations, so we had to rework the memory estimation again (sorry!). The new one should be more readable, though.
Except now enabling cuBLAS produces total nonsense output, so ggml probably broke something, or maybe we are not properly transforming every single operation we perform on any tensor that touches the GPU.
We don't have time to fix this immediately, but decided to open this draft PR anyway, since we reworked the memory estimation system (again) and everything runs as long as you don't enable cuBLAS.
-Emily