One of the major demons I fought while working on https://github.com/saharNooby/rwkv.cpp/pull/74 is ggml's mysterious computation graph work tensor, which is allocated the first time ggml_graph_compute is called. I was trying to estimate the memory usage of the graph exactly, so I manually counted objects and calls to ggml functions while the graph was being built. But once I had the memory usage down to the last byte, ggml_graph_compute still tried to allocate a seemingly arbitrary amount of memory.
If ggml provided a library function to estimate the size of the computation graph work tensor, then instead of guessing I could call that function and allocate a new scratch buffer to contain it. That's slightly less optimal than doing it during context construction, but at that point I don't have a context or a graph yet, and can't create one because doing so itself requires memory (go figure).
It would also be nice if I could tell ggml to allocate that work tensor early without having to actually do any graph computation.
I agree - the current creation of the "work" tensor by ggml_graph_compute() is a bad design decision.
I also had trouble with it recently. Will fix this.
I didn't want to over-estimate for smaller models, or especially under-estimate for larger models. It took a while to debug which tensor was the culprit (the one backing the largest mat-mul). As a workaround I hardcoded the dimensions of this tensor to estimate an upper bound for the computation graph work tensor, but this is not a great solution.