nistur opened 1 month ago
Perhaps it would be wise to embed a tiny model inside the library itself, and copy over the entirety of llama.cpp so it can allocate the initial memory before allocation duties are delegated to a web server.
EDIT: Oh no. llama.cpp has human-written `malloc`s.
I am unsure how to proceed following this revelation.
We should use the TensorFlow network as our swap memory to avoid calling `malloc` again.
We should also think about using fp32 for our mallocAI frontend.
There are several instances of `malloc` and `realloc` in `mallocPlusAI.h`. This is unacceptable, as we cannot expect these to be correct without AI intervention. Proper handling of these cases should be of paramount importance to ensure that this important implementation does not fall foul of memory issues caused by human fallibility.
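For the record, a minimal sketch of what an AI-mediated wrapper around those offending calls might look like. `ai_should_allocate` and its 1 GiB threshold are made up here purely for illustration; the real header would presumably defer to the embedded model (or the web server) instead:

```c
/* Illustrative sketch only: ai_should_allocate() is a hypothetical stand-in
 * for whatever model-mediated check mallocPlusAI.h would actually perform. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "tiny embedded model": approves any request under 1 GiB. */
static int ai_should_allocate(size_t size)
{
    return size < ((size_t)1 << 30);
}

/* Drop-in wrapper: only calls the fallible human malloc() once the model approves. */
static void *ai_malloc(size_t size)
{
    if (!ai_should_allocate(size)) {
        fprintf(stderr, "model declined allocation of %zu bytes\n", size);
        return NULL;
    }
    return malloc(size);
}

int main(void)
{
    char *buf = ai_malloc(64);
    if (buf) {
        strcpy(buf, "approved by the model");
        puts(buf);
        free(buf);
    }
    return 0;
}
```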