amqdn closed this issue 2 weeks ago.
The KV cache buffer zeroing is necessary. Without it, certain operations that rely on padded tensor data could access uninitialized memory, which could result in NaNs.
IMO, user applications should have some awareness of the context size they require and allocate only the necessary amount. The llama.cpp examples use the full training context by default, but this is not required, and user code can choose any context size that makes sense for the app.
I see. So, then, shall we leave it at this: user applications are responsible for selecting an appropriate context size?
Yes, I think so. At least I can't think of a better solution.
Understood. I will close this issue. Thank you.
What happened?
Hi,
You may already know about the memory spike, given #7474.
For those unfamiliar, `ggml_backend_cpu_buffer_clear` calls `memset`, which initializes the allocated buffer (as big as 16 GiB for full context on Llama 3.1) to `0`s, spiking memory and, on Android, leading to a system crash (in `adb shell`, Android hangs and reboots).
As far as I can tell, there are no guards for when `llama.cpp` might over-allocate and over-initialize memory. This may be intended, but it seems to defeat the purpose of `mmap`.
Please share your perspective on this behavior; I understand it to be undefined. With limited experience, I see a number of potential solutions:
- Make `ggml_backend_buffer_clear` truly optional
- Use `ggml_backend_cpu_buffer_memset_tensor` in the `alloc_tensor_range` loop instead to avoid bulk initialization, perhaps as part of `ggml_tallocr_alloc` or in a separate function
- Require `-c` in certain environments

To reproduce this behavior, build for Android and run `llama-cli`, `llama-simple`, or `llama-server` with any quantization of Llama 3.1. Without `-c`, the default behavior of `llama.cpp` is to obtain the context size from the model itself, which in this case loads the full context.
I would be happy to implement a fix, whatever is decided. If instead downstream applications should manage this themselves, please clarify.
Thank you.
Name and Version
Termux
adb shell
What operating system are you seeing the problem on?
Other? (Please let us know in description)
Relevant log output
No response