llama: significance of truncating input to `ctxLen - 4`

iboB commented 3 months ago

Why truncate to `ctxLen - 4`? Why is that 4 significant?

This is kept for now as per llama.cpp demos, but we should investigate.

pminev commented 2 months ago

I was looking at the commits and PRs:

Finally, I talked with G.Gerganov: the margin was added to reserve space (at least 4 slots) in the KV cache for newly generated tokens. I didn't ask him why it's exactly 4, but it seems that when new tokens are generated, the input is truncated again so that there is enough space.
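To make the idea concrete, here is a minimal sketch of that kind of truncation. It does not use the llama.cpp API and is not the actual ac-local code: `Token`, `kReservedSlots` and `truncateToContext` are hypothetical names made up for illustration, and dropping the tail of the input (rather than the head) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical token type, for illustration only.
using Token = std::int32_t;

// Number of KV cache slots kept free for newly generated tokens.
// The value 4 mirrors the llama.cpp demos; why it is exactly 4 is not
// explained in this thread.
constexpr std::size_t kReservedSlots = 4;

// Truncates `tokens` in place so that at most `ctxLen - reserve` remain.
// Returns the number of tokens that were dropped.
// Assumption: the beginning of the input is kept and the tail is dropped.
inline std::size_t truncateToContext(std::vector<Token>& tokens,
                                     std::size_t ctxLen,
                                     std::size_t reserve = kReservedSlots) {
    assert(ctxLen > reserve && "context must be larger than the reserve");
    const std::size_t maxInput = ctxLen - reserve;
    if (tokens.size() <= maxInput) {
        return 0; // the input already leaves enough free slots
    }
    const std::size_t dropped = tokens.size() - maxInput;
    tokens.resize(maxInput); // keep only the first maxInput tokens
    return dropped;
}
```

With `ctxLen = 8`, an input of 10 tokens would be cut to 4, leaving 4 free slots for generation; an input of 3 tokens would be left untouched.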

iboB commented 1 month ago

This can be closed now. There is a link to this issue in the code for reference.

alpaca-core / ac-local

llama: significance of truncating input to `ctxLen - 4` #16