alpaca-core / ac-local

Alpaca Core local inference SDK

llama: significance of truncating input to `ctxLen - 4` #16

Closed (iboB closed this issue 1 month ago)

iboB commented 3 months ago

Why do we truncate the input to `ctxLen - 4`? Why is 4 significant?

This is kept for now as per llama.cpp demos, but we should investigate.
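For context, the pattern being referenced looks roughly like the sketch below. This is a hedged illustration only, not code from this repo or from llama.cpp: the `Token` alias and the `truncateInput` helper are hypothetical, and the only detail taken from the issue is the `ctxLen - 4` cap itself.

```cpp
#include <cstdint>
#include <vector>

using Token = int32_t; // hypothetical token type for the sketch

// Cap the prompt at ctxLen - 4 tokens, mirroring the truncation discussed
// in this issue (illustrative helper, not a real ac-local/llama.cpp API).
std::vector<Token> truncateInput(std::vector<Token> input, uint32_t ctxLen) {
    const size_t maxInput = ctxLen > 4 ? size_t(ctxLen) - 4 : 0;
    if (input.size() > maxInput) {
        input.resize(maxInput); // drop everything past the cap
    }
    return input;
}

int main() {
    std::vector<Token> prompt(2050, 1); // pretend-tokenized prompt
    auto truncated = truncateInput(prompt, 2048);
    return truncated.size() == 2044 ? 0 : 1; // 2048 - 4 tokens remain
}
```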

pminev commented 2 months ago

I looked through the related commits and PRs.

Finally, I talked with G. Gerganov: the truncation was added to reserve space (at least 4 slots) in the KV cache for the newly generated tokens. I didn't ask him why it is exactly 4, but it seems that once new tokens are generated, the input is truncated again so that there is always enough space.
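A minimal sketch of that reasoning, with illustrative numbers (nothing below is project code): if the prompt is capped at `ctxLen - 4` tokens, the KV cache is guaranteed room for at least 4 generated tokens before any further context management is needed.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t ctxLen    = 2048;        // total KV-cache slots (example value)
    const uint32_t promptCap = ctxLen - 4;  // cap applied to the input
    uint32_t used = promptCap;              // worst case: the prompt fills the cap

    // Count how many tokens can be generated before the cache is full.
    uint32_t headroom = 0;
    while (used < ctxLen) {
        ++used;
        ++headroom;
    }
    std::printf("guaranteed headroom: %u tokens\n", headroom); // prints 4
    return 0;
}
```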

iboB commented 1 month ago

This can be closed now. There is a link to this issue in the code for reference.