ggerganov / llama.cpp

LLM inference in C/C++
MIT License

changelog : `libllama` API #9289

Open ggerganov opened 4 weeks ago

ggerganov commented 4 weeks ago

Overview

This is a list of changes to the public interface of the `llama` library. Collaborators are encouraged to edit this post to reflect important API changes that get merged into the `master` branch.

If you are building a third-party project that relies on `libllama`, it is recommended to follow this issue and check it before upgrading to new versions.

See also:

Recent API changes (most recent at the top)

| version | PR | description |
| --- | --- | --- |
| TBD | #9510 | Add `LLAMA_POOLING_TYPE_RANK` |
| b3774 | #9512 | Add `llama_n_head()` |
| b3750 | #9355 | Add `llama_perf` API + param to disable internal profiling |
| b3749 | #9445 | Add `llama_sampler_chain_remove()` |
| b3681 | #9294 | Major changes to the sampling API (see PR for more info) |
| b3651 | #8980 | Add `LLAMA_VOCAB_TYPE_RWKV` enum value |
| b3644 | #8672 | Add `llama_threadpool` API + change `uint32_t` -> `int32_t` |
| b3614 | #8526 | Add `llama_model_is_recurrent` |

For older changes, use:

```shell
git log --oneline -p b3614 -- include/llama.h
```
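To give a sense of how the reworked sampling API (#9294) and `llama_sampler_chain_remove()` (#9445) from the table above fit together, here is a minimal sketch. The function names follow the PRs, but the sampler parameters (top-k 40, temperature 0.8, seed 1234) are arbitrary illustration values, and exact signatures should be checked against `include/llama.h` for the version you build against:

```c
#include "llama.h"

int main(void) {
    // #9294: sampling is configured as a chain of sampler stages
    struct llama_sampler_chain_params sparams = llama_sampler_chain_default_params();
    struct llama_sampler * chain = llama_sampler_chain_init(sparams);

    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(1234 /* seed */));

    // #9445: detach the sampler at index 1 (the temperature stage);
    // the caller takes ownership of the returned sampler
    struct llama_sampler * removed = llama_sampler_chain_remove(chain, 1);
    llama_sampler_free(removed);

    // freeing the chain frees the samplers still attached to it
    llama_sampler_free(chain);
    return 0;
}
```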

Upcoming API changes

ggerganov commented 2 weeks ago

#9355 restores the functionality for getting performance measurements from within `libllama` (which was removed in #9294) via a new `llama_perf` API. `llama_context_params` is extended with a new `bool no_perf` parameter that can be used to disable the internal timings during `libllama` compute.
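A rough sketch of how the `no_perf` parameter and the `llama_perf` calls described above might be used, assuming the names introduced in #9355; the context-creation step is elided since it requires a loaded model:

```c
#include "llama.h"

int main(void) {
    struct llama_context_params cparams = llama_context_default_params();
    cparams.no_perf = false; // keep internal timing collection enabled (#9355)

    // ... load a model and create a context with cparams, e.g.:
    // struct llama_context * ctx = llama_new_context_with_model(model, cparams);
    // ... run llama_decode() calls ...

    // then print the accumulated timings and reset the counters:
    // llama_perf_context_print(ctx);
    // llama_perf_context_reset(ctx);
    return 0;
}
```

With `no_perf = true` the timing counters are not collected, which avoids the internal profiling overhead during compute.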