ggerganov / llama.cpp

LLM inference in C/C++
MIT License

changelog : `libllama` API #9289

Open ggerganov opened 4 weeks ago

ggerganov commented 4 weeks ago

Overview

This is a list of changes to the public interface of the `llama` library. Collaborators are encouraged to edit this post to reflect important API changes that get merged into the `master` branch.

If you are building a third-party project that relies on `libllama`, it is recommended to follow this issue and check it before upgrading to new versions.

See also:

Recent API changes (most recent at the top)

| version | PR | description |
| --- | --- | --- |
| TBD | #9510 | Add `LLAMA_POOLING_TYPE_RANK` |
| b3774 | #9512 | Add `llama_n_head()` |
| b3750 | #9355 | Add `llama_perf` API + param to disable internal profiling |
| b3749 | #9445 | Add `llama_sampler_chain_remove()` |
| b3681 | #9294 | Major changes to the sampling API (see PR for more info) |
| b3651 | #8980 | Add `LLAMA_VOCAB_TYPE_RWKV` enum value |
| b3644 | #8672 | Add `llama_threadpool` API + change `uint32_t` -> `int32_t` |
| b3614 | #8526 | Add `llama_model_is_recurrent` |

For older changes, use:

```shell
git log --oneline -p b3614 -- include/llama.h
```
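To give a sense of how the reworked sampling API (#9294) and `llama_sampler_chain_remove()` (#9445) from the table above fit together, here is a minimal sketch. The function names follow the PRs, but the sampler parameters (top-k 40, temperature 0.8, seed 1234) are arbitrary illustration values, and exact signatures should be checked against `include/llama.h` for the version you build against:

```c
#include "llama.h"

int main(void) {
    // #9294: sampling is configured as a chain of sampler stages
    struct llama_sampler_chain_params sparams = llama_sampler_chain_default_params();
    struct llama_sampler * chain = llama_sampler_chain_init(sparams);

    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(1234 /* seed */));

    // #9445: detach the sampler at index 1 (the temperature stage);
    // the caller takes ownership of the returned sampler
    struct llama_sampler * removed = llama_sampler_chain_remove(chain, 1);
    llama_sampler_free(removed);

    // freeing the chain frees the samplers still attached to it
    llama_sampler_free(chain);
    return 0;
}
```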

Upcoming API changes

ggerganov commented 2 weeks ago

#9355 restores the functionality for getting performance measurements from within `libllama` (which was removed in #9294) via a new `llama_perf` API. `llama_context_params` is extended with a new `bool no_perf` parameter that can be used to disable the internal timings during `libllama` compute.
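A rough sketch of how the `no_perf` parameter and the `llama_perf` calls described above might be used, assuming the names introduced in #9355; the context-creation step is elided since it requires a loaded model:

```c
#include "llama.h"

int main(void) {
    struct llama_context_params cparams = llama_context_default_params();
    cparams.no_perf = false; // keep internal timing collection enabled (#9355)

    // ... load a model and create a context with cparams, e.g.:
    // struct llama_context * ctx = llama_new_context_with_model(model, cparams);
    // ... run llama_decode() calls ...

    // then print the accumulated timings and reset the counters:
    // llama_perf_context_print(ctx);
    // llama_perf_context_reset(ctx);
    return 0;
}
```

With `no_perf = true` the timing counters are not collected, which avoids the internal profiling overhead during compute.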