ggerganov opened 4 weeks ago
The internal performance timings of `libllama` (which were removed in #9294) are exposed again via a new `llama_perf` API. `llama_context_params` is extended with a new `bool no_perf` parameter that can be used to disable the internal timings during `libllama` compute.
## Overview

This is a list of changes to the public interface of the `llama` library. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the `master` branch.

If you are building a 3rd party project that relies on `libllama`, it is recommended to follow this issue and check it before upgrading to new versions.

See also:

- `llama-server` REST API

## Recent API changes (most recent at the top)
- `LLAMA_POOLING_TYPE_RANK`
- `llama_n_head()`
- `llama_perf` API + param to disable internal profiling
- `llama_sampler_chain_remove()`
- `LLAMA_VOCAB_TYPE_RWKV` enum value
- `llama_threadpool` API + change `uint32_t` -> `int32_t`
- `llama_model_is_recurrent`
For older changes, use:

## Upcoming API changes