Trawczynski opened this issue 11 months ago
Seems related to https://github.com/abetlen/llama-cpp-python/issues/888.
To work around this issue for now, you can temporarily replace llama_decode with llama_eval using the following approach.
# @Time : 2023/11/9 16:49
# @Author : baii
# @File : example
# @Use :
from typing import Sequence

from llama_cpp import Llama as MyLlama, llama_cpp


class Llama(MyLlama):
    def eval(self, tokens: Sequence[int]):
        """Evaluate a list of tokens.

        Args:
            tokens: The list of tokens to evaluate.
        """
        assert self.ctx is not None
        n_ctx = self._n_ctx
        for i in range(0, len(tokens), self.n_batch):
            batch = tokens[i : min(len(tokens), i + self.n_batch)]
            n_past = min(n_ctx - len(batch), len(self._input_ids))
            n_tokens = len(batch)
            return_code = llama_cpp.llama_eval(
                ctx=self.ctx,
                tokens=(llama_cpp.llama_token * len(batch))(*batch),
                n_tokens=n_tokens,
                n_past=n_past,
            )
            if return_code != 0:
                raise RuntimeError(f"llama_eval returned {return_code}")
            # Save tokens
            self.input_ids[self.n_tokens : self.n_tokens + n_tokens] = batch
            # Save logits
            rows = n_tokens if self.context_params.logits_all else 1
            cols = self._n_vocab
            offset = (
                0 if self.context_params.logits_all else n_tokens - 1
            )  # NOTE: Only save the last token logits if logits_all is False
            self.scores[self.n_tokens + offset : self.n_tokens + n_tokens, :].reshape(
                -1
            )[:] = llama_cpp.llama_get_logits(self.ctx)[: rows * cols]
            # Update n_tokens
            self.n_tokens += n_tokens
I tested this and it works well with codellama-7b.
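For context, here is a minimal usage sketch of the patched class above; the model path, prompt, and sampling settings are hypothetical placeholders rather than values from this issue:

# Uses the patched Llama subclass defined above.
# NOTE: model path and prompt are hypothetical placeholders.
llm = Llama(model_path="./codellama-7b.Q4_K_M.gguf", n_ctx=2048, seed=42)

output = llm(
    "def fibonacci(n):",
    max_tokens=64,
    temperature=0,  # greedy decoding
)
print(output["choices"][0]["text"])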
Should be fixed now, let us know.
I've also been trying to generate deterministic responses with temperature>0 by setting the random seed to a constant number (the seed parameter), but it didn't work in version 0.2.14.
Setting temperature to zero implies division by zero [source], so it should not be supported here. I think setting seeds (all of them) to a non-negative number is the right approach towards getting deterministic responses from these models.
As far as I know, setting temperature to zero is a common way of asking for greedy decoding (always pick the highest-probability token), and it is supported by many providers such as OpenAI and Anthropic - their docs state that zero is a valid temperature. Llama.cpp also supports zero temperature.
Results are often non-deterministic even with zero temperature for other reasons, for example because some CUDA kernels are non-deterministic for performance.
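To make the distinction concrete, here is a toy sketch (not llama.cpp's or this library's actual sampler) of the usual convention: temperature divides the logits, and T = 0 is special-cased as greedy argmax instead of performing the division.

import numpy as np

def sample_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Toy sampler: temperature rescales logits; temperature == 0 means greedy."""
    if temperature == 0:
        # Greedy decoding: take the argmax, no division by zero involved.
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(42)
logits = np.array([1.0, 3.0, 0.5])
print(sample_token(logits, temperature=0.0, rng=rng))  # always 1 (greedy)
print(sample_token(logits, temperature=0.8, rng=rng))  # stochastic, depends on the seed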
Problem description
Hi, I have been doing some basic testing in a notebook after finding some strange behavior in my code. Basically, two things happen when running a model with temperature=0 for versions > 0.2.14:

Examples
It's easier to understand with examples, so I'll upload a couple of screenshots.
Code
GitHub won't let me upload the notebook, so I'll just paste its cells:
Notes
I tried changing n_gpu_layers, but it didn't change the results. I reinstalled with:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

Seed?
I've also been trying to generate deterministic responses with temperature>0 by setting the random seed to a constant number (the seed parameter), but it didn't work in version 0.2.14.
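A minimal sketch of that kind of determinism check, assuming a local GGUF model (the path and prompt are placeholders, not the ones from the notebook): fix the seed in the constructor, use temperature=0, and compare two generations.

from llama_cpp import Llama

# Hypothetical model path; any local GGUF model can be used for this check.
llm = Llama(model_path="./model.gguf", n_ctx=2048, seed=1234)

def generate(prompt: str) -> str:
    result = llm(prompt, max_tokens=64, temperature=0)
    return result["choices"][0]["text"]

prompt = "Q: Name the planets in the solar system. A:"
first = generate(prompt)
second = generate(prompt)
# With greedy decoding this should print True; GPU builds may still differ
# because some CUDA kernels are non-deterministic.
print(first == second)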