bigcode-project / starcoder.cpp

C++ implementation for 💫StarCoder

SantaCoder works but never seems to generate <|end|> #25

Closed: the-crypt-keeper closed this issue 1 year ago

the-crypt-keeper commented 1 year ago

Here's what happens when I follow the instructions in the README:

$ ./main -m ./gpt_bigcode-santacoder-ggml-q4_1.bin -p "def fibonnaci(" --top_k 1 --temp 0.2 --top_p 0.95 --seed 1683881276
main: seed = 1683881276
starcoder_model_load: loading model from './gpt_bigcode-santacoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49280
starcoder_model_load: n_ctx   = 2048
starcoder_model_load: n_embd  = 2048
starcoder_model_load: n_head  = 16
starcoder_model_load: n_layer = 24
starcoder_model_load: ftype   = 1003
starcoder_model_load: qntvr   = 1
starcoder_model_load: ggml ctx size = 1794.97 MB
starcoder_model_load: memory size =   768.00 MB, n_mem = 49152
starcoder_model_load: model size  =  1026.83 MB
main: prompt: 'def fibonnaci('
main: number of tokens in prompt = 7, first 8 tokens: 563 24240 78 2658 64 2819 7

def fibonnaci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

print(fibo(10))
print(fibo(100))
print(fibo(1000))
print(fibo(10000))
print(fibo(100000))
print(fibo(1000000))
print(fibo(10000000))
print(fibo(100000000))
print(fibo(1000000000))
print(fibo(10000000000))
print(fibo(100000000000

main: mem per token =   314360 bytes
main:     load time =   404.02 ms
main:   sample time =    72.67 ms
main:  predict time = 10409.67 ms / 50.53 ms per token
main:    total time = 10991.78 ms

I have the same issue with example/starcoder from the ggml repo: it just keeps generating and won't stop until it hits the generation limit. Running the original model through transformers with the same prompt produces the same input token ids, but the outputs frustratingly diverge.
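For what it's worth, the loop-side fix would just be comparing each sampled token against the model's end-of-text id before continuing. A minimal sketch, assuming a `token_to_id` vocab map like the one the ggml examples carry; the special-token strings below are assumptions, not verified ids:

```cpp
// Hypothetical sketch of an end-of-text check for the generation loop.
// `token_to_id` mirrors the vocab map used by the ggml examples; the
// exact special-token strings are model-specific assumptions here.
#include <cstdint>
#include <map>
#include <string>

int32_t find_eot_id(const std::map<std::string, int32_t> & token_to_id) {
    for (const char * s : {"<|end|>", "<|endoftext|>"}) {
        auto it = token_to_id.find(s);
        if (it != token_to_id.end()) {
            return it->second;
        }
    }
    return -1; // not present: fall back to the length limit only
}

// ... then inside the sampling loop:
//     if (next_token == eot_id) break; // stop instead of running to n_predict
```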

the-crypt-keeper commented 1 year ago

I figured it out; there are a couple of things going on here:

--top_k 0 is definitely a bad idea, as it produces undefined behavior; --top_k 1 is likely what we want here.
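To make the top_k point concrete, here's an illustrative sketch of top-k filtering (not the repo's exact sampler): with k == 1 it collapses to greedy decoding, while truncating to k == 0 leaves an empty candidate list, which is where sampling goes undefined.

```cpp
// Illustrative top-k filter over (logit, token_id) pairs; not the
// repo's exact sampler. An unguarded resize to top_k == 0 would leave
// nothing to sample from, hence the undefined behavior noted above.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

void top_k_filter(std::vector<std::pair<float, int>> & logits_id, int top_k) {
    if (top_k <= 0 || (size_t) top_k >= logits_id.size()) {
        return; // guard the degenerate cases instead of emptying the list
    }
    std::partial_sort(logits_id.begin(), logits_id.begin() + top_k, logits_id.end(),
                      [](const std::pair<float, int> & a, const std::pair<float, int> & b) {
                          return a.first > b.first;
                      });
    logits_id.resize(top_k); // with top_k == 1 this is greedy decoding
}
```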

The second problem is that this model really needs a repeat_penalty to... not repeat itself!

I've forked upstream ggml and implemented it here: https://github.com/the-crypt-keeper/ggml/tree/starcoder_repeat_penalty
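For reference, the usual formulation (as popularized by llama.cpp) pushes the logits of recently seen tokens toward zero, so repeats become less likely whether the logit is positive or negative. A minimal sketch, which may differ in details from the fork above:

```cpp
// Sketch of the common repeat-penalty formulation; the linked fork
// may differ in details. Positive logits of recent tokens are divided
// by the penalty, negative ones multiplied, shrinking both toward zero.
#include <cstdint>
#include <unordered_set>
#include <vector>

void apply_repeat_penalty(std::vector<float> & logits,
                          const std::vector<int32_t> & last_tokens,
                          float penalty /* typically ~1.1 to 1.3 */) {
    const std::unordered_set<int32_t> recent(last_tokens.begin(), last_tokens.end());
    for (int32_t id : recent) {
        if (logits[id] > 0.0f) {
            logits[id] /= penalty;
        } else {
            logits[id] *= penalty;
        }
    }
}
```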