bigcode-project / starcoder.cpp

C++ implementation for 💫StarCoder

SantaCoder works but never seems to generate <|end|> #25

Closed: the-crypt-keeper closed this issue 1 year ago

the-crypt-keeper commented 1 year ago

Here's what happens when I follow the instructions in the README:

$ ./main -m ./gpt_bigcode-santacoder-ggml-q4_1.bin -p "def fibonnaci(" --top_k 1 --temp 0.2 --top_p 0.95 --seed 1683881276
main: seed = 1683881276
starcoder_model_load: loading model from './gpt_bigcode-santacoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49280
starcoder_model_load: n_ctx   = 2048
starcoder_model_load: n_embd  = 2048
starcoder_model_load: n_head  = 16
starcoder_model_load: n_layer = 24
starcoder_model_load: ftype   = 1003
starcoder_model_load: qntvr   = 1
starcoder_model_load: ggml ctx size = 1794.97 MB
starcoder_model_load: memory size =   768.00 MB, n_mem = 49152
starcoder_model_load: model size  =  1026.83 MB
main: prompt: 'def fibonnaci('
main: number of tokens in prompt = 7, first 8 tokens: 563 24240 78 2658 64 2819 7

def fibonnaci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

print(fibo(10))
print(fibo(100))
print(fibo(1000))
print(fibo(10000))
print(fibo(100000))
print(fibo(1000000))
print(fibo(10000000))
print(fibo(100000000))
print(fibo(1000000000))
print(fibo(10000000000))
print(fibo(100000000000

main: mem per token =   314360 bytes
main:     load time =   404.02 ms
main:   sample time =    72.67 ms
main:  predict time = 10409.67 ms / 50.53 ms per token
main:    total time = 10991.78 ms

I have the same issue with example/starcoder from the ggml repo: it just keeps generating and won't stop until it hits the generation limit. Running the original model through transformers with the same prompt produces the same input token ids, but the outputs frustratingly diverge.
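For what it's worth, the loop-side fix would just be comparing each sampled token against the model's end-of-text id before continuing. A minimal sketch, assuming a `token_to_id` vocab map like the one the ggml examples carry; the special-token strings below are assumptions, not verified ids:

```cpp
// Hypothetical sketch of an end-of-text check for the generation loop.
// `token_to_id` mirrors the vocab map used by the ggml examples; the
// exact special-token strings are model-specific assumptions here.
#include <cstdint>
#include <map>
#include <string>

int32_t find_eot_id(const std::map<std::string, int32_t> & token_to_id) {
    for (const char * s : {"<|end|>", "<|endoftext|>"}) {
        auto it = token_to_id.find(s);
        if (it != token_to_id.end()) {
            return it->second;
        }
    }
    return -1; // not present: fall back to the length limit only
}

// ... then inside the sampling loop:
//     if (next_token == eot_id) break; // stop instead of running to n_predict
```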

the-crypt-keeper commented 1 year ago

I figured it out; there are a couple of things going on here:

--top_k 0 is definitely a bad idea, as it produces undefined behavior; --top_k 1 is likely what we want here.
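To make the top_k point concrete, here's an illustrative sketch of top-k filtering (not the repo's exact sampler): with k == 1 it collapses to greedy decoding, while truncating to k == 0 leaves an empty candidate list, which is where sampling goes undefined.

```cpp
// Illustrative top-k filter over (logit, token_id) pairs; not the
// repo's exact sampler. An unguarded resize to top_k == 0 would leave
// nothing to sample from, hence the undefined behavior noted above.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

void top_k_filter(std::vector<std::pair<float, int>> & logits_id, int top_k) {
    if (top_k <= 0 || (size_t) top_k >= logits_id.size()) {
        return; // guard the degenerate cases instead of emptying the list
    }
    std::partial_sort(logits_id.begin(), logits_id.begin() + top_k, logits_id.end(),
                      [](const std::pair<float, int> & a, const std::pair<float, int> & b) {
                          return a.first > b.first;
                      });
    logits_id.resize(top_k); // with top_k == 1 this is greedy decoding
}
```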

The second problem is that this model really needs a repeat_penalty to... not repeat itself!

I've forked upstream ggml and implemented it here: https://github.com/the-crypt-keeper/ggml/tree/starcoder_repeat_penalty
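For reference, the usual formulation (as popularized by llama.cpp) pushes the logits of recently seen tokens toward zero, so repeats become less likely whether the logit is positive or negative. A minimal sketch, which may differ in details from the fork above:

```cpp
// Sketch of the common repeat-penalty formulation; the linked fork
// may differ in details. Positive logits of recent tokens are divided
// by the penalty, negative ones multiplied, shrinking both toward zero.
#include <cstdint>
#include <unordered_set>
#include <vector>

void apply_repeat_penalty(std::vector<float> & logits,
                          const std::vector<int32_t> & last_tokens,
                          float penalty /* typically ~1.1 to 1.3 */) {
    const std::unordered_set<int32_t> recent(last_tokens.begin(), last_tokens.end());
    for (int32_t id : recent) {
        if (logits[id] > 0.0f) {
            logits[id] /= penalty;
        } else {
            logits[id] *= penalty;
        }
    }
}
```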