Closed breakenknife closed 1 year ago
When running inference, the entire model is loaded into memory. If your starcoder-ggml.bin file is larger than your available memory, then: yes :)
Also, some systems allow offloading part of that memory to disk (warning: inference will be much slower). So you can always try running the model, and if it fails, you'll know that lack of memory is the issue.
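As a rough sanity check before loading, you can compare the ggml file's size against your available RAM. This is only a sketch: the exact headroom needed for the KV cache and runtime buffers varies, and the 2 GB margin and the example file sizes below are illustrative assumptions, not measured values.

```python
def fits_in_ram(model_bytes: int, available_bytes: int,
                headroom_bytes: int = 2 * 1024**3) -> bool:
    """Predict whether a ggml model can load: the whole file ends up
    in memory, plus some headroom for the KV cache and buffers.
    The 2 GB default headroom is a guess; tune it for your setup."""
    return model_bytes + headroom_bytes <= available_bytes

# Illustrative numbers only: a ~12 GB quantized file vs 38 GB of RAM
print(fits_in_ram(12 * 1024**3, 38 * 1024**3))  # True: should load
# A hypothetical ~40 GB unquantized file vs the same 38 GB of RAM
print(fits_in_ram(40 * 1024**3, 38 * 1024**3))  # False: won't fit
```

On Linux you could feed this the file size from `os.path.getsize(...)` and the `MemAvailable` figure from `/proc/meminfo`.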
Hi, my machine has 38 GB of memory and can run the quantized starcoder-ggml-q4_1.bin, but it cannot run the non-quantized starcoder-ggml.bin. Is this because there is not enough memory?