Closed breakenknife closed 1 year ago
When running inference, the entire model is loaded into memory. If your starcoder-ggml.bin file is larger than your available memory, then: yes :)
Also, some systems allow offloading part of that memory to disk (warning: inference will be much slower). So you can always try running the model, and if it fails, you'll know that lack of memory is the issue.
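As a rough sanity check before loading, you can compare the ggml file's size against your available RAM. This is only a sketch: the exact headroom needed for the KV cache and runtime buffers varies, and the 2 GB margin and the example file sizes below are illustrative assumptions, not measured values.

```python
def fits_in_ram(model_bytes: int, available_bytes: int,
                headroom_bytes: int = 2 * 1024**3) -> bool:
    """Predict whether a ggml model can load: the whole file ends up
    in memory, plus some headroom for the KV cache and buffers.
    The 2 GB default headroom is a guess; tune it for your setup."""
    return model_bytes + headroom_bytes <= available_bytes

# Illustrative numbers only: a ~12 GB quantized file vs 38 GB of RAM
print(fits_in_ram(12 * 1024**3, 38 * 1024**3))  # True: should load
# A hypothetical ~40 GB unquantized file vs the same 38 GB of RAM
print(fits_in_ram(40 * 1024**3, 38 * 1024**3))  # False: won't fit
```

On Linux you could feed this the file size from `os.path.getsize(...)` and the `MemAvailable` figure from `/proc/meminfo`.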
Hi, my machine has 38 GB of memory and can run the quantized starcoder-ggml-q4_1.bin, but it cannot run the non-quantized starcoder-ggml.bin. Is this because there is not enough memory?