ggerganov / llama.cpp

LLM inference in C/C++
MIT License
64.9k stars 9.31k forks

"unable to load model" in stress test with Apple Silicon #1810

Closed shouyiwang closed 1 year ago

shouyiwang commented 1 year ago

For context, my system configuration is as follows: MacBook Air M1 with 16 GB of RAM, running macOS 13.3. I am using Python 3.10.9, GNU Make 3.81, and g++ (Apple Clang 14.0.3), with the latest code from the master branch.

Initially, I encountered the "unable to load model" error sporadically while testing models. I realized that developers and occasional users might not run into this issue because of the file caching mechanism in macOS.

So I wrote a bash script to make the problem easy to reproduce. The script terminates when it encounters the error and reports the iteration at which the error occurred.

#!/bin/bash

count=0

for i in {1..100}; do
    # Alternate between two distinct models so the page cache keeps evicting.
    for model in MODEL_1.bin MODEL_2.bin; do
        output=$(./main -m "$model" -ngl 1 -p "hello" -n 128 2>&1)
        echo "$output"
        count=$((count + 1))

        if [[ $output == *"unable to load model"* ]]; then
            echo "Encountered 'unable to load model' at iteration $count"
            exit 1
        fi
    done
done

Please replace MODEL_1.bin and MODEL_2.bin with the names of your models. They must be distinct files, and their combined size must exceed your system's physical memory. This is crucial because macOS caches files aggressively: if the combined size of MODEL_1 and MODEL_2 is less than your physical memory, both models may simply stay cached in RAM, and the problem will not be observable.

IMPORTANT: MODEL_1 != MODEL_2, and size(MODEL_1) + size(MODEL_2) > RAM size
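To double-check that precondition before running the stress loop, something like the following could be used. This is a macOS-specific sketch of my own (not part of the original script): it assumes the BSD `stat -f%z` flag and the `hw.memsize` sysctl key available on macOS, and reuses the same MODEL_1.bin / MODEL_2.bin placeholders.

```shell
#!/bin/bash
# Sanity check (macOS): verify the two models together exceed physical RAM,
# otherwise the page cache can hold both and the bug may not reproduce.

ram_bytes=$(sysctl -n hw.memsize)    # total physical memory in bytes
m1_bytes=$(stat -f%z MODEL_1.bin)    # BSD stat: file size in bytes
m2_bytes=$(stat -f%z MODEL_2.bin)

if (( m1_bytes + m2_bytes > ram_bytes )); then
    echo "OK: combined model size exceeds RAM, cache eviction is forced"
else
    echo "WARNING: both models fit in RAM; the error may not reproduce"
fi
```

On Linux the equivalent size query would be `stat -c%s`, but since the issue is specific to Apple Silicon the BSD form is used here.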

My error output:

llama_init_from_file: failed to add buffer
llama_init_from_gpt_params: error: failed to load model '../models/ggml-guanaco-13B.ggmlv3.q4_0.bin'
main: error: unable to load model
Encountered 'unable to load model' at iteration 22
shouyiwang commented 1 year ago

It appears the issue was already fixed by #1817 a few hours ago. I have rerun the script and the problem no longer occurs.