abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Pull from Ollama repo functionality #1607

Open ericcurtin opened 1 month ago

ericcurtin commented 1 month ago

Is your feature request related to a problem? Please describe.
Ollama has an easy-to-use registry: you can pull models with short, simple name strings.

Describe the solution you'd like
Here is a script that can pull from the Ollama registry:

#!/usr/bin/env bash
# To run the relevant tests use
# go test -tags=integration ./server
set -e
set -o pipefail

export OLLAMA_MODELS=test_data/models
REGISTRY_SCHEME=https
REGISTRY=registry.ollama.ai
TEST_MODELS=("library/orca-mini:latest" "library/llava:7b")
ACCEPT_HEADER="Accept: application/vnd.docker.distribution.manifest.v2+json"

for model in "${TEST_MODELS[@]}"; do
    TEST_MODEL=$(echo "${model}" | cut -f1 -d:)
    TEST_MODEL_TAG=$(echo "${model}" | cut -f2 -d:)
    mkdir -p "${OLLAMA_MODELS}/manifests/${REGISTRY}/${TEST_MODEL}/"
    mkdir -p "${OLLAMA_MODELS}/blobs/"

    # The manifest lists the config digest and one digest per layer
    echo "Pulling manifest for ${TEST_MODEL}:${TEST_MODEL_TAG}"
    curl -s --fail --header "${ACCEPT_HEADER}" \
        -o "${OLLAMA_MODELS}/manifests/${REGISTRY}/${TEST_MODEL}/${TEST_MODEL_TAG}" \
        "${REGISTRY_SCHEME}://${REGISTRY}/v2/${TEST_MODEL}/manifests/${TEST_MODEL_TAG}"

    # Fetch the config blob; -C - resumes a partial download
    CFG_HASH=$(jq -r ".config.digest" "${OLLAMA_MODELS}/manifests/${REGISTRY}/${TEST_MODEL}/${TEST_MODEL_TAG}")
    echo "Pulling config blob ${CFG_HASH}"
    curl -L -C - --header "${ACCEPT_HEADER}" \
        -o "${OLLAMA_MODELS}/blobs/${CFG_HASH}" \
        "${REGISTRY_SCHEME}://${REGISTRY}/v2/${TEST_MODEL}/blobs/${CFG_HASH}"

    # Print the manifest, then fetch every layer blob (GGUF weights, license, template, ...)
    cat "${OLLAMA_MODELS}/manifests/${REGISTRY}/${TEST_MODEL}/${TEST_MODEL_TAG}"
    for LAYER in $(jq -r ".layers[].digest" "${OLLAMA_MODELS}/manifests/${REGISTRY}/${TEST_MODEL}/${TEST_MODEL_TAG}"); do
        echo "Pulling blob ${LAYER}"
        curl -L -C - --header "${ACCEPT_HEADER}" \
            -o "${OLLAMA_MODELS}/blobs/${LAYER}" \
            "${REGISTRY_SCHEME}://${REGISTRY}/v2/${TEST_MODEL}/blobs/${LAYER}"
    done
done
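
Since every blob is content-addressed by its digest, a download can also be verified after the fact. A minimal sketch, assuming GNU coreutils sha256sum and the ${LAYER}/${OLLAMA_MODELS} variables from the loop above:

# Verify a pulled blob against its manifest digest (strip the "sha256:" prefix)
echo "${LAYER#sha256:}  ${OLLAMA_MODELS}/blobs/${LAYER}" | sha256sum --check -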

Integrate something similar and make it usable.

Describe alternatives you've considered

Additional context

ericcurtin commented 1 month ago

From llamafile:

When you download a new model with ollama, all its metadata will be stored in a manifest file under ~/.ollama/models/manifests/registry.ollama.ai/library/. The directory and manifest file name are the model name as returned by ollama list. For instance, for llama3:latest the manifest file will be named ~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest.
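
For example, you can inspect that manifest directly (assuming the default ~/.ollama location and that llama3:latest has been pulled):

jq . ~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest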

The manifest maps each file related to the model (e.g. GGUF weights, license, prompt template, etc.) to a sha256 digest. The digest corresponding to the element whose mediaType is application/vnd.ollama.image.model is the one referring to the model's GGUF file.
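
A sketch of extracting that digest with jq, using the llama3:latest manifest path above; the tr step is there because manifest digests use sha256:<hex> while blob filenames use sha256-<hex>:

MANIFEST=~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest
# Select the layer whose mediaType marks the GGUF weights
MODEL_DIGEST=$(jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' "${MANIFEST}")
# Map the digest to the on-disk blob filename
MODEL_BLOB=~/.ollama/models/blobs/$(echo "${MODEL_DIGEST}" | tr ':' '-')
echo "${MODEL_BLOB}"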

Each sha256 digest is also used as a filename in the ~/.ollama/models/blobs directory (if you look into that directory you'll see only those sha256-* filenames). This means you can directly run llamafile by passing the sha256 digest as the model filename. So if e.g. the llama3:latest GGUF file digest is sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29, you can run llamafile as follows:
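
For example (a sketch; assumes llamafile's standard -m flag for the model path):

llamafile -m ~/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29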

randoentity commented 1 month ago

Can you explain what the advantages are over cloning from HF directly? If you don't want to deal with Git LFS, you could also use aria2c or https://github.com/oobabooga/text-generation-webui/blob/main/download-model.py. Both of these support resumable downloads and verification.
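
For instance, a resumable download of a GGUF file from Hugging Face with aria2c might look like this (repo and filename are illustrative):

# -c resumes a partial download; -x 8 opens up to 8 connections
aria2c -c -x 8 -o mistral-7b-instruct-v0.2.Q4_K_M.gguf \
    "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf"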

ericcurtin commented 1 month ago

I think for simple uses of LLMs it's ideal. For example:

ollama pull mistral

is one of the reasons Ollama is so popular. It doesn't get easier than typing one word, right? And the popularity of the Ollama registry speaks for itself.
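
For reference, the one-word name is just shorthand that Ollama expands with defaults (registry registry.ollama.ai, namespace library, tag latest), which is how mistral maps to the library/mistral:latest style names used by the script above. A minimal sketch of that expansion:

# Expand an Ollama-style short name to namespace/name:tag
# (assumes Ollama's defaults: namespace "library", tag "latest")
expand_model_name() {
    local name="$1"
    case "${name}" in
        */*) ;;                        # already namespaced, e.g. library/mistral
        *) name="library/${name}" ;;
    esac
    case "${name}" in
        *:*) ;;                        # already tagged
        *) name="${name}:latest" ;;
    esac
    echo "${name}"
}
expand_model_name mistral   # prints: library/mistral:latest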

I think supporting both Hugging Face and Ollama would be ideal.

ericcurtin commented 1 month ago

I noticed today that LocalAI already has this:

https://github.com/mudler/LocalAI/pull/2628