ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Add a new `llama_load_model_from_buffer()` method to complement `llama_load_model_from_file()` #6311

Open asg017 opened 5 months ago

asg017 commented 5 months ago


Feature Description

There should be a llama_load_model_from_buffer() function added to llama.h/llama.cpp to complement llama_load_model_from_file(). Instead of loading a model from a file, it should read the model from a user-provided buffer.
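A rough sketch of what such a declaration might look like in llama.h, mirroring the shape of llama_load_model_from_file(); the parameter names and exact signature here are assumptions, not an existing API:

```c
// Hypothetical declaration -- not part of llama.h today.
// Same role as llama_load_model_from_file(), but the GGUF data is read
// from a caller-owned memory buffer instead of a file path.
LLAMA_API struct llama_model * llama_load_model_from_buffer(
        const void *              buffer,       // pointer to in-memory GGUF model data
        size_t                    buffer_size,  // size of the buffer in bytes
        struct llama_model_params params);
```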

Motivation

I'm working on a tool that can load multiple llama models from different sources. Ideally, I'd like to store these models in a SQLite database and load them entirely from memory. However, since the only way to load llama models is with llama_load_model_from_file(), I'd need to serialize them to disk first and pass in a path to that file. That's pretty wasteful, as they are already in memory and there's no need to persist them to disk.

In my case, I'm working with small embedding models (tens to hundreds of MB), but I'm sure this could be useful for larger models on larger machines as well.

Possible Implementation

Hmm, it looks like gguf_init_from_buffer() has been commented out of ggml.h. So maybe this will be more difficult than I thought?

ggerganov commented 5 months ago

Can we implement this somehow using fmemopen(), reading the memory buffer as if it were a file?
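For illustration, the idea would be something like the following (a minimal, POSIX-only sketch; the buffer contents and the reads are placeholders, not the actual loader code):

```c
#include <stdio.h>   // fmemopen() is POSIX; not available natively on Windows
#include <stdlib.h>

int main(void) {
    // Pretend this buffer holds GGUF model data already resident in memory.
    unsigned char model_bytes[] = { 0x47, 0x47, 0x55, 0x46 };

    // Wrap the buffer in a FILE* so existing file-based loading code
    // could fread()/fseek() on it without changes.
    FILE * f = fmemopen(model_bytes, sizeof(model_bytes), "rb");
    if (f == NULL) {
        perror("fmemopen");
        return 1;
    }

    unsigned char header[4];
    size_t n = fread(header, 1, sizeof(header), f);
    printf("read %zu bytes from memory-backed FILE*\n", n);

    fclose(f);
    return 0;
}
```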

slaren commented 5 months ago

I don't think fmemopen is supported on Windows, unfortunately.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

cyanic-selkie commented 1 month ago

Is there any chance of this happening?