abi / secret-llama

Fully private LLM chatbot that runs entirely in the browser with no server needed. Supports Mistral and Llama 3.
https://secretllama.com
Apache License 2.0
2.33k stars 130 forks

Improvement: Add a Local Model Manager to application #9

Open rmusser01 opened 2 months ago

rmusser01 commented 2 months ago

As a user, I'd like to be able to use the application and have it load a model I have already downloaded.

bakkot commented 2 months ago

Upstream issue: https://github.com/mlc-ai/web-llm/issues/282

abi commented 2 months ago

Might already be possible with web-llm. The documentation isn't clear, but this example seems to allow model uploads: https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat-upload

bakkot commented 2 months ago

That example "uploads" the model to IndexedDB, which means it creates a copy of the whole thing on disk instead of merely reading it into memory. For large models that's pretty expensive.
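
For context, the alternative being described here would read the user-selected model files straight into memory with the browser File API instead of persisting a second copy. The sketch below uses illustrative names only; whether web-llm can consume such in-memory buffers without an IndexedDB copy is exactly what the upstream issue asks about:

```ts
// Hypothetical helper (not part of secret-llama or web-llm): read model shards
// selected via <input type="file" multiple> into memory, without writing a
// duplicate of them into IndexedDB.
async function readLocalModelFiles(input: HTMLInputElement): Promise<Map<string, ArrayBuffer>> {
  const buffers = new Map<string, ArrayBuffer>();
  for (const file of Array.from(input.files ?? [])) {
    // file.arrayBuffer() loads the shard into memory only
    buffers.set(file.name, await file.arrayBuffer());
  }
  return buffers;
}
```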

abi commented 2 months ago

Ah I see. While it would be ideal to not duplicate the storage, I think not having to download from the internet is still a win. Happy to support both options here.
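
If both options end up supported, one way to decide at runtime whether the IndexedDB copy is even feasible is to check the origin's remaining storage quota first. A sketch using the standard StorageManager API; the helper name and the 10% headroom are assumptions, not anything secret-llama does today:

```ts
// Assumed helper: check whether duplicating a multi-GB model into IndexedDB
// would fit within the origin's remaining storage quota.
async function canCacheInIndexedDB(modelSizeBytes: number): Promise<boolean> {
  if (!navigator.storage?.estimate) return false;
  const { quota = 0, usage = 0 } = await navigator.storage.estimate();
  return quota - usage > modelSizeBytes * 1.1; // keep ~10% headroom
}
```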

abi commented 1 month ago

From @youhogeon in #18,

Our company uses a closed network. All files from external sources must be imported via USB (or an equivalent method).

So, first, I download the wasm file and the model parameters, import them into the closed network, and then temporarily modify the App.tsx file as follows.

It works, but I hope it helps you support setting up local models in a better way.

Thank you again for releasing your great code as open source.

```ts
const appConfig = webllm.prebuiltAppConfig;
appConfig.model_list = [
  {
    "model_url": "/models/Llama-3-8B-Instruct-q4f16_1-MLC/",
    "model_id": "Llama-3-8B-Instruct-q4f16_1",
    "model_lib_url": "/models/Llama-3-8B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
    "vram_required_MB": 4598.34,
    "low_resource_required": true,
  },
];
appConfig.useIndexedDBCache = true;
```
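
For completeness, this appConfig then has to be handed to web-llm when the engine is created. The exact entry point depends on which web-llm version secret-llama pins (newer releases expose CreateMLCEngine, older ones a ChatModule), so the following is an illustrative sketch, not the project's actual wiring:

```ts
import * as webllm from "@mlc-ai/web-llm";

// Illustrative only: pass the locally-served model list to web-llm so it
// resolves the wasm and weights from /models/ instead of the internet.
const engine = await webllm.CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1", {
  appConfig,
  initProgressCallback: (report) => console.log(report.text),
});
```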