There are basically two approaches: run Vicuna locally, or host it in the cloud.
Running locally is fine (and easier) but forces me to manually run daily crawls/updates on my laptop. I'd really like updates to run in the cloud, preferably when I'm sleeping. :-)
So, at some point, I'll need to host Vicuna behind an API in the cloud.
Alas, the Internet has not been cooperative and completely solved this problem for me just yet — stuff is moving way too fast — so some exploration and head-scratching proved necessary. Self-hosted LLMs really are the bleeding edge.
After my exploration, I've concluded that building on top of @ggerganov's llama.cpp is the way to go. In particular:
[x] Use IPFS (aka BitTorrent for cool kids) to grab the 7B-4bit and 13B-4bit model weights and massage them into ggml format. Yet another new ML format. Joy. (Conversion sketch after this list.)
[x] Build a new library on top of the Python bindings to llama.cpp that makes it easy to run zero-shot prompts through the model. (Wrapper sketch after this list.)
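For the weight conversion, here's roughly the shape of it. This is a minimal sketch assuming the helper scripts that ship with llama.cpp (`convert-pth-to-ggml.py` and the `quantize` binary); script names and arguments vary across llama.cpp versions, so check the README in your checkout before trusting any of this.

```python
import subprocess

# Sketch only: script names/flags below are from llama.cpp's README at the
# time of writing and may differ in your checkout -- treat as assumptions.

# 1. Convert the raw PyTorch weights in models/7B/ to an f16 ggml file.
subprocess.run(
    ["python", "convert-pth-to-ggml.py", "models/7B/", "1"],
    check=True,
)

# 2. Quantize the f16 ggml file down to 4-bit (q4_0).
subprocess.run(
    [
        "./quantize",
        "models/7B/ggml-model-f16.bin",
        "models/7B/ggml-model-q4_0.bin",
        "2",  # 2 == q4_0 in the quantize tool's type enum
    ],
    check=True,
)
```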
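The zero-shot wrapper itself can stay very thin. A sketch, assuming the llama-cpp-python package (the "native bindings" I used may differ); the `Llama` constructor and call signature below are from that package:

```python
from llama_cpp import Llama


class ZeroShotRunner:
    """Thin wrapper that runs a single zero-shot prompt through a ggml model."""

    def __init__(self, model_path: str, n_ctx: int = 2048):
        # Loads the 4-bit ggml weights produced by the conversion step above.
        self.llm = Llama(model_path=model_path, n_ctx=n_ctx)

    def run(self, prompt: str, max_tokens: int = 256) -> str:
        out = self.llm(prompt, max_tokens=max_tokens, echo=False)
        return out["choices"][0]["text"].strip()


# Usage:
# runner = ZeroShotRunner("models/7B/ggml-model-q4_0.bin")
# print(runner.run("Summarize the following article: ..."))
```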
Okay, now I've got Vicuna running in the cloud. Just a few more steps to put it all together:
[x] Implement a new LangChain LLM subclass, overriding `_call(...)`, that can invoke my API endpoint. (Subclass sketch after this list.)
[x] Update our service to use this LLM. In particular, we'll want `summarize_vicuna7b_langchain(...)` and `summarize_vicuna13b_langchain(...)` equivalents to `summarize_openai_langchain(...)`. (Sketches after this list.)
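The subclass is pleasantly small. A sketch against LangChain's `LLM` base class; the endpoint URL and JSON request/response shape are placeholders for whatever my API actually serves, not anything LangChain dictates:

```python
from typing import List, Optional

import requests
from langchain.llms.base import LLM


class VicunaLLM(LLM):
    """LangChain LLM that forwards prompts to a self-hosted Vicuna endpoint."""

    # Hypothetical endpoint; swap in the real API URL.
    endpoint_url: str = "http://vicuna.internal:8000/generate"

    @property
    def _llm_type(self) -> str:
        return "vicuna"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # The payload/response shape here is an assumption about my API,
        # not a LangChain requirement.
        resp = requests.post(
            self.endpoint_url,
            json={"prompt": prompt, "stop": stop or []},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["text"]
```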
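And the service-side functions just instantiate the right model size. A sketch, assuming `summarize_openai_langchain(...)` wraps one of LangChain's stock summarize chains (I'm guessing at its internals, and the module/endpoint names are hypothetical):

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Hypothetical module name: the VicunaLLM subclass from the sketch above.
from vicuna_llm import VicunaLLM


def _summarize_with(llm, text: str) -> str:
    # Chunk long articles so each piece fits in Vicuna's context window.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
    docs = [Document(page_content=c) for c in splitter.split_text(text)]
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)


def summarize_vicuna7b_langchain(text: str) -> str:
    return _summarize_with(
        VicunaLLM(endpoint_url="http://vicuna-7b.internal:8000/generate"), text
    )


def summarize_vicuna13b_langchain(text: str) -> str:
    return _summarize_with(
        VicunaLLM(endpoint_url="http://vicuna-13b.internal:8000/generate"), text
    )
```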