alexanderatallah / window.ai

Use your own AI models on the web
https://windowai.io
MIT License

Using basaran for loading custom Hugging Face models #45

Open · opened by handrew 1 year ago

handrew commented 1 year ago

Just adding my notes here on how to run basaran as a local backend to Window. Happy to add the below to wherever appropriate in the repo.

Their repo makes it pretty easy to set up with just a Python virtualenv. The process for me was as simple as:

Process

  1. Create a virtualenv with virtualenv -p python3 basaran_env and activate it with source basaran_env/bin/activate.
  2. pip install basaran
  3. Then, running MODEL=user/repo PORT=8000 python -m basaran downloads the model located at https://huggingface.co/<user>/<repo> (e.g., gpt2 or allenai/tk-instruct-3b-def) to the current folder and serves it via localhost:8000/v1/completions. You can confirm it works by running:
curl http://127.0.0.1:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{ "prompt": "once upon a time," }'
  4. Set the Local model endpoint in Window to http://127.0.0.1:8000/v1 (instead of the default http://127.0.0.1:8000/). A scripted version of the same sanity check is sketched just after this list.
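
For reference, the same sanity check can be run from TypeScript (the language the extension itself is written in). This is only an illustrative sketch of a request against the OpenAI-compatible completions route basaran exposes; the max_tokens value and the testLocalCompletion helper are my own assumptions, not anything from window.ai.

    // Minimal sketch: POST a completion request to a local basaran server.
    // Assumes basaran was started with PORT=8000 and that its OpenAI-compatible
    // /v1/completions route accepts `prompt` and (optionally) `max_tokens`.
    const BASE_URL = "http://127.0.0.1:8000/v1"; // same value entered in Window's Local model setting

    async function testLocalCompletion(prompt: string): Promise<void> {
      const res = await fetch(`${BASE_URL}/completions`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, max_tokens: 32 }),
      });
      if (!res.ok) {
        throw new Error(`basaran returned ${res.status} ${res.statusText}`);
      }
      const data = await res.json();
      // OpenAI-style responses put the generated text under choices[0].text.
      console.log(data.choices?.[0]?.text);
    }

    testLocalCompletion("once upon a time,").catch(console.error);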

Notes

alexanderatallah commented 1 year ago

@handrew what's the error that happens on the homepage or chatbot UI? This should work - basically, the local model in llm/local.ts converts chats to completions in its transformRequest method.
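
For anyone following along, the conversion being described is roughly "flatten the chat messages into a single prompt string before hitting /v1/completions". The sketch below is only a guess at the shape of that logic, not the actual code in llm/local.ts; the ChatMessage type, role labels, and chatToCompletionPrompt name are assumptions for illustration.

    // Hypothetical sketch of a chat -> completion transform. The real
    // transformRequest in llm/local.ts may differ.
    interface ChatMessage {
      role: "system" | "user" | "assistant";
      content: string;
    }

    function chatToCompletionPrompt(messages: ChatMessage[]): string {
      // Prefix each turn with its role and ask the model to continue as the assistant.
      const turns = messages.map((m) => `${m.role}: ${m.content}`);
      return turns.join("\n") + "\nassistant:";
    }

    // The resulting string is what would be sent as `prompt` to /v1/completions.
    const prompt = chatToCompletionPrompt([
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "once upon a time," },
    ]);
    console.log(prompt);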

handrew commented 1 year ago

    Uncaught Error: MODEL_REJECTED_REQUEST: 404: AxiosError: Request failed with status code 404
    Error: MODEL_REJECTED_REQUEST: 404: AxiosError: Request failed with status code 404
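
Not part of the original exchange, but one way to narrow down a 404 like this: it usually means the request path doesn't match what basaran serves, for example if the configured base URL is missing the /v1 suffix from step 4 above. A minimal probe, assuming the two candidate URLs below, could look like:

    // Diagnostic sketch: POST the same prompt to both candidate URLs and
    // print the status codes, to see which path basaran actually answers on.
    // The URL list is an assumption for illustration.
    const candidates = [
      "http://127.0.0.1:8000/completions",    // base URL without /v1
      "http://127.0.0.1:8000/v1/completions", // base URL with /v1 (step 4)
    ];

    async function probe(): Promise<void> {
      for (const url of candidates) {
        const res = await fetch(url, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ prompt: "once upon a time,", max_tokens: 8 }),
        });
        console.log(`${url} -> ${res.status}`);
      }
    }

    probe().catch(console.error);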