dality17 opened 1 year ago
Just enter anything in the box
Could you try it for yourself? Did it work for you without the need for an API key?
Yes, it worked for me.
How? Please teach me step by step how you do it, because I'm still a beginner at Python.
within config.yaml:
# For locally hosted LLMs comment out the next line and uncomment the one after
# To configure a local LLM, point your browser to 127.0.0.1:7860 and click on the Model tab in text-generation-webui.
OPENAI_API_BASE: https://api.openai.com/v1
#OPENAI_API_BASE: "http://super__tgwui:5001/v1"
Comment out the first line, or replace the URL with the URL of your locally hosted LLM.
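To make the switch explicit, here is a tiny hypothetical helper (not part of SuperAGI, which just reads `OPENAI_API_BASE` from config.yaml) that mirrors the comment-out logic above: use the hosted endpoint when an OpenAI key is set, otherwise fall back to the local text-generation-webui URL.

```python
# Hypothetical illustration of the config.yaml switch above; SuperAGI itself
# does not contain this function.
def resolve_api_base(openai_key, local_base="http://super__tgwui:5001/v1"):
    """Return the OPENAI_API_BASE to use: hosted if a key is set, else local."""
    if openai_key:
        return "https://api.openai.com/v1"
    return local_base

print(resolve_api_base(None))      # the local URL
print(resolve_api_base("sk-..."))  # the hosted OpenAI URL
```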
config.yaml, or the gear icon at the top right of the web UI.
Thank you so much! I had a question: is this also possible with Quivr?
Hi @dality17 @neph1 @theSkele, I'm the maintainer of LiteLLM. We allow you to create a proxy server to call 100+ LLMs, and I think it can solve your problem (I'd love your feedback if it does not).
Try it here: https://docs.litellm.ai/docs/proxy_server
import openai  # note: this example uses the pre-1.0 openai SDK interface

openai.api_base = "http://0.0.0.0:8000/"  # proxy url
print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
Ollama models
$ litellm --model ollama/llama2 --api_base http://localhost:11434
Hugging Face Models
$ export HUGGINGFACE_API_KEY=my-api-key # [OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder
Anthropic
$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1
PaLM
$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
It's out, guys, you can go and try it out! I made some changes to the Dockerfile and docker-compose files so that I can run it with GPU acceleration. It's working perfectly well.
version: '3.8'

services:
  backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    volumes:

  super__postgres:
    image: "docker.io/library/postgres:16"
    environment:

  proxy:
    image: nginx:stable-alpine
    ports:

networks:
  super_network:
    driver: bridge

volumes:
  superagi_postgres_data:
  redis_data:
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS compile-image
WORKDIR /app

RUN apt-get update && apt-get install --no-install-recommends -y \
    git vim build-essential python3-dev python3-venv python3-pip

RUN apt-get update && \
    apt-get install --no-install-recommends -y wget libpq-dev gcc g++ && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip3 install --upgrade pip && \
    pip3 install --upgrade pip setuptools wheel ninja
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python \
    --force-reinstall --upgrade --no-cache-dir --verbose
RUN pip3 install -r requirements.txt
RUN python3 -m nltk.downloader averaged_perceptron_tagger punkt
COPY . .
RUN chmod +x ./entrypoint.sh ./wait-for-it.sh ./install_tool_dependencies.sh ./entrypoint_celery.sh
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build-image
WORKDIR /app

RUN apt-get update && apt-get install --no-install-recommends -y \
    git vim build-essential python3-dev python3-venv python3-pip
ENV LLAMA_CUBLAS=1
RUN apt-get update && \
    apt-get install --no-install-recommends -y libpq-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
COPY --from=compile-image /opt/venv /opt/venv
COPY --from=compile-image /app /app
COPY --from=compile-image /root/nltk_data /root/nltk_data
ENV PATH="/opt/venv/bin:$PATH"
EXPOSE 8001
Cool but what if I need to run on Apple Silicon utilizing its GPUs? Do you have an example docker compose file?
Currently we only support NVIDIA GPUs.
FYI, adding a local LLM behind LM Studio is proving to be quite difficult. Are we supposed to run docker compose -f local-llm up -d
to start up that container?
That compose file is currently broken, but can be fixed with:
diff --git a/local-llm b/local-llm
index c30f2330..5e36c872 100644
--- a/local-llm
+++ b/local-llm
@@ -26,7 +26,7 @@ services:
- super__postgres
networks:
- super_network
-
+
gui:
build: ./gui
ports:
@@ -42,8 +42,8 @@ services:
super__tgwui:
build:
- context: .
- dockerfile: ./tgwui/DockerfileTGWUI
+ context: ./tgwui
+ dockerfile: ./DockerfileTGWUI
container_name: super__tgwui
environment:
- EXTRA_LAUNCH_ARGS="--listen --verbose --extensions openai --threads 4 --n_ctx 1600"
However, I'm on a Mac M1 and am getting failures when building the container (likely due to the NVIDIA GPU requirements).
I tried modifying some of the code in gui/pages/Content/Models/ModelForm.js
and elsewhere to just force SuperAGI to use localhost:1234/v1,
but there's a lot of checking for API keys and other things that it would be nice to have an override for. LM Studio can "fake" an OpenAI endpoint pretty easily, so if we can just get the API call for inference to the LM Studio-hosted model, that should resolve most issues.
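Since LM Studio mimics the OpenAI API, a direct call needs nothing SuperAGI-specific. A minimal stdlib sketch (an illustration, not SuperAGI's code), assuming LM Studio's server is running on its default localhost:1234 and serves whichever model is currently loaded regardless of the model name sent:

```python
import json
import urllib.request

LM_STUDIO_BASE = "http://localhost:1234/v1"  # LM Studio's default local server

def build_chat_request(prompt, model="local-model"):
    """Build an OpenAI-style /chat/completions body. The model name here is a
    placeholder; in default LM Studio setups the loaded model answers anyway."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_local_llm(prompt):
    """POST to LM Studio and return the first completion (server must be running)."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        LM_STUDIO_BASE + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```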
I don't have an API key because it costs money. I want to depend on a local model instead of an API key.