ai-dock / stable-diffusion-webui

AUTOMATIC1111 (A1111) Stable Diffusion Web UI docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

Deploying into k8s cluster #25

Closed: hicotton02 closed this 3 months ago

hicotton02 commented 3 months ago

I am using this Docker image in a k8s cluster. When I access the service through the load balancer (192.168.1.36:7860) I get redirected to http://localhost:1111/login, but since I am accessing it externally, that redirect doesn't work. I have an ollama deployment and an open-webui deployment in the same k8s cluster that I am trying to connect this to. What am I missing?

Here is my deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: automatic1111
  namespace: ollama
  labels:
    app: automatic1111
spec:
  replicas: 1
  selector:
    matchLabels:
      app: automatic1111
  template:
    metadata:
      labels:
        app: automatic1111
    spec:
      containers:
      - name: automatic1111-container
        image: ghcr.io/ai-dock/stable-diffusion-webui:latest
        ports:
        - containerPort: 7860
        resources:
          limits:
            nvidia.com/gpu: 1
            memory: "16Gi"
            cpu: "8"
          requests:
            nvidia.com/gpu: 1
            memory: "16Gi"
            cpu: "8"
        env:
        - name: SERVICEPORTAL_HOME
          value: "http://192.168.1.36:7860"
        - name: SERVICEPORTAL_LOGIN
          value: "http://192.168.1.36:7860/login"
        - name: PORT
          value: "7860"
        volumeMounts:
        - name: models-volume
          mountPath: /models
      volumes:
      - name: models-volume
        persistentVolumeClaim:
          claimName: automatic1111-models-pvc
      nodeSelector:
        ollama-gpu: "true"

I am setting the SERVICEPORTAL URLs manually in the deployment file, but when the pod starts they still show the defaults:

root@master1:~# kubectl exec -it automatic1111-d4bf6556-7vlxs -n ollama -- /bin/bash
(webui) root@automatic1111-d4bf6556-7vlxs:/opt# printenv | grep SERVICEPORTAL
SERVICEPORTAL_VENV_PIP=/opt/environments/python/serviceportal/bin/pip
SERVICEPORTAL_VENV=/opt/environments/python/serviceportal
SERVICEPORTAL_VENV_PYTHON=/opt/environments/python/serviceportal/bin/python
SERVICEPORTAL_LOGIN=http://localhost:1111/login
SERVICEPORTAL_HOME=http://localhost:1111

I have also tried manually editing the caddy/share/config files to try to push the config change, to no avail.

robballantyne commented 3 months ago

Setting WEB_ENABLE_AUTH=false, if it's safe to do so, is the easiest method. The alternative is to coordinate WEB_TOKEN values and pass them as Authorisation (Bearer) headers.

This isn't a use case I'd intended for the container, but I would be interested in hearing about your results.
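
For the token route, a minimal sketch of what the container env could look like (the token value is a placeholder; exact behaviour should be checked against the ai-dock base image documentation):

env:
- name: WEB_ENABLE_AUTH
  value: "true"
- name: WEB_TOKEN
  value: "change-me-to-a-long-random-secret"  # placeholder shared secret
# Each client request then carries the matching header:
# Authorization: Bearer change-me-to-a-long-random-secret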

hicotton02 commented 3 months ago

That worked. I am running it internally and do not need web auth.

auto1111-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: automatic1111
  namespace: ollama
  labels:
    app: automatic1111
spec:
  replicas: 1
  selector:
    matchLabels:
      app: automatic1111
  template:
    metadata:
      labels:
        app: automatic1111
    spec:
      containers:
      - name: automatic1111-container
        image: ghcr.io/ai-dock/stable-diffusion-webui:latest
        ports:
        - containerPort: 7860
        resources:
          limits:
            nvidia.com/gpu: 1
            memory: "64Gi"
            cpu: "12"
          requests:
            nvidia.com/gpu: 1
            memory: "64Gi"
            cpu: "12"
        env:
        - name: WEB_ENABLE_AUTH
          value: "false"
        - name: PORT
          value: "7860"
        - name: WEBUI_ARGS
          value: "--api --listen"
        volumeMounts:
        - name: models-volume
          mountPath: /opt/stable-diffusion-webui/models
      volumes:
      - name: models-volume
        persistentVolumeClaim:
          claimName: automatic1111-models-pvc
      nodeSelector:
        ollama-gpu: "true"

auto1111-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: automatic1111-service
  namespace: ollama
spec:
  selector:
    app: automatic1111
  ports:
    - protocol: TCP
      port: 7860  # External port you want to use
      targetPort: 7860  # Port on the pod where Automatic 1111 is listening
  type: LoadBalancer

auto1111-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: automatic1111-models-pvc
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi  # Adjust based on your needs

hicotton02 commented 3 months ago

I modified the deployment YAML to work with ollama and open-webui.

robballantyne commented 3 months ago

@hicotton02 Thank you Keven, I appreciate you taking the time. I'll play with this config when I get a chance and add a documentation section in the base image.

Fwiw, if you need it, you can apply the as-yet-undocumented SUPERVISOR_NO_AUTOSTART=jupyter,syncthing (any service) to trim the container down if you aren't using the extra parts I need for my single-image cloud deployment.
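
In the k8s deployment that would be one more env entry (jupyter and syncthing here are just the example services named above):

env:
- name: SUPERVISOR_NO_AUTOSTART
  value: "jupyter,syncthing"  # comma-separated supervisor services to leave stopped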

hicotton02 commented 3 months ago

Edit: I can re-close this as it's not related to the opening issue, but I figured you might have a quick answer.

Thanks for adding that. I am currently having an issue where I get the

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

A: 1.94 GB, R: 3.74 GB, Sys: 4.1/15.7666 GB (26.0%)

error when trying to generate anything. I'm not sure what happened, but is there anything specific to this Docker image that could cause this issue?

robballantyne commented 3 months ago

Not certain, as I've never had the problem, but it's possibly a VRAM issue. I found a similar issue with a potential resolution at https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2373

hicotton02 commented 3 months ago

I saw this. Does COMMANDLINE_ARGS translate to WEBUI_ARGS, so I can throw those flags into WEBUI_ARGS?

robballantyne commented 3 months ago

Yes 👍
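
So a webui-user.sh line such as export COMMANDLINE_ARGS="--medvram --no-half" maps straight onto the deployment env. The flags below are examples of the kind suggested in the linked thread, not a confirmed fix for this error:

env:
- name: WEBUI_ARGS
  value: "--api --listen --medvram --no-half"  # --lowvram is the more aggressive fallback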

hicotton02 commented 3 months ago

Didn't resolve it... hmm...

hicotton02 commented 3 months ago