canonical / bundle-kubeflow

Charmed Kubeflow

Custom Kubeflow app breaks juju #677

Open Pavel-Konarik opened 1 year ago

Pavel-Konarik commented 1 year ago

Bug Description

After adding a custom Flask app to the Kubeflow dashboard following the official steps, everything works as expected: MyApp is present in the menu and, upon clicking it, the response from the server ("Hello World") is displayed as part of the Kubeflow dashboard.

But after restarting microk8s using microk8s stop; microk8s start (or rebooting) and logging into the Kubeflow dashboard, I am greeted by "upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection failure, transport failure reason: delayed connect error: 111".

I can confirm that the virtual service is running as expected, as visiting http://10.64.140.43.nip.io/myapp directly yields the correct "Hello World" response from the Flask server. It is only the dashboard that is malfunctioning.
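
A quick way to confirm the backend itself is healthy, independent of Istio and the dashboard (hypothetical check; it assumes the Service was applied into the default namespace, since the manifests below do not set one):

# Port-forward straight to the Service, bypassing the gateway and the dashboard
microk8s kubectl port-forward -n default svc/myapp-service 9595:9595 &
curl http://127.0.0.1:9595/        # expect {"Hello":"World"}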

The interesting part is that juju debug-log --replay displays a series of Python errors from /var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/framework.py. These errors recur every 5 seconds. Logs are provided below.

Theory

Juju does not like my app being deployed onto microk8s directly using microk8s kubectl apply -f ..., but I did not find any support for this theory in any documentation.
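
If that is the cause, it should show up as a server-side apply ownership conflict: after the manual kubectl edit, the .data.links field of centraldashboard-config would be owned by the "kubectl-edit" field manager instead of the charm. A way to inspect this (hypothetical check; --show-managed-fields is a standard kubectl flag):

microk8s kubectl get configmap centraldashboard-config -n kubeflow -o yaml --show-managed-fields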

Environment

Tested with:

juju 3.1/stable
microk8s 1.25-strict/stable
juju deploy kubeflow --trust  --channel=1.7/stable

and

juju 2.9/stable
microk8s 1.24/stable
juju deploy kubeflow --trust  --channel=1.7/stable

Relevant Log Output

juju debug-log --replay is getting spammed with the following every 5 seconds.

application-oidc-gatekeeper: 21:09:36 INFO juju.worker.caasoperator.uniter.oidc-gatekeeper/1.operation ran "update-status" hook (via hook dispatching script: dispatch)
INFO unit.kubeflow-dashboard/0.juju-log Rendering manifests
INFO unit.kubeflow-dashboard/0.juju-log Reconcile completed successfully
INFO unit.kubeflow-dashboard/0.juju-log Rendering manifests
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/__init__.py", line 954, in handle
    self.emit(record)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/log.py", line 41, in emit
    self.model_backend.juju_log(record.levelname, self.format(record))
  File "/usr/lib/python3.8/logging/__init__.py", line 929, in format
    return fmt.format(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 676, in format
    record.exc_text = self.formatException(record.exc_info)
  File "/usr/lib/python3.8/logging/__init__.py", line 626, in formatException
    traceback.print_exception(ei[0], ei[1], tb, None, sio)
  File "/usr/lib/python3.8/traceback.py", line 103, in print_exception
    for line in TracebackException(
  File "/usr/lib/python3.8/traceback.py", line 617, in format
    yield from self.format_exception_only()
  File "/usr/lib/python3.8/traceback.py", line 566, in format_exception_only
    stype = smod + '.' + stype
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Original exception was:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/lightkube/core/generic_client.py", line 188, in raise_for_status
    resp.raise_for_status()
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '409 Conflict' for url 'https://10.152.183.1/api/v1/namespaces/kubeflow/configmaps/centraldashboard-config?WARNING unit.kubeflow-dashboard/0.kubeflow-dashboard-pebble-ready For more information check: https://httpstatuses.com/409

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 214, in _deploy_k8s_resources
    self.configmap_handler.apply()
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/charmed_kubeflow_chisme/kubernetes/_kubernetes_resource_handler.py", line 234, in WARNING unit.kubeflow-dashboard/0.kubeflow-dashboard-pebble-ready     raise e
WARNING unit.kubeflow-dashboard/0.kubeflow-dashboard-pebble-ready   File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/charmed_kubeflow_chisme/kubernetes/_kubernetes_resource_handler.py", line 219, in WARNING unit.kubeflow-dashboard/0.kubeflow-dashboard-pebble-ready     apply_many(client=self.lightkube_client, objs=resources, force=force)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/charmed_kubeflow_chisme/lightkube/batch/_many.py", line 64, in apply_many
    returns[i] = client.apply(
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/lightkube/core/client.py", line 456, in apply
    return self.patch(type(obj), name, obj, namespace=namespace,
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/lightkube/core/client.py", line 325, in patch
    return self._client.request("patch", res=res, name=name, namespace=namespace, obj=obj,
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/lightkube/core/generic_client.py", line 245, in request
    return self.handle_response(method, resp, br)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/lightkube/core/generic_client.py", line 196, in handle_response
    self.raise_for_status(resp)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/lightkube/core/generic_client.py", line 190, in raise_for_status
    raise transform_exception(e)
lightkube.core.exceptions.ApiError: Apply failed with 1 conflict: conflict with "kubectl-edit" using v1: .data.links

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./src/charm.py", line 254, in <module>
    main(KubeflowDashboardOperator)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/main.py", line 438, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/main.py", line 150, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/framework.py", line 355, in emit
    framework._emit(event)  # noqa
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/framework.py", line 856, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-kubeflow-dashboard-0/charm/venv/ops/framework.py", line 931, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 231, in main
    self._deploy_k8s_resources()
  File "./src/charm.py", line 216, in _deploy_k8s_resources
    raise GenericCharmRuntimeError("Failed to create K8S resources") from e
<unknown>GenericCharmRuntimeError: Failed to create K8S resources
ERROR juju.worker.uniter.operation hook "kubeflow-dashboard-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
ERROR juju.worker.uniter pebble poll failed for container "kubeflow-dashboard": failed to send pebble-ready event: hook failed
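
The key line is the lightkube ApiError: Apply failed with 1 conflict: conflict with "kubectl-edit" using v1: .data.links. The charm manages centraldashboard-config with Kubernetes server-side apply, so once the manual kubectl edit takes ownership of .data.links, the charm's apply is rejected with 409 Conflict unless it forces ownership. A minimal sketch of that mechanism (not the charm's actual code; the field-manager name and data are placeholders):

from lightkube import Client
from lightkube.models.meta_v1 import ObjectMeta
from lightkube.resources.core_v1 import ConfigMap

# Hypothetical field-manager name; the charm registers its own.
client = Client(field_manager="kubeflow-dashboard-charm")

desired = ConfigMap(
    metadata=ObjectMeta(name="centraldashboard-config", namespace="kubeflow"),
    data={"links": "...rendered links JSON..."},  # placeholder content
)

# Server-side apply raises ApiError (409 Conflict) while another manager,
# here "kubectl-edit", owns the .data.links field...
client.apply(desired)

# ...unless ownership of the conflicting field is explicitly forced.
client.apply(desired, force=True)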

Additional Context

Files for replication

app.py

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'])
@app.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'])
def capture_request_info(path):
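    # Catch-all handler: any path and any of the listed HTTP methods returns the same JSON payload.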
    request_info = {
        "Hello": "World",
    }

    return jsonify(request_info)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9595)

Dockerfile

FROM python:3.8-slim
WORKDIR /app

COPY app.py .
RUN pip install Flask

CMD [ "python", "./app.py" ]

myapp-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp-container
        image: 127.0.0.1:32000/myapp:v01
        ports:
        - containerPort: 9595

myapp-service.yml

apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
    - protocol: TCP
      port: 9595
      targetPort: 9595
  type: NodePort

myapp-virtualservice.yml

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-virtualservice
  namespace: kubeflow
spec:
  hosts:
  - "*"
  gateways:
  - kubeflow/kubeflow-gateway
  http:
  - match:
    - uri:
        prefix: "/myapp"
    rewrite:
      uri: "/"
    route:
    - destination:
        host: myapp-service.default.svc.cluster.local
        port:
          number: 9595

Replication steps

Start with Charmed Kubeflow on MicroK8s following the official guide, but also enable the registry add-on:

microk8s enable registry dns hostpath-storage ingress metallb:10.64.140.43-10.64.140.49
# Wait for the microk8s to enable all components

# This is needed for registry to work with docker in microk8s
sudo mkdir -p /var/snap/microk8s/current/args/certs.d/127.0.0.1:32000
sudo touch /var/snap/microk8s/current/args/certs.d/127.0.0.1:32000/hosts.toml

echo -e 'server = "http://127.0.0.1:32000"\n\n[host."http://127.0.0.1:32000"]\ncapabilities = ["pull", "resolve"]' | sudo tee /var/snap/microk8s/current/args/certs.d/127.0.0.1:32000/hosts.toml

microk8s stop
microk8s start
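
The built-in registry should now be reachable; a quick sanity check (hypothetical, using the standard Docker registry HTTP API):

curl http://127.0.0.1:32000/v2/_catalog
# expect a JSON list of repositories (empty until the image below is pushed)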

Create a folder myapp containing the app.py and Dockerfile above.

sudo apt-get install -y docker.io
sudo usermod -aG docker ${USER}
su - ${USER}

cd myapp

docker build . -t 127.0.0.1:32000/myapp:v01
docker push 127.0.0.1:32000/myapp:v01

microk8s kubectl apply -f myapp-deployment.yml
microk8s kubectl apply -f myapp-service.yml
microk8s kubectl apply -f myapp-virtualservice.yml
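
Optionally confirm the workload and the route exist before touching the dashboard config (hypothetical checks):

microk8s kubectl get pods -l app=myapp
microk8s kubectl get virtualservice myapp-virtualservice -n kubeflow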

microk8s kubectl edit cm centraldashboard-config -n kubeflow
add {"type": "item", "link": "/myapp/", "text": "MyApp", "icon": "device:storage"} to the data->links

You can verify that it works correctly by visiting the dashboard and seeing "Hello World" as part of the page in the "MyApp" tab.

microk8s stop
microk8s start

Logging into the dashboard now gives the error described above, and the Python errors are spammed in the juju logs.

I can provide a small (50 GB) VM image if needed.

Pavel-Konarik commented 1 year ago

Just wanted to add that a "non-charmed" setup (but with minikube instead of microk8s) works as expected.