Test the UATs for the 1.9 release on Microk8s

UATs in 1.9/beta

Identified issues:

I ran the UATs with tox -ve kubeflow-local. These are the results:

============================================================================ short test summary info =============================================================================
FAILED driver/test_kubeflow_workloads.py::test_kubeflow_workloads - Failed: Something went wrong while running Job test-kubeflow/test-kubeflow. Please inspect the attached logs for more info...
==================================================================== 1 failed, 1 passed in 1988.01s (0:33:08) ====================================================================
kubeflow-local: exit 1 (1989.54 seconds) /home/ubuntu/charmed-kubeflow-uats> pytest -vv --tb native /home/ubuntu/charmed-kubeflow-uats/driver/ -s --filter 'not mlflow' --model kubeflow pid=591250
  kubeflow-local: FAIL code 1 (1989.59=setup[0.05]+cmd[1989.54] seconds)
  evaluation failed :( (1989.68 seconds)

According to the logs, the Katib integration test is failing:

------------------------------ Captured log call -------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
ERROR    test_notebooks:test_notebooks.py:58 Cell In[8], line 8, in assert_experiment_succeeded(client, experiment)
      1 @retry(
      2     wait=wait_exponential(multiplier=2, min=1, max=10),
      3     stop=stop_after_attempt(30),
      4     reraise=True,
      5 )
      6 def assert_experiment_succeeded(client, experiment):
      7     """Wait for the Katib Experiment to complete successfully."""
----> 8     assert client.is_experiment_succeeded(name=experiment), f"Katib Experiment was not successful."
AssertionError: Katib Experiment was not successful.
=========================== short test summary info ============================
FAILED test_notebooks.py::test_notebook[katib-integration] - Failed: AssertionError: Katib Experiment was not successful.
============ 1 failed, 4 passed, 4 deselected in 1940.13s (0:32:20) ============
FAILED
------------------------------------------------------------------------------- live log teardown --------------------------------------------------------------------------------
INFO     test_kubeflow_workloads:test_kubeflow_workloads.py:82 Deleting Profile test-kubeflow...
INFO     httpx:_client.py:1013 HTTP Request: DELETE https://172.31.15.25:16443/apis/kubeflow.org/v1/profiles/test-kubeflow "HTTP/1.1 200 OK"
INFO     test_kubeflow_workloads:test_kubeflow_workloads.py:141 Deleting Job test-kubeflow/test-kubeflow...
INFO     httpx:_client.py:1013 HTTP Request: DELETE https://172.31.15.25:16443/apis/batch/v1/namespaces/test-kubeflow/jobs/test-kubeflow "HTTP/1.1 200 OK"
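
For context on how long the notebook polls before giving up, the retry settings in the traceback above bound the total sleep time. The sketch below estimates that budget. It assumes the delay before retry n is multiplier * 2**(n - 1), clamped to [min, max], which is how recent tenacity releases appear to compute wait_exponential; older releases may be offset by one step. The helper name backoff_schedule is hypothetical, and time spent inside each is_experiment_succeeded call is not counted.

```python
from typing import List


def backoff_schedule(attempts: int, multiplier: float, min_wait: float, max_wait: float) -> List[float]:
    """Return the assumed sleep before each retry (attempts - 1 sleeps in total)."""
    waits = []
    for attempt in range(1, attempts):
        # Assumed formula: exponential growth, clamped to [min_wait, max_wait].
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


# Settings copied from the @retry decorator shown in the traceback.
waits = backoff_schedule(attempts=30, multiplier=2, min_wait=1, max_wait=10)
total_wait = sum(waits)
upper_bound = (30 - 1) * 10  # every sleep is capped at max=10 seconds

print(f"sleeps: {len(waits)}, total: {total_wait}s, hard cap: {upper_bound}s")
```

Whatever the exact formula, the sleeps cannot add up to more than 290 seconds. One possibility worth ruling out is therefore that the Katib Experiment simply needed longer than about five minutes on this cluster, rather than failing outright.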

Looking a bit more into the logs, I can see the following:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /opt/conda/bin/python3.8
cachedir: .pytest_cache
rootdir: /tests
configfile: pytest.ini
plugins: anyio-3.6.2
collecting ... collected 9 items / 4 deselected / 5 selected

test_notebooks.py::test_notebook[katib-integration]
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
ERROR    test_notebooks:test_notebooks.py:58 Cell In[8], line 8, in assert_experiment_succeeded(client, experiment)
      1 @retry(
      2     wait=wait_exponential(multiplier=2, min=1, max=10),
      3     stop=stop_after_attempt(30),
      4     reraise=True,
      5 )
      6 def assert_experiment_succeeded(client, experiment):
      7     """Wait for the Katib Experiment to complete successfully."""
----> 8     assert client.is_experiment_succeeded(name=experiment), f"Katib Experiment was not successful."
AssertionError: Katib Experiment was not successful.
FAILED                                                                   [ 20%]
test_notebooks.py::test_notebook[kfp-v1-integration]
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running kfp-v1-integration.ipynb...
PASSED                                                                   [ 40%]
test_notebooks.py::test_notebook[kfp-v2-integration]
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running kfp-v2-integration.ipynb...
PASSED                                                                   [ 60%]
test_notebooks.py::test_notebook[kserve-integration]
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running kserve-integration.ipynb...
PASSED                                                                   [ 80%]
test_notebooks.py::test_notebook[training-integration]
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running training-integration.ipynb...
PASSED                                                                   [100%]

=================================== FAILURES ===================================
_______________________ test_notebook[katib-integration] _______________________

test_notebook = '/tests/notebooks/katib/katib-integration.ipynb'

    @pytest.mark.ipynb
    @pytest.mark.parametrize(
        # notebook - ipynb file to execute
        "test_notebook",
        NOTEBOOKS.values(),
        ids=NOTEBOOKS.keys(),
    )
    def test_notebook(test_notebook):
        """Test Notebook Generic Wrapper."""
        os.chdir(os.path.dirname(test_notebook))

        with open(test_notebook) as nb:
            notebook = nbformat.read(nb, as_version=nbformat.NO_CONVERT)

        ep = ExecutePreprocessor(
            timeout=-1, kernel_name="python3", on_notebook_start=install_python_requirements
        )
        ep.skip_cells_with_tag = "pytest-skip"

        try:
            log.info(f"Running {os.path.basename(test_notebook)}...")
            output_notebook, _ = ep.preprocess(notebook, {"metadata": {"path": "./"}})
            # persist the notebook output to the original file for debugging purposes
            save_notebook(output_notebook, test_notebook)
        except CellExecutionError as e:
            # handle underlying error
            pytest.fail(f"Notebook execution failed with {e.ename}: {e.evalue}")

        for cell in output_notebook.cells:
            metadata = cell.get("metadata", dict)
            if "raises-exception" in metadata.get("tags", []):
                for cell_output in cell.outputs:
                    if cell_output.output_type == "error":
                        # extract the error message from the cell output
                        log.error(format_error_message(cell_output.traceback))
>                       pytest.fail(cell_output.traceback[-1])
E                       Failed: AssertionError: Katib Experiment was not successful.

/tests/test_notebooks.py:59: Failed
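
Since the wrapper saves the executed notebook back to its original path, the failing cell could be inspected after the run wherever that file is reachable. A minimal sketch, assuming the saved file follows the standard nbformat v4 JSON layout (code cells carry an outputs list, and error outputs have output_type "error" with ename and evalue fields). The helper name collect_cell_errors and the inline sample notebook are hypothetical.

```python
import json
from typing import Dict, List


def collect_cell_errors(notebook: Dict) -> List[Dict]:
    """List every error output in an executed notebook (nbformat v4 JSON layout)."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Only code cells carry outputs; markdown cells have no "outputs" key.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append(
                    {
                        "cell": index,
                        "tags": cell.get("metadata", {}).get("tags", []),
                        "ename": output.get("ename"),
                        "evalue": output.get("evalue"),
                    }
                )
    return errors


# Tiny stand-in for a saved notebook; real use would json.load() the .ipynb file.
sample = json.loads("""
{"cells": [
  {"cell_type": "markdown", "metadata": {}, "source": []},
  {"cell_type": "code", "metadata": {"tags": ["raises-exception"]}, "source": [],
   "outputs": [{"output_type": "error", "ename": "AssertionError",
                "evalue": "Katib Experiment was not successful.", "traceback": []}]}
]}
""")

for err in collect_cell_errors(sample):
    print(f"cell {err['cell']} {err['tags']}: {err['ename']}: {err['evalue']}")
```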

Preliminary tests for beta

Deployed the bundle with juju deploy kubeflow --channel 1.9/beta --trust
Set public-url = http://dex-auth.kubeflow.svc:5556 on both dex-auth and oidc-gatekeeper
Configured dex-auth's static-username and static-password
Waited for about 10 minutes and checked the status of the model:

ubuntu@ip-172-31-15-25:~$ juju status
Model     Controller  Cloud/Region        Version  SLA          Timestamp
kubeflow  uk8s-343    microk8s/localhost  3.4.4    unsupported  20:23:35Z

App                        Version                  Status  Scale  Charm                    Channel       Rev  Address         Exposed  Message
admission-webhook                                   active      1  admission-webhook        latest/beta   328  10.152.183.124  no
argo-controller                                     active      1  argo-controller          latest/beta   526  10.152.183.183  no
dex-auth                                            active      1  dex-auth                 latest/beta   507  10.152.183.141  no
envoy                                               active      1  envoy                    latest/beta   231  10.152.183.126  no
istio-ingressgateway                                active      1  istio-gateway            latest/beta  1048  10.152.183.69   no
istio-pilot                                         active      1  istio-pilot              latest/beta  1013  10.152.183.23   no
jupyter-controller                                  active      1  jupyter-controller       latest/beta  1002  10.152.183.175  no
jupyter-ui                                          active      1  jupyter-ui               latest/beta   925  10.152.183.41   no
katib-controller                                    active      1  katib-controller         latest/beta   690  10.152.183.35   no
katib-db                   8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/stable    153  10.152.183.129  no
katib-db-manager                                    active      1  katib-db-manager         latest/beta   653  10.152.183.50   no
katib-ui                                            active      1  katib-ui                 latest/beta   657  10.152.183.217  no
kfp-api                                             active      1  kfp-api                  latest/beta  1466  10.152.183.91   no
kfp-db                     8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/stable    153  10.152.183.80   no
kfp-metadata-writer                                 active      1  kfp-metadata-writer      latest/beta   524  10.152.183.59   no
kfp-persistence                                     active      1  kfp-persistence          latest/beta  1473  10.152.183.42   no
kfp-profile-controller                              active      1  kfp-profile-controller   latest/beta  1431  10.152.183.130  no
kfp-schedwf                                         active      1  kfp-schedwf              latest/beta  1484  10.152.183.180  no
kfp-ui                                              active      1  kfp-ui                   latest/beta  1467  10.152.183.229  no
kfp-viewer                                          active      1  kfp-viewer               latest/beta  1499  10.152.183.77   no
kfp-viz                                             active      1  kfp-viz                  latest/beta  1417  10.152.183.219  no
knative-eventing                                    active      1  knative-eventing         latest/beta   441  10.152.183.111  no
knative-operator                                    active      1  knative-operator         latest/beta   416  10.152.183.134  no
knative-serving                                     active      1  knative-serving          latest/beta   442  10.152.183.75   no
kserve-controller                                   active      1  kserve-controller        latest/beta   397  10.152.183.132  no
kubeflow-dashboard                                  active      1  kubeflow-dashboard       latest/beta   600  10.152.183.32   no
kubeflow-profiles                                   active      1  kubeflow-profiles        latest/beta   393  10.152.183.221  no
kubeflow-roles                                      active      1  kubeflow-roles           latest/beta   225  10.152.183.150  no
kubeflow-volumes                                    active      1  kubeflow-volumes         latest/beta   314  10.152.183.28   no
metacontroller-operator                             active      1  metacontroller-operator  latest/beta   280  10.152.183.61   no
minio                      res:oci-image@5102166    active      1  minio                    latest/beta   334  10.152.183.21   no
mlmd                                                active      1  mlmd                     latest/beta   201  10.152.183.197  no
oidc-gatekeeper                                     active      1  oidc-gatekeeper          latest/beta   396  10.152.183.43   no
pvcviewer-operator                                  active      1  pvcviewer-operator       latest/beta   118  10.152.183.253  no
seldon-controller-manager                           active      1  seldon-core              latest/beta   691  10.152.183.236  no
tensorboard-controller                              active      1  tensorboard-controller   latest/beta   307  10.152.183.18   no
tensorboards-web-app                                active      1  tensorboards-web-app     latest/beta   295  10.152.183.211  no
training-operator                                   active      1  training-operator        latest/beta   483  10.152.183.215  no
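
Rather than reading the table by eye, the same "everything is active" check could be scripted. A rough sketch, assuming the JSON form of juju status exposes each application's state under applications.<name>.application-status.current; that layout is an assumption and should be checked against the installed Juju version. The function name and the sample data (including the "waiting" entry) are made up for illustration and do not come from this deployment.

```python
from typing import Dict, List


def apps_not_active(status: Dict) -> List[str]:
    """Return the applications whose reported status is anything other than 'active'."""
    pending = []
    for name, app in status.get("applications", {}).items():
        current = app.get("application-status", {}).get("current")
        if current != "active":
            pending.append(f"{name} ({current})")
    return sorted(pending)


# Hand-written sample; real input would be the parsed output of
# `juju status --format=json`.
sample_status = {
    "applications": {
        "katib-controller": {"application-status": {"current": "active"}},
        "kfp-api": {"application-status": {"current": "waiting"}},
    }
}

print(apps_not_active(sample_status))
```

An empty list would correspond to the state shown in the table above, where every application reports active.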

Using the load balancer, I tried logging into the dashboard:

I was able to log in and navigate the dashboard (all components seem to be working)

I tried creating a notebook, connecting to it, and using it - it works
I tried creating a Pipelines experiment, a run, and a recurring run - it works
I looked into volumes and, using the PVC viewer, was able to navigate directories

Preliminary tests indicate the 1.9/beta bundle works just fine.

canonical / bundle-kubeflow

Test the UATs for the 1.9 release on Microk8s #808