Open ca-scribner opened 5 months ago
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5276.
This message was autogenerated
Identified issues:
I ran the UATs with tox -ve kubeflow-local; these are the results:
============================================================================ short test summary info =============================================================================
FAILED driver/test_kubeflow_workloads.py::test_kubeflow_workloads - Failed: Something went wrong while running Job test-kubeflow/test-kubeflow. Please inspect the attached logs for more info...
==================================================================== 1 failed, 1 passed in 1988.01s (0:33:08) ====================================================================
kubeflow-local: exit 1 (1989.54 seconds) /home/ubuntu/charmed-kubeflow-uats> pytest -vv --tb native /home/ubuntu/charmed-kubeflow-uats/driver/ -s --filter 'not mlflow' --model kubeflow pid=591250
kubeflow-local: FAIL code 1 (1989.59=setup[0.05]+cmd[1989.54] seconds)
evaluation failed :( (1989.68 seconds)
According to the logs, the Katib integration test is failing:
------------------------------ Captured log call -------------------------------
INFO test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
ERROR test_notebooks:test_notebooks.py:58 Cell In[8], line 8, in assert_experiment_succeeded(client, experiment)
1 @retry(
2 wait=wait_exponential(multiplier=2, min=1, max=10),
3 stop=stop_after_attempt(30),
4 reraise=True,
5 )
6 def assert_experiment_succeeded(client, experiment):
7 """Wait for the Katib Experiment to complete successfully."""
----> 8 assert client.is_experiment_succeeded(name=experiment), f"Katib Experiment was not successful."
AssertionError: Katib Experiment was not successful.
=========================== short test summary info ============================
FAILED test_notebooks.py::test_notebook[katib-integration] - Failed: AssertionError: Katib Experiment was not successful.
============ 1 failed, 4 passed, 4 deselected in 1940.13s (0:32:20) ============
FAILED
------------------------------------------------------------------------------- live log teardown --------------------------------------------------------------------------------
INFO test_kubeflow_workloads:test_kubeflow_workloads.py:82 Deleting Profile test-kubeflow...
INFO httpx:_client.py:1013 HTTP Request: DELETE https://172.31.15.25:16443/apis/kubeflow.org/v1/profiles/test-kubeflow "HTTP/1.1 200 OK"
INFO test_kubeflow_workloads:test_kubeflow_workloads.py:141 Deleting Job test-kubeflow/test-kubeflow...
INFO httpx:_client.py:1013 HTTP Request: DELETE https://172.31.15.25:16443/apis/batch/v1/namespaces/test-kubeflow/jobs/test-kubeflow "HTTP/1.1 200 OK"
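For context on how long the notebook waits before giving up: the retry decorator in the traceback allows 30 attempts with an exponential wait capped at 10 seconds. Below is a minimal sketch of that arithmetic. It assumes the wait after attempt n is multiplier * 2**(n-1), clamped to [min, max], which is an assumption about how recent tenacity releases compute it (older releases may use a slightly different exponent). It also ignores the time each is_experiment_succeeded call takes. The function name wait_schedule is hypothetical.

```python
def wait_schedule(attempts=30, multiplier=2, min_wait=1, max_wait=10):
    """Model the sleeps between attempts (assumed formula, not tenacity itself)."""
    waits = []
    # There is one sleep after every failed attempt except the last one.
    for attempt in range(1, attempts):
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


waits = wait_schedule()
total_sleep = sum(waits)
print(waits[:4], len(waits), total_sleep)
```

Under that assumption the Experiment gets roughly four and a half minutes of waiting, and never more than 29 x 10 = 290 seconds whatever the exact exponent. If the Katib Experiment needs longer than that in this environment, the assertion would fail even though the Experiment might eventually succeed. That is one possibility to rule out, not a confirmed diagnosis.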
Looking a bit more into the logs, I can see the following:
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /opt/conda/bin/python3.8
cachedir: .pytest_cache
rootdir: /tests
configfile: pytest.ini
plugins: anyio-3.6.2
collecting ... collected 9 items / 4 deselected / 5 selected
test_notebooks.py::test_notebook[katib-integration]
-------------------------------- live log call ---------------------------------
INFO test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
ERROR test_notebooks:test_notebooks.py:58 Cell In[8], line 8, in assert_experiment_succeeded(client, experiment)
1 @retry(
2 wait=wait_exponential(multiplier=2, min=1, max=10),
3 stop=stop_after_attempt(30),
4 reraise=True,
5 )
6 def assert_experiment_succeeded(client, experiment):
7 """Wait for the Katib Experiment to complete successfully."""
----> 8 assert client.is_experiment_succeeded(name=experiment), f"Katib Experiment was not successful."
AssertionError: Katib Experiment was not successful.
FAILED [ 20%]
test_notebooks.py::test_notebook[kfp-v1-integration]
-------------------------------- live log call ---------------------------------
INFO test_notebooks:test_notebooks.py:44 Running kfp-v1-integration.ipynb...
PASSED [ 40%]
test_notebooks.py::test_notebook[kfp-v2-integration]
-------------------------------- live log call ---------------------------------
INFO test_notebooks:test_notebooks.py:44 Running kfp-v2-integration.ipynb...
PASSED [ 60%]
test_notebooks.py::test_notebook[kserve-integration]
-------------------------------- live log call ---------------------------------
INFO test_notebooks:test_notebooks.py:44 Running kserve-integration.ipynb...
PASSED [ 80%]
test_notebooks.py::test_notebook[training-integration]
-------------------------------- live log call ---------------------------------
INFO test_notebooks:test_notebooks.py:44 Running training-integration.ipynb...
PASSED [100%]
=================================== FAILURES ===================================
_______________________ test_notebook[katib-integration] _______________________
test_notebook = '/tests/notebooks/katib/katib-integration.ipynb'
@pytest.mark.ipynb
@pytest.mark.parametrize(
# notebook - ipynb file to execute
"test_notebook",
NOTEBOOKS.values(),
ids=NOTEBOOKS.keys(),
)
def test_notebook(test_notebook):
"""Test Notebook Generic Wrapper."""
os.chdir(os.path.dirname(test_notebook))
with open(test_notebook) as nb:
notebook = nbformat.read(nb, as_version=nbformat.NO_CONVERT)
ep = ExecutePreprocessor(
timeout=-1, kernel_name="python3", on_notebook_start=install_python_requirements
)
ep.skip_cells_with_tag = "pytest-skip"
try:
log.info(f"Running {os.path.basename(test_notebook)}...")
output_notebook, _ = ep.preprocess(notebook, {"metadata": {"path": "./"}})
# persist the notebook output to the original file for debugging purposes
save_notebook(output_notebook, test_notebook)
except CellExecutionError as e:
# handle underlying error
pytest.fail(f"Notebook execution failed with {e.ename}: {e.evalue}")
for cell in output_notebook.cells:
metadata = cell.get("metadata", dict)
if "raises-exception" in metadata.get("tags", []):
for cell_output in cell.outputs:
if cell_output.output_type == "error":
# extract the error message from the cell output
log.error(format_error_message(cell_output.traceback))
> pytest.fail(cell_output.traceback[-1])
E Failed: AssertionError: Katib Experiment was not successful.
/tests/test_notebooks.py:59: Failed
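Since the failing notebook is buried in a long log, a small helper can pull the failed test IDs out of the pytest short summary. This is only a sketch: the regular expression is an assumption based on the FAILED lines shown above, and failed_tests is a hypothetical helper, not part of the UAT driver.

```python
import re

# Matches pytest short-summary lines such as:
# FAILED test_notebooks.py::test_notebook[katib-integration] - Failed: ...
FAILED_LINE = re.compile(r"^FAILED (?P<test_id>\S+) - (?P<reason>.*)$")


def failed_tests(log_text):
    """Return (test_id, reason) pairs found in pytest short-summary output."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            results.append((match.group("test_id"), match.group("reason")))
    return results


# Sample lines copied from the log above.
sample = """\
FAILED test_notebooks.py::test_notebook[katib-integration] - Failed: AssertionError: Katib Experiment was not successful.
============ 1 failed, 4 passed, 4 deselected in 1940.13s (0:32:20) ============
FAILED
"""
for test_id, reason in failed_tests(sample):
    print(test_id, "->", reason)
```

Bare progress markers such as "FAILED" or "FAILED [ 20%]" are deliberately not matched, so only the summary entries with a test ID and a reason are returned.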
Deployed with juju deploy kubeflow --channel 1.9/beta --trust
Configured dex-auth and oidc-gatekeeper's public-url = http://dex-auth.kubeflow.svc:5556
Configured dex-auth's static-username and static-password
ubuntu@ip-172-31-15-25:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
kubeflow uk8s-343 microk8s/localhost 3.4.4 unsupported 20:23:35Z
App Version Status Scale Charm Channel Rev Address Exposed Message
admission-webhook active 1 admission-webhook latest/beta 328 10.152.183.124 no
argo-controller active 1 argo-controller latest/beta 526 10.152.183.183 no
dex-auth active 1 dex-auth latest/beta 507 10.152.183.141 no
envoy active 1 envoy latest/beta 231 10.152.183.126 no
istio-ingressgateway active 1 istio-gateway latest/beta 1048 10.152.183.69 no
istio-pilot active 1 istio-pilot latest/beta 1013 10.152.183.23 no
jupyter-controller active 1 jupyter-controller latest/beta 1002 10.152.183.175 no
jupyter-ui active 1 jupyter-ui latest/beta 925 10.152.183.41 no
katib-controller active 1 katib-controller latest/beta 690 10.152.183.35 no
katib-db 8.0.36-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/stable 153 10.152.183.129 no
katib-db-manager active 1 katib-db-manager latest/beta 653 10.152.183.50 no
katib-ui active 1 katib-ui latest/beta 657 10.152.183.217 no
kfp-api active 1 kfp-api latest/beta 1466 10.152.183.91 no
kfp-db 8.0.36-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/stable 153 10.152.183.80 no
kfp-metadata-writer active 1 kfp-metadata-writer latest/beta 524 10.152.183.59 no
kfp-persistence active 1 kfp-persistence latest/beta 1473 10.152.183.42 no
kfp-profile-controller active 1 kfp-profile-controller latest/beta 1431 10.152.183.130 no
kfp-schedwf active 1 kfp-schedwf latest/beta 1484 10.152.183.180 no
kfp-ui active 1 kfp-ui latest/beta 1467 10.152.183.229 no
kfp-viewer active 1 kfp-viewer latest/beta 1499 10.152.183.77 no
kfp-viz active 1 kfp-viz latest/beta 1417 10.152.183.219 no
knative-eventing active 1 knative-eventing latest/beta 441 10.152.183.111 no
knative-operator active 1 knative-operator latest/beta 416 10.152.183.134 no
knative-serving active 1 knative-serving latest/beta 442 10.152.183.75 no
kserve-controller active 1 kserve-controller latest/beta 397 10.152.183.132 no
kubeflow-dashboard active 1 kubeflow-dashboard latest/beta 600 10.152.183.32 no
kubeflow-profiles active 1 kubeflow-profiles latest/beta 393 10.152.183.221 no
kubeflow-roles active 1 kubeflow-roles latest/beta 225 10.152.183.150 no
kubeflow-volumes active 1 kubeflow-volumes latest/beta 314 10.152.183.28 no
metacontroller-operator active 1 metacontroller-operator latest/beta 280 10.152.183.61 no
minio res:oci-image@5102166 active 1 minio latest/beta 334 10.152.183.21 no
mlmd active 1 mlmd latest/beta 201 10.152.183.197 no
oidc-gatekeeper active 1 oidc-gatekeeper latest/beta 396 10.152.183.43 no
pvcviewer-operator active 1 pvcviewer-operator latest/beta 118 10.152.183.253 no
seldon-controller-manager active 1 seldon-core latest/beta 691 10.152.183.236 no
tensorboard-controller active 1 tensorboard-controller latest/beta 307 10.152.183.18 no
tensorboards-web-app active 1 tensorboards-web-app latest/beta 295 10.152.183.211 no
training-operator active 1 training-operator latest/beta 483 10.152.183.215 no
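With this many applications, checking the table by eye is error-prone. A minimal sketch of an automated check follows, assuming the output of juju status --format json has an "applications" mapping whose entries carry "application-status" -> "current". The sample data is hypothetical and trimmed down; it is not taken from this deployment, where every application reported active.

```python
import json


def non_active_apps(status_json):
    """Return {app: status} for applications whose status is not 'active'."""
    status = json.loads(status_json)
    result = {}
    for name, app in status.get("applications", {}).items():
        current = app.get("application-status", {}).get("current", "unknown")
        if current != "active":
            result[name] = current
    return result


# Hypothetical, trimmed-down sample; real output has many more fields.
sample = json.dumps({
    "applications": {
        "katib-controller": {"application-status": {"current": "active"}},
        "katib-db-manager": {"application-status": {"current": "waiting"}},
    }
})
print(non_active_apps(sample))
```

An empty result would correspond to the all-active state shown in the table above.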
Preliminary tests indicate the 1.9/beta bundle works just fine.
Context
The UATs should be run on any new Kubeflow bundle prior to release.
What needs to get done
Definition of Done