misohu closed this issue 6 months ago
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5553.
This message was autogenerated
I think this has occurred because of at least two things:
1. The gcr.io/ml-pipeline/visualization-server:2.0.3 image has grown (it is now ~5.2GB; in the past it was closer to 4.4GB).
2. The easimon/maximize-build-space action no longer frees up as much space as it used to (see this comment).
It appears that because of (2), the 2.0.3 track here has just enough space to run the tests, and combining (1)+(2) means that when we create a user profile, the runner runs out of space while deploying the visualization and artifact server pods in the user's namespace.
A possible solution to this issue is to switch to the jlumbroso/free-disk-space action, which, with default settings, leaves the runner with ~45GB free.
Nice quick way for unblocking us @ca-scribner!
For the long-term solution, I propose that we go with self-hosted runners: https://github.com/canonical/kfp-operators/pull/428#issuecomment-2046961785
I tried playing around with those in https://github.com/canonical/kfp-operators/pull/415 and https://github.com/canonical/kfp-operators/pull/414. I'll do a cleanup and also open a dedicated PR and issue for this, so that we can focus on the changes we'll need to make holistically.
I'll have them ready by the sprint so that we can sit down with the IS team and show them our blockers.
I'll add a comment here as well so that the lineage of the effort is tracked.
Bug Description
When running CI, test bundle v2 fails. After sshing into the runner, we found that the main problem is insufficient disk space, which leaves pods stuck in the Pending state.
df command from inside the runner:
This issue is similar to this one https://github.com/canonical/bundle-kubeflow/issues/813
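The disk-space check described above (running df on the runner) could also be done programmatically before the bundle is deployed, so that the job fails early with a clear message instead of leaving pods in Pending. The following is only a minimal sketch, not what the CI currently does: the function names and the 10 GiB default threshold are hypothetical and would need to be tuned to the actual image sizes.

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_enough_space(path: str = "/", required_gib: float = 10.0) -> bool:
    """Check whether at least `required_gib` GiB are free at `path`.

    The 10 GiB default is a hypothetical placeholder, not a measured requirement.
    """
    return free_gib(path) >= required_gib


def describe_space(path: str = "/") -> str:
    """Build a one-line summary similar to a single row of `df -h` output."""
    usage = shutil.disk_usage(path)
    return (
        f"{path}: total={usage.total / 2**30:.1f}GiB "
        f"used={usage.used / 2**30:.1f}GiB "
        f"free={usage.free / 2**30:.1f}GiB"
    )
```

A CI step could print describe_space("/") and abort when has_enough_space("/") returns False, which would surface the problem before any pods are scheduled.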
To Reproduce
main
against main to trigger CI
Environment
GitHub Actions CI on the main branch
Relevant Log Output
Additional Context
No response