I'm working through the details of cluster deployment locally, but I wanted to validate that the minikube GHA works as advertised.
Setting up minikube in GHA takes almost 2 minutes. Additionally, the minikube GHA needs to be run on Ubuntu, so I should refactor this into a separate job so that the non-cluster tests can run independently.
The manager pod was being launched successfully but the worker pod was stuck in pending. I've noticed that the minikube GHA defaults to `driver: none`, so I'll try my luck with `driver: docker`, which is also supported by the GHA (https://github.com/marketplace/actions/setup-minikube-kubernetes-cluster#optional-input-parameters).
When the documentation states that `GITHUB_ENV` entries should be defined as `{name}={value}`, they aren't messing around.
Including a comment? How about an error message:

```
Error: Unable to process file command 'env' successfully.
Error: Invalid environment variable format '# To point your shell to minikube's docker-daemon, run:'
```
Using `export`? No problem, we'll just make your environment variable exactly `export {name}`. That's definitely what you want. Using quotes around your value? Obviously you want to include the quotes in your value.
Typically I'm adding environment variables manually, so I've never noticed this before, but since I don't know which variables `minikube docker-env` will emit I needed to be more general here.
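Something along these lines is what I mean by "more general" (a rough sketch, not necessarily the exact step in the workflow): keep only the `export NAME="value"` lines that `minikube docker-env` prints, strip the `export ` prefix and surrounding quotes, and append the resulting `{name}={value}` pairs to `GITHUB_ENV`.

```bash
# Sketch: rewrite `minikube docker-env` output into the `{name}={value}` form
# GITHUB_ENV expects. Comment lines are skipped, the leading "export " is
# dropped, and the surrounding double quotes are removed.
minikube docker-env | sed -n 's/^export \([A-Z_]*\)="\(.*\)"$/\1=\2/p' >> "$GITHUB_ENV"
```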
It appears that the minikube drivers "none" and "docker" have the same issue where the manager pod starts but the worker pod is stuck in pending. I'm assuming I don't have enough resources to launch both pods, but I'm attempting to confirm this theory.
Update: Events output from describing the worker pod:

```
Type     Reason            Age                From               Message
----     ------            ----               ----               -------
Warning  FailedScheduling  0s (x5 over 3m7s)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
```
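For reference, confirming a theory like this is just a matter of comparing what the node can allocate against what the pending pod requests (illustrative commands; the node and pod names are whatever your cluster reports):

```bash
# What the minikube node(s) can actually hand out to pods...
kubectl describe nodes | grep -A 6 'Allocatable:'
# ...versus the CPU/memory requests on the pending worker pod.
kubectl describe pod test-worker-success-kbclk | grep -A 4 'Requests:'
```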
Fun fact: the macOS GitHub runners have more CPUs and memory than the Linux ones. Unfortunately, the Docker Buildx action is unsupported on macOS, and I remember the minikube GHA also stated it was compatible only with Ubuntu.
There are still a couple of options open to us. Possibly there are some additional options if I can get minikube to oversubscribe.
Oversubscribing seems promising. Looks like I need to push the image to both nodes:
```
Type     Reason             Age  From               Message
----     ------             ---- ----               -------
Normal   Scheduled          1s   default-scheduler  Successfully assigned default/test-worker-success-kbclk to minikube-m02
Warning  ErrImageNeverPull  0s   kubelet            Container image "k8s-cluster-managers:add85c3" is not present with pull policy of Never
Warning  Failed             0s   kubelet            Error: ErrImageNeverPull
```
I'll continue looking into this approach.
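For context, the multi-node part is just minikube's `--nodes` flag (a sketch; the exact start arguments used in CI are still up in the air):

```bash
# Sketch: start a two-node cluster. The driver and node count here are
# assumptions about the CI setup, not a confirmed configuration.
minikube start --driver=docker --nodes=2
```

With two nodes the cluster comes up as `minikube` and `minikube-m02`, which is why the worker above landed on `minikube-m02`.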
On multi-node clusters you can no longer use `minikube docker-env`, as you'll be greeted with:

```
Exiting due to ENV_MULTINODE_CONFLICT: The docker-env command is incompatible with multi-node clusters. Use the 'registry' add-on: https://minikube.sigs.k8s.io/docs/handbook/registry/
```
I did attempt to use the registry add-on following the official instructions, but it seemed like overkill for a CI environment where you're setting up the cluster as often as you're pushing images. Because of this I ended up using `minikube ssh` and Docker's `save`/`load` to transfer the image onto the nodes.
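Roughly, that approach has the following shape (a sketch which assumes `minikube ssh` forwards stdin when given a command to run; the actual CI script may differ):

```bash
# Sketch: export the image from the host Docker daemon and load it into the
# Docker daemon on each minikube node, so a pod with `imagePullPolicy: Never`
# can find it locally. The image tag and node names are illustrative.
for node in minikube minikube-m02; do
    docker save k8s-cluster-managers:add85c3 | minikube ssh --node "$node" -- docker load
done
```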
On one of the CI runs the manager saw this error in the events list: `MountVolume.SetUp failed for volume "julia-manager-serviceaccount-token-hrtmt" : failed to sync secret cache: timed out waiting for the condition`. It didn't seem to impact the run though, and the next CI run didn't see it.
What seems to be the last remaining CI issue is that the manager is unable to establish a connection to the worker over the network. I've managed to get these tests working on my local multi-node minikube cluster so I believe there's something special about the CI environment I need to adjust for.
And finally, we have a functional Kubernetes test that works locally and on CI. I have some refactoring to do, but the hard work is over!
The previous cluster tests on the CI failed because the image couldn't be pulled (https://github.com/beacon-biosignals/K8sClusterManagers.jl/runs/2377192899). That's probably the last thing that needs to be investigated before merging this PR.
Merging #23 (a0ff4ba) into main (6344023) will increase coverage by 6.54%. The diff coverage is n/a.
```diff
@@            Coverage Diff             @@
##             main      #23      +/-   ##
==========================================
+ Coverage   29.03%   35.57%   +6.54%
==========================================
  Files           2        2
  Lines          93      104      +11
==========================================
+ Hits           27       37      +10
- Misses         66       67       +1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/native_driver.jl | 29.34% <0.00%> (+8.36%) | :arrow_up: |
I just need to add some documentation on `minikube docker-env`.
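Roughly, the docs just need to cover pointing a local Docker client at minikube's daemon before building the test image, along these lines (single-node clusters only, given the multi-node conflict noted above; the image tag is illustrative):

```bash
# Build the image directly into minikube's Docker daemon so the cluster can
# use it without a registry.
eval $(minikube docker-env)
docker build -t k8s-cluster-managers:dev .
```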
This beast is RTM
Working on getting K8sClusterManagers tested within a local k8s cluster. Having this will allow for improved tests and faster iteration.
Partially addresses: https://github.com/beacon-biosignals/K8sClusterManagers.jl/issues/8