beacon-biosignals / K8sClusterManagers.jl

A Julia cluster manager for Kubernetes

CI Cluster Test failed #89

Open kimlaberinto opened 2 years ago

kimlaberinto commented 2 years ago

Not sure why this CI cluster test failed:

[ Info: Waiting for test-multi-addprocs job. This could take up to 4 minutes...
 Error from server (NotFound): pods "test-multi-addprocs-st7tc" not found
test-multi-addprocs: Error During Test at /home/runner/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:279
  Test threw exception
  Expression: pod_phase(manager_pod) == "Succeeded"

https://github.com/beacon-biosignals/K8sClusterManagers.jl/runs/5027583871?check_suite_focus=true#step:9:140

omus commented 2 years ago

Copying relevant logs here as GHA logs don't persist:

[ Info: Waiting for test-multi-addprocs job. This could take up to 4 minutes...
Error from server (NotFound): pods "test-multi-addprocs-st7tc" not found
test-multi-addprocs: Error During Test at /home/runner/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:279
  Test threw exception
  Expression: pod_phase(manager_pod) == "Succeeded"
  failed process: Process(setenv(`/home/runner/.julia/artifacts/e549ab3a763d3b31e726aa6336c6dbb75ee90a05/bin/kubectl get pod/test-multi-addprocs-st7tc -o 'jsonpath={.status.phase}'`,["PATH=/home/runner/.julia/artifacts/e549ab3a763d3b31e726aa6336c6dbb75ee90a05/bin:/home/runner/work/_temp:/opt/hostedtoolcache/julia/1.7.1/x64/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1", "GITHUB_RUN_NUMBER=208", "GITHUB_REF_NAME=88/merge", "RUNNER_ARCH=X64", "PERFLOG_LOCATION_SETTING=RUNNER_PERFLOG", "LD_LIBRARY_PATH=/opt/hostedtoolcache/julia/1.7.1/x64/bin/../lib/julia:/opt/hostedtoolcache/julia/1.7.1/x64/bin/../lib", "K8S_CLUSTER_TESTS=true", "ACCEPT_EULA=Y", "ANT_HOME=/usr/share/ant", "RUNNER_USER=runner", "LEIN_HOME=/usr/local/lib/lein", "GITHUB_ACTOR=kimlaberinto", "ANDROID_NDK_LATEST_HOME=/usr/local/lib/android/sdk/ndk/23.1.7779620", "USER=runner", "CONDA=/usr/share/miniconda", "GITHUB_REF_PROTECTED=false", "GITHUB_SHA=b39d201f2b4c3780982f770e755e1c6c91503709", "JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64", "GITHUB_API_URL=https://api.github.com", "GITHUB_RUN_ATTEMPT=1", "GITHUB_ACTIONS=true", "VCPKG_INSTALLATION_ROOT=/usr/local/share/vcpkg", "MINIKUBE_HOME=/home/runner/work/_temp", "ANDROID_SDK_ROOT=/usr/local/lib/android/sdk", "SWIFT_PATH=/usr/share/swift/usr/bin", "GOROOT_1_17_X64=/opt/hostedtoolcache/go/1.17.6/x64", "GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_52151825-7529-4f10-9231-f2029174696c", "JAVA_HOME_17_X64=/usr/lib/jvm/temurin-17-jdk-amd64", "GITHUB_ACTION_PATH=/home/runner/work/_actions/julia-actions/julia-runtest/v1", "RUNNER_PERFLOG=/home/runner/perflog", "RUNNER_NAME=GitHub Actions 9", "GITHUB_RUN_ID=1780539670", 
"HOMEBREW_CELLAR=/home/linuxbrew/.linuxbrew/Cellar", "ImageOS=ubuntu20", "NVM_DIR=/home/runner/.nvm", "GITHUB_HEAD_REF=kpl/update-codecov", "GITHUB_RETENTION_DAYS=90", "GITHUB_SERVER_URL=https://github.com", "GITHUB_JOB=cluster-test", "DEBIAN_FRONTEND=noninteractive", "RUNNER_TRACKING_ID=github_ee352480-6154-44ca-8750-7f7c692fd5f1", "RUNNER_TOOL_CACHE=/opt/hostedtoolcache", "HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS=3650", "AZURE_EXTENSION_DIR=/opt/az/azcliextensions", "HOMEBREW_NO_AUTO_UPDATE=1", "CHROMEWEBDRIVER=/usr/local/share/chrome_driver", "GITHUB_ACTION_REPOSITORY=", "GITHUB_WORKFLOW=CI", "GITHUB_ACTION=__julia-actions_julia-runtest", "HOME=/home/runner", "JAVA_HOME_8_X64=/usr/lib/jvm/temurin-8-jdk-amd64", "GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json", "K8S_CLUSTER_MANAGERS_TEST_IMAGE=k8s-cluster-managers:b39d201", "HOMEBREW_PREFIX=/home/linuxbrew/.linuxbrew", "SGX_AESM_ADDR=1", "GITHUB_REF=refs/pull/88/merge", "GITHUB_REPOSITORY=beacon-biosignals/K8sClusterManagers.jl", "INVOCATION_ID=3990f835d3004e2b87571c73a406a265", "ImageVersion=20220123.1", "LANG=C.UTF-8", "GITHUB_GRAPHQL_URL=https://api.github.com/graphql", "SHLVL=1", "DOTNET_MULTILEVEL_LOOKUP=0", "RUNNER_WORKSPACE=/home/runner/work/K8sClusterManagers.jl", "GITHUB_BASE_REF=main", "STATS_KEEPALIVE=false", "_=/opt/hostedtoolcache/julia/1.7.1/x64/bin/julia", "HOMEBREW_REPOSITORY=/home/linuxbrew/.linuxbrew/Homebrew", "GRADLE_HOME=/usr/share/gradle-7.3.3", "GITHUB_ACTION_REF=", "DEPLOYMENT_BASEPATH=/opt/runner", "PIPX_HOME=/opt/pipx", "ANDROID_NDK_ROOT=/usr/local/lib/android/sdk/ndk-bundle", "***", "GITHUB_WORKSPACE=/home/runner/work/K8sClusterManagers.jl/K8sClusterManagers.jl", "GRAALVM_11_ROOT=/usr/local/graalvm/graalvm-ce-java11-21.3.0", "XDG_CONFIG_HOME=/home/runner/.config", "ANDROID_HOME=/usr/local/lib/android/sdk", "CHROME_BIN=/usr/bin/google-chrome", "CI=true", "POWERSHELL_DISTRIBUTION_CHANNEL=GitHub-Actions-ubuntu20", "GECKOWEBDRIVER=/usr/local/share/gecko_driver", 
"GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_52151825-7529-4f10-9231-f2029174696c", "RUNNER_OS=Linux", "JOURNAL_STREAM=8:20833", "GITHUB_REF_TYPE=branch", "LEIN_JAR=/usr/local/lib/lein/self-installs/leiningen-2.9.8-standalone.jar", "JULIA_LOAD_PATH=@:/tmp/jl_RrxcF6", "BOOTSTRAP_HASKELL_NONINTERACTIVE=1", "PIPX_BIN_DIR=/opt/pipx_bin", "SELENIUM_JAR_PATH=/usr/share/java/selenium-server.jar", "JAVA_HOME_11_X64=/usr/lib/jvm/temurin-11-jdk-amd64", "RUNNER_TEMP=/home/runner/work/_temp", "GOROOT_1_16_X64=/opt/hostedtoolcache/go/1.16.13/x64", "GITHUB_REPOSITORY_OWNER=beacon-biosignals", "GITHUB_EVENT_NAME=pull_request", "DOTNET_NOLOGO=1", "GOROOT_1_15_X64=/opt/hostedtoolcache/go/1.15.15/x64", "OPENBLAS_MAIN_FREE=1", "ANDROID_NDK_HOME=/usr/local/lib/android/sdk/ndk-bundle", "AGENT_TOOLSDIRECTORY=/opt/hostedtoolcache"]), ProcessExited(1)) [1]

  Stacktrace:
   [1] pipeline_error
     @ ./process.jl:531 [inlined]
   [2] read(cmd::Cmd)
     @ Base ./process.jl:418
   [3] read(cmd::Cmd, #unused#::Type{String})
     @ Base ./process.jl:427
   [4] pod_phase(pod_name::SubString{String})
     @ Main ~/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/utils.jl:36
   [5] macro expansion
     @ /opt/hostedtoolcache/julia/1.7.1/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:445 [inlined]
   [6] macro expansion
     @ ~/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:271 [inlined]
   [7] macro expansion
     @ /opt/hostedtoolcache/julia/1.7.1/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
   [8] top-level scope
     @ ~/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:235
Error from server (NotFound): jobs.batch "test-multi-addprocs" not found
[ Info: Describe job:
┌ Info: List pods for job test-multi-addprocs:
│ NAME                                     READY   STATUS      RESTARTS   AGE   JOB-NAME=TEST-MULTI-ADDPROCS
│ test-multi-addprocs-st7tc-worker-fnzvf   0/1     Completed   0          43s   
│ test-multi-addprocs-st7tc-worker-jwksk   0/1     Completed   0          28s   
└ test-success-slxr5-worker-jqvm7          0/1     Completed   0          95s   
[ Info: Manager pod "test-multi-addprocs-st7tc" not found
┌ Info: Describe worker 1/2 pod:
│ Name:         test-multi-addprocs-st7tc-worker-fnzvf
│ Namespace:    default
│ Priority:     0
│ Node:         minikube-m02/192.168.49.3
│ Start Time:   Tue, 01 Feb 2022 20:33:00 +0000
│ Labels:       manager=test-multi-addprocs-st7tc
│               worker-id=2
│ Annotations:  <none>
│ Status:       Succeeded
│ IP:           10.244.1.8
│ IPs:
│   IP:  10.244.1.8
│ Containers:
│   worker:
│     Container ID:  docker://25369b89560204270b609c7129b3c36111f5e124b09c10609d946879ce9c52c7
│     Image:         k8s-cluster-managers:b39d201
│     Image ID:      docker://sha256:a3f7dfa9c373b41e28bf6527e7b8801720aa125fa31db5b9cacb7d069eada486
│     Port:          <none>
│     Host Port:     <none>
│     Command:
│       /usr/local/julia/bin/julia
│       --worker=RL03XtNp463y3yuY
│     State:          Terminated
│       Reason:       Completed
│       Exit Code:    0
│       Started:      Tue, 01 Feb 2022 20:33:00 +0000
│       Finished:     Tue, 01 Feb 2022 20:33:29 +0000
│     Ready:          False
│     Restart Count:  0
│     Limits:
│       cpu:     500m
│       memory:  300Mi
│     Requests:
│       cpu:        500m
│       memory:     300Mi
│     Environment:  <none>
│     Mounts:
│       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w6rvq (ro)
│ Conditions:
│   Type              Status
│   Initialized       True 
│   Ready             False 
│   ContainersReady   False 
│   PodScheduled      True 
│ Volumes:
│   kube-api-access-w6rvq:
│     Type:                    Projected (a volume that contains injected data from multiple sources)
│     TokenExpirationSeconds:  3607
│     ConfigMapName:           kube-root-ca.crt
│     ConfigMapOptional:       <nil>
│     DownwardAPI:             true
│ QoS Class:                   Guaranteed
│ Node-Selectors:              <none>
│ Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
│                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
│ Events:
│   Type    Reason     Age   From               Message
│   ----    ------     ----  ----               -------
│   Normal  Scheduled  43s   default-scheduler  Successfully assigned default/test-multi-addprocs-st7tc-worker-fnzvf to minikube-m02
│   Normal  Pulled     43s   kubelet            Container image "k8s-cluster-managers:b39d201" already present on machine
│   Normal  Created    43s   kubelet            Created container worker
└   Normal  Started    43s   kubelet            Started container worker
┌ Info: Describe worker 2/2 pod:
│ Name:         test-multi-addprocs-st7tc-worker-jwksk
│ Namespace:    default
│ Priority:     0
│ Node:         minikube-m02/192.168.49.3
│ Start Time:   Tue, 01 Feb 2022 20:33:15 +0000
│ Labels:       manager=test-multi-addprocs-st7tc
│               worker-id=3
│ Annotations:  <none>
│ Status:       Succeeded
│ IP:           10.244.1.9
│ IPs:
│   IP:  10.244.1.9
│ Containers:
│   worker:
│     Container ID:  docker://ddbfb6efe6eaf5aac7fe6a2885e49d145248f02be166e3c83f78ce15936a72e5
│     Image:         k8s-cluster-managers:b39d201
│     Image ID:      docker://sha256:a3f7dfa9c373b41e28bf6527e7b8801720aa125fa31db5b9cacb7d069eada486
│     Port:          <none>
│     Host Port:     <none>
│     Command:
│       /usr/local/julia/bin/julia
│       --worker=RL03XtNp463y3yuY
│     State:          Terminated
│       Reason:       Completed
│       Exit Code:    0
│       Started:      Tue, 01 Feb 2022 20:33:16 +0000
│       Finished:     Tue, 01 Feb 2022 20:33:29 +0000
│     Ready:          False
│     Restart Count:  0
│     Limits:
│       cpu:     500m
│       memory:  300Mi
│     Requests:
│       cpu:        500m
│       memory:     300Mi
│     Environment:  <none>
│     Mounts:
│       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nlpwr (ro)
│ Conditions:
│   Type              Status
│   Initialized       True 
│   Ready             False 
│   ContainersReady   False 
│   PodScheduled      True 
│ Volumes:
│   kube-api-access-nlpwr:
│     Type:                    Projected (a volume that contains injected data from multiple sources)
│     TokenExpirationSeconds:  3607
│     ConfigMapName:           kube-root-ca.crt
│     ConfigMapOptional:       <nil>
│     DownwardAPI:             true
│ QoS Class:                   Guaranteed
│ Node-Selectors:              <none>
│ Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
│                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
│ Events:
│   Type    Reason     Age   From               Message
│   ----    ------     ----  ----               -------
│   Normal  Scheduled  28s   default-scheduler  Successfully assigned default/test-multi-addprocs-st7tc-worker-jwksk to minikube-m02
│   Normal  Pulled     27s   kubelet            Container image "k8s-cluster-managers:b39d201" already present on machine
│   Normal  Created    27s   kubelet            Created container worker
└   Normal  Started    27s   kubelet            Started container worker
[ Info: No logs for manager (test-multi-addprocs-st7tc)
┌ Info: Logs for worker 1/2 (test-multi-addprocs-st7tc-worker-fnzvf):
└ julia_worker:9001#10.244.1.8
┌ Info: Logs for worker 2/2 (test-multi-addprocs-st7tc-worker-jwksk):
└ julia_worker:9001#10.244.1.9
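The stacktrace above shows the exception originating in the `pod_phase` test helper (`test/utils.jl:36`), which shells out to `kubectl get pod ... -o 'jsonpath={.status.phase}'` via `read` and therefore throws a process error when the pod has already been deleted. A more tolerant variant could look like the following sketch (this is not the package's actual code; the `nothing` return convention for a missing pod is an assumption):

```julia
# Hypothetical NotFound-tolerant version of the `pod_phase` test helper.
# Instead of letting `read(cmd, String)` throw when kubectl exits non-zero,
# capture stderr and distinguish "pod already gone" from other failures.
function pod_phase(pod_name::AbstractString)
    cmd = `kubectl get pod/$pod_name -o "jsonpath={.status.phase}"`
    out = IOBuffer()
    err = IOBuffer()
    proc = run(pipeline(ignorestatus(cmd), stdout=out, stderr=err))
    if !success(proc)
        msg = String(take!(err))
        # kubectl prints `Error from server (NotFound): ...` and exits 1
        # when the pod no longer exists.
        occursin("NotFound", msg) && return nothing
        error("kubectl failed for pod $pod_name: $msg")
    end
    return String(take!(out))
end
```

The call site could then treat `nothing` as a distinct "pod already cleaned up" outcome rather than failing the whole testset with a thrown exception.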
omus commented 2 years ago

It appears the manager job was terminated and removed before the debugging information could be collected. That probably means we want to adjust some TTL settings so failures like this can be debugged further.
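If the Job cleanup is TTL-based, the relevant knob is `ttlSecondsAfterFinished` on the Job spec, which controls how long Kubernetes keeps a finished Job (and its pods) before garbage-collecting them. A sketch with an illustrative value (the field is real Kubernetes batch/v1 API; the surrounding spec is abbreviated and not taken from this repository):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test-multi-addprocs   # example name from this failure
spec:
  # Keep the finished Job and its pods around for 10 minutes so that
  # `kubectl describe pod` / `kubectl logs` still work when the test
  # harness gathers debugging output. Value is illustrative.
  ttlSecondsAfterFinished: 600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: manager
          image: k8s-cluster-managers:b39d201
```

With no TTL set, finished Jobs persist until something deletes them explicitly, so it may also be worth checking whether the test harness itself is deleting the Job before the describe/log collection step runs.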