Megapixel99 / nodejs-k8s

Attempt to recreate the core functionality of v1.29.1 Kubernetes in NodeJS, while being fully compatible with the kubectl CLI
MIT License

Run conformance tests #1

Open jayunit100 opened 4 months ago

jayunit100 commented 4 months ago

Would be cool to try to run sonobuoy.io against this and see which of the ~300 or so k8s conformance tests pass.

Note: many are just apiserver-related, so they might pass even if pods don't schedule.

Megapixel99 commented 4 months ago

Setting up Sonobuoy would help make this project more complete, and it would definitely speed up development. Thank you for mentioning this.

Megapixel99 commented 4 months ago

I was running into issues with Sonobuoy, so I am instead working on setting up Hydrophone.

Sonobuoy fails if the response headers do not match exactly, and Express automatically appends charset=UTF-8 to the Content-Type header. So far, Hydrophone does not seem to care about that.
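
In case it is useful later, here is a minimal sketch (not code from this repo) of one way to keep Express from appending the charset, by patching res.setHeader before any routes run; the /version route and port below are placeholders for illustration:

// Minimal sketch (not from this repo): strip the charset Express adds to
// Content-Type so header-sensitive tools see exactly "application/json".
// The /version route and port are placeholders.
const express = require('express');
const app = express();

app.use((req, res, next) => {
  const setHeader = res.setHeader.bind(res);
  res.setHeader = (name, value) => {
    if (String(name).toLowerCase() === 'content-type' && typeof value === 'string') {
      value = value.replace(/;\s*charset=utf-8/i, '');
    }
    return setHeader(name, value);
  };
  next();
});

app.get('/version', (req, res) => res.json({ major: '1', minor: '29' }));

app.listen(8080);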

jayunit100 commented 4 months ago

We could also run the raw e2e tests; if you want, I can help with that.

But yeah, Hydrophone looks valid too.

Megapixel99 commented 4 months ago

I was unable to get Hydrophone to work because it runs its tests from a Docker container, which does not work for this project: unless you configure Docker to use the same network as your host computer (which seems to take more networking expertise than I have), Hydrophone cannot reach your localhost. However, I did notice Hydrophone runs 300-ish Ginkgo tests, so I decided to look for a way to run Ginkgo on my own.

Excerpt of the logs from the Hydrophone test Docker container (specifically registry.k8s.io/conformance:v1.29.0) showing Ginkgo being run:

...
Executable path: /usr/local/bin/ginkgo
Args (comma-delimited): /usr/local/bin/ginkgo,--focus=\[Conformance\],--skip=,--no-color=true,-v,--timeout=24h,/usr/local/bin/e2e.test,--,--disable-log-dump,--repo-root=/kubernetes,--provider=skeleton,--report-dir=/tmp/results,--kubeconfig=
2024/05/08 06:03:36 Now listening for interrupts
  W0508 06:03:37.602107      22 test_context.go:545] Unable to find in-cluster config, using default host : https://127.0.0.1:6443
  I0508 06:03:37.604155      22 e2e.go:117] Starting e2e run "46e344de-458c-49e4-b8ed-803033d81628" on Ginkgo node 1
Running Suite: Kubernetes e2e suite - /usr/local/bin
====================================================
Random Seed: 1715148216 - will randomize all specs

Will run 388 of 7407 specs
...

Luckily, I managed to find kubetest2 after stumbling upon kubernetes/test-infra. With kubetest2 I have been able to (almost fully) automate the 7000+ Ginkgo tests (although if the resources do not shut down properly it can be an issue, so there is still a manual component). The implementation of the Ginkgo tests can be seen in commit https://github.com/Megapixel99/nodejs-k8s/commit/530072bd67f51aafb93c7659d2dbea36cd7c7c51 (specifically this line). I am unsure whether we could fully automate the tests on GitHub; either way, I want to get the tests passing first, before automating on GitHub, which may take a while.

The first test Ginkgo runs is a test on Kubernetes Nodes, so I am working on getting those running properly.

If you want to install kubetest2 (and the resources used by the tests), you should be able to use these commands:

go install sigs.k8s.io/kubetest2@latest
go install sigs.k8s.io/kubetest2/kubetest2-noop@latest
go get -u -v -f sigs.k8s.io/kubetest2/kubetest2-tester-ginkgo@latest

I had to update my version of Go (from 1.19.4) since I kept receiving the error go/pkg/mod/github.com/buildkite/agent/v3@v3.62.0/api/retryable.go:8:2: package slices is not in GOROOT (/usr/local/Cellar/go/1.19.4/libexec/src/slices) when installing kubetest2-tester-ginkgo; the standard-library slices package was only added in Go 1.21.

Megapixel99 commented 4 months ago

@jayunit100, are you familiar with kubetest2? I have hit an error and I'm unsure what exactly is wrong. I suspect there is supposed to be additional logging above the error but none seems to be in the logs. The error is:

  [FAILED] Error waiting for all pods to be running and ready: Told to stop trying after 0.010s.
  Unexpected final error while getting *pod.state: listing replication controllers in namespace kube-system: 
  In [SynchronizedBeforeSuite] at: k8s.io/kubernetes/test/e2e/e2e.go:232

And the full timeline logs from kubetest2 are:

Timeline >>
I0513 [time redacted] 19744 util.go:506] >>> kubeConfig: /Users/seth/Desktop/temp/coding/k8s/test-config
I0513 [time redacted] 19744 helper.go:48] Waiting up to 30m0s for all (but 0) nodes to be schedulable
STEP: Collecting events from namespace "kube-system". @ [date and time redacted]
STEP: Found 0 events. @ [date and time redacted]
I0513 [time redacted] 19744 resource.go:168] POD       NODE  PHASE    GRACE  CONDITIONS
I0513 [time redacted] 19744 resource.go:175] core-dns        Running         [{Initialized True [date, time, and timezone redacted] [date, time, and timezone redacted]  } {ContainersReady True [date, time, and timezone redacted] [date, time, and timezone redacted]  } {Ready True [date, time, and timezone redacted] [date, time, and timezone redacted]  } {PodScheduled True [date, time, and timezone redacted] [date, time, and timezone redacted]  }]
I0513 [time redacted] 19744 resource.go:178] 
I0513 [time redacted] 19744 dump.go:109] 
Logging node info for node 172.17.0.2
I0513 [time redacted] 19744 dump.go:114] Node Info: &Node{ObjectMeta:{172.17.0.2  default  cdb60db9-314b-468b-aae4-1ddde4fd85fa 6672082107999988 0 [date, time, and timezone redacted] <nil> <nil> map[name:worker-node-1] map[] [] [] []},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:false,Taints:[]Taint{},ConfigSource:&NodeConfigSource{ConfigMap:&ConfigMapNodeConfigSource{Namespace:,Name:,UID:,ResourceVersion:,KubeletConfigKey:,},},PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},ephemeral-storage: {{16000000000 0} {<nil>}  BinarySI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{16000000000 0} {<nil>}  BinarySI},pods: {{250 0} {<nil>} 250 DecimalSI},},Allocatable:ResourceList{cpu: {{1 0} {<nil>} 1 DecimalSI},ephemeral-storage: {{16000000000 0} {<nil>}  BinarySI},hugepages-1Gi: {{0 0} {<nil>} 0 DecimalSI},hugepages-2Mi: {{0 0} {<nil>} 0 DecimalSI},memory: {{16000000000 0} {<nil>}  BinarySI},pods: {{250 0} {<nil>} 250 DecimalSI},},Phase:Running,Conditions:[]NodeCondition{NodeCondition{Type:MemoryPressure,Status:False,LastHeartbeatTime:[date, time, and timezone redacted],LastTransitionTime:[date, time, and timezone redacted],Reason:KubeletHasSufficientMemory,Message:kubelet has sufficient memory available,},NodeCondition{Type:DiskPressure,Status:False,LastHeartbeatTime:[date, time, and timezone redacted],LastTransitionTime:[date, time, and timezone redacted],Reason:KubeletHasNoDiskPressure,Message:kubelet has no disk pressure,},NodeCondition{Type:PIDPressure,Status:False,LastHeartbeatTime:[date, time, and timezone redacted],LastTransitionTime:[date, time, and timezone redacted],Reason:KubeletHasSufficientPID,Message:kubelet has sufficient PID available,},NodeCondition{Type:Ready,Status:True,LastHeartbeatTime:[date, time, and timezone redacted],LastTransitionTime:[date, time, and timezone redacted],Reason:KubeletReady,Message:kubelet is posting ready status,},},Addresses:[]NodeAddress{NodeAddress{Type:ExternalIP,Address:172.17.0.2,},NodeAddress{Type:InternalIP,Address:172.17.0.2,},NodeAddress{Type:Hostname,Address:172.17.0.2,},},DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:10250,},},NodeInfo:NodeSystemInfo{MachineID:,SystemUUID:,BootID:,KernelVersion:,OSImage:,ContainerRuntimeVersion:,KubeletVersion:,KubeProxyVersion:,OperatingSystem:,Architecture:,},Images:[]ContainerImage{ContainerImage{Names:[worker-node-1],SizeBytes:1,},},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:&NodeConfigStatus{Assigned:nil,Active:&NodeConfigSource{ConfigMap:&ConfigMapNodeConfigSource{Namespace:,Name:,UID:,ResourceVersion:,KubeletConfigKey:,},},LastKnownGood:nil,Error:,},RuntimeHandlers:[]NodeRuntimeHandler{},},}
I0513 [time redacted] 19744 dump.go:116] 
Logging kubelet events for node 172.17.0.2
I0513 [time redacted] 19744 dump.go:121] 
Logging pods the kubelet thinks is on node 172.17.0.2
I0513 [time redacted] 19744 dump.go:128]  started at <nil> (0+0 container statuses recorded)
I0513 [time redacted] 19744 kubelet_metrics.go:206] 
Latency metrics for node 172.17.0.2
I0513 [time redacted] 19744 kubectl_utils.go:109] Running kubectl logs on non-ready containers in kube-system
[FAILED] in [SynchronizedBeforeSuite] - k8s.io/kubernetes/test/e2e/e2e.go:232 @ [date and time redacted]
<< Timeline

[FAILED] Error waiting for all pods to be running and ready: Told to stop trying after 0.010s.
Unexpected final error while getting *pod.state: listing replication controllers in namespace kube-system: 
In [SynchronizedBeforeSuite] at: k8s.io/kubernetes/test/e2e/e2e.go:232 @ [date and time redacted]

Usually when I hit this error, kubetest2 would give some indication of what was wrong (e.g. Expected x pod replicas, y are Running and Ready, from: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/pod/wait.go#L200C36-L200C86).

The error seems to come from https://github.com/kubernetes/kubernetes/blob/99a8a6fe258a456d51dddd3dec16d168d94e4eb1/test/e2e/framework/pod/wait.go#L143, but I implemented replication controllers with no results (it is possible I found the wrong error, or that I did not implement replication controllers properly, since, for now, they only get created automatically when a Deployment is created).
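
For context on what the test is doing: that helper lists replication controllers via the core API's GET /api/v1/namespaces/{namespace}/replicationcontrollers endpoint, which is expected to return a ReplicationControllerList even when the namespace has none. A rough sketch of the shape I think that handler needs (the findRCs helper and router wiring below are placeholders, not this project's actual data layer):

// Rough sketch only: return a (possibly empty) ReplicationControllerList for a
// namespace, which is what the e2e wait helper lists in kube-system.
// `findRCs` is a hypothetical lookup; the real project stores resources differently.
const express = require('express');
const router = express.Router();

async function findRCs(namespace) {
  // placeholder: look the replication controllers up in the backing store
  return [];
}

router.get('/api/v1/namespaces/:namespace/replicationcontrollers', async (req, res) => {
  const items = await findRCs(req.params.namespace);
  res.json({
    apiVersion: 'v1',
    kind: 'ReplicationControllerList',
    metadata: { resourceVersion: '1' },
    items,
  });
});

module.exports = router;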

Let me know what you think.

Also, fixing the output of the line below POD NODE PHASE GRACE CONDITIONS is on my to-do list, but it is a low priority at the moment.
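
Side note on that table: the POD/NODE/PHASE/GRACE/CONDITIONS columns appear to be filled from each pod's metadata.name, spec.nodeName, status.phase, the deletion grace period, and status.conditions, so the blank NODE and GRACE cells above are probably just fields my pod objects are not setting yet. A guess at the minimum shape, with placeholder values:

// Educated guess, placeholder values only: fields the e2e pod dump seems to read.
const pod = {
  metadata: { name: 'core-dns', namespace: 'kube-system' },
  spec: { nodeName: '172.17.0.2' },   // NODE column
  status: {
    phase: 'Running',                 // PHASE column
    conditions: [                     // CONDITIONS column
      { type: 'Ready', status: 'True' },
    ],
  },
  // GRACE seems to come from the deletion grace period while a pod is terminating
};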

jayunit100 commented 4 months ago

Hi! I just run e2e.go directly; there's not really much advantage in using the wrappers, I think.

jayunit100 commented 4 months ago

This "waiting for pods to be running and ready" check can be fixed by giving it the max unavailable nodes command-line option... I forget the exact incantation; I can hunt around.

Megapixel99 commented 4 months ago

Hey!

I figured this out a while back and was able to get the tests to run! I had trouble figuring out how to parse the protobuf data, but I was able to get help here: https://github.com/kubernetes/kubernetes/issues/125201.
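
For anyone else who hits the protobuf issue: as I understand it from that thread, Kubernetes protobuf bodies are wrapped in a 4-byte magic prefix (0x6b 0x38 0x73 0x00, i.e. "k8s" plus a NUL byte) followed by a protobuf-encoded runtime.Unknown message whose raw field holds the actual object. A stripped-down sketch of unwrapping that envelope with protobufjs (the field numbers mirror k8s.io/apimachinery's generated.proto; error handling and the per-kind schemas are omitted):

// Sketch: unwrap the "k8s\0" envelope and decode the runtime.Unknown wrapper.
// Requires: npm install protobufjs
const protobuf = require('protobufjs');

const root = protobuf.parse(`
  syntax = "proto2";
  message TypeMeta {
    optional string apiVersion = 1;
    optional string kind = 2;
  }
  message Unknown {
    optional TypeMeta typeMeta = 1;
    optional bytes raw = 2;            // the serialized object itself
    optional string contentEncoding = 3;
    optional string contentType = 4;
  }
`).root;
const Unknown = root.lookupType('Unknown');

function decodeK8sProtobuf(body) {
  const magic = Buffer.from([0x6b, 0x38, 0x73, 0x00]); // "k8s\0"
  if (!body.subarray(0, 4).equals(magic)) {
    throw new Error('not a Kubernetes protobuf payload');
  }
  const unknown = Unknown.decode(body.subarray(4));
  // unknown.typeMeta gives the kind/apiVersion; unknown.raw is the
  // protobuf-encoded object, which still needs the matching generated schema.
  return unknown;
}

module.exports = { decodeK8sProtobuf };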

The branch I'm using for this is resource-config (which is at this commit as of this comment).