cloud-bulldozer / k8s-netperf

Running Networking Performance Tests against K8s
Apache License 2.0

Refactor drivers code #131

Closed rsevilla87 closed 6 months ago

rsevilla87 commented 7 months ago

Description

This refactors the drivers code: all of it has been moved to the drivers package, and each driver now implements the new Driver interface. The result is more intuitive code, and less of it.
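
As a rough illustration of the shape such a refactor usually takes, here is a minimal sketch. The method set of `Driver`, the stub driver types, and `runAll` are all assumptions made for this example and are not taken from the actual drivers package.

```go
package main

import "fmt"

// Driver is a hypothetical stand-in for the interface described above;
// the real method set in the drivers package may differ.
type Driver interface {
	Name() string
	// Run executes one test profile and returns its raw output.
	Run(profile string) (string, error)
}

// netperfDriver and iperfDriver are stubs: they only report what they
// would run instead of invoking the real tools.
type netperfDriver struct{}

func (netperfDriver) Name() string { return "netperf" }
func (netperfDriver) Run(profile string) (string, error) {
	return "netperf ran " + profile, nil
}

type iperfDriver struct{}

func (iperfDriver) Name() string { return "iperf" }
func (iperfDriver) Run(profile string) (string, error) {
	return "iperf ran " + profile, nil
}

// runAll runs the same profile through every driver, in order, and stops
// at the first error.
func runAll(drivers []Driver, profile string) ([]string, error) {
	results := make([]string, 0, len(drivers))
	for _, d := range drivers {
		out, err := d.Run(profile)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", d.Name(), err)
		}
		results = append(results, out)
	}
	return results, nil
}

func main() {
	out, err := runAll([]Driver{netperfDriver{}, iperfDriver{}}, "TCP_STREAM")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```

With every driver behind one interface, the caller no longer needs a separate code path per tool.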

It is now also possible to select which drivers to use. netperf is enabled by default, but the test can be run with different combinations of drivers, e.g.:

$ k8s-netperf --iperf # Will run the test using iperf and netperf

or

$ k8s-netperf --netperf=false --uperf # Will run the test using uperf
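
The flag semantics above can be sketched with Go's standard `flag` package. This is only an approximation: k8s-netperf itself uses Cobra, and `selectDrivers` is a hypothetical helper written for this example. The flag names and the error text are the ones that appear in this thread.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// selectDrivers parses the driver flags and returns the names of the
// enabled drivers. It is a hypothetical helper, not code from this PR.
func selectDrivers(args []string) ([]string, error) {
	fs := flag.NewFlagSet("k8s-netperf", flag.ContinueOnError)
	netperf := fs.Bool("netperf", true, "run the netperf driver (on by default)")
	iperf := fs.Bool("iperf", false, "run the iperf driver")
	uperf := fs.Bool("uperf", false, "run the uperf driver")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	var enabled []string
	if *netperf {
		enabled = append(enabled, "netperf")
	}
	if *iperf {
		enabled = append(enabled, "iperf")
	}
	if *uperf {
		enabled = append(enabled, "uperf")
	}
	// A default-true flag can only be turned off with --netperf=false,
	// so an empty selection is possible and has to be rejected.
	if len(enabled) == 0 {
		return nil, errors.New("at least one driver needs to be enabled")
	}
	return enabled, nil
}

func main() {
	drivers, err := selectDrivers(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("enabled drivers:", drivers)
}
```

Under these assumptions, `--iperf` yields netperf and iperf, while `--netperf=false --uperf` yields only uperf.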

jtaleric commented 6 months ago

Need to dig into this failure a bit more

panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
main.executeWorkload({0x1, 0xa, {0xc00022d6f0, 0xa}, 0x3, 0x400, 0x0, {0x1ee68ee, 0x4}, 0x1}, ...)
  /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:358 +0xfc6
main.glob..func1(0xc000231900?, {0x1ee698e?, 0x4?, 0x1ee6992?})
  /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:187 +0x1065
github.com/spf13/cobra.(*Command).execute(0x3299a20, {0xc000132010, 0x1, 0x1})
  /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920 +0x863
github.com/spf13/cobra.(*Command).ExecuteC(0x3299a20)
  /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
  /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968
main.main()
  /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:464 +0x3c7
jtaleric at polaris in ~/code/k8s-netperf on interface-refactor△ $ git checkout main

I just ran w/ ./bin/amd64/k8s-netperf --debug

I patched the hostNet issues locally, but still running into issues :(

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x19fbb06]

goroutine 1 [running]:
main.executeWorkload({0x1, 0xa, {0xc00022d6f0, 0xa}, 0x3, 0x400, 0x0, {0x1ee68ee, 0x4}, 0x1}, ...)
    /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:398 +0x566
main.glob..func1(0xc000231900?, {0x1ee698e?, 0x4?, 0x1ee6992?})
    /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:187 +0x1065
github.com/spf13/cobra.(*Command).execute(0x3299a20, {0xc000132010, 0x1, 0x1})
    /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920 +0x863
github.com/spf13/cobra.(*Command).ExecuteC(0x3299a20)
    /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
    /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968
main.main()
    /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:464 +0x3c7
rsevilla87 commented 6 months ago

I can't reproduce it, what was the exact command you used?

jtaleric commented 6 months ago

weird.

All I am doing is ./bin/amd64/k8s-netperf --debug

and I see

DEBU[2024-02-19 08:53:34] client-across Running on perf-fc640-4.perf.lab.eng.rdu2.redhat.com with IP 10.1.184.207 
DEBU[2024-02-19 08:53:34] Executing workloads. hostNetwork is false, service is false 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x19fbb06]

goroutine 1 [running]:
main.executeWorkload({0x1, 0xa, {0xc00022d6f0, 0xa}, 0x3, 0x400, 0x0, {0x1ee68ee, 0x4}, 0x1}, ...)
    /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:398 +0x566
main.glob..func1(0xc000231900?, {0x1ee698e?, 0x4?, 0x1ee6992?})
    /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:187 +0x1065
github.com/spf13/cobra.(*Command).execute(0x3299a20, {0xc000132010, 0x1, 0x1})
    /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920 +0x863
github.com/spf13/cobra.(*Command).ExecuteC(0x3299a20)
    /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
    /home/jtaleric/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968
main.main()
    /home/jtaleric/code/k8s-netperf/cmd/k8s-netperf/k8s-netperf.go:464 +0x3c7

Checking version

Version: interface-refactor
Git Commit: 9c61fac33db600342e36e9167e657aa240c2b8fd
Build Date: 2024-02-19-08:52:14
Go Version: go1.21.3
OS/Arch: linux amd64
jtaleric commented 6 months ago

https://github.com/cloud-bulldozer/k8s-netperf/blob/996f2d339ef0eea1f54a859f2ade72b679f1af3f/cmd/k8s-netperf/k8s-netperf.go#L187

npr := executeWorkload(nc, s, true, false, false, false)

this should be

npr := executeWorkload(nc, s, false, false, false, false)
jtaleric commented 6 months ago

Update...

The parameters are: executeWorkload(networkConfig, scenario, hostNetwork, netperfWorkload, iperfWorkload, uperfWorkload)

So, with this, the line at https://github.com/cloud-bulldozer/k8s-netperf/blob/996f2d339ef0eea1f54a859f2ade72b679f1af3f/cmd/k8s-netperf/k8s-netperf.go#L187 should be updated to:

npr := executeWorkload(nc, s, false, true, false, false)
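
Four positional booleans in a row are easy to mix up, as the correction above shows. One common way to avoid that is a struct with named fields. The sketch below is a suggestion only: `workloadOptions` and `describe` are hypothetical names, with field names borrowed from the parameter list above, and `describe` merely stands in for `executeWorkload`.

```go
package main

import "fmt"

// workloadOptions is a hypothetical replacement for the four positional
// booleans; it is not part of this PR.
type workloadOptions struct {
	HostNetwork bool
	Netperf     bool
	Iperf       bool
	Uperf       bool
}

// describe stands in for executeWorkload and only reports what would run.
func describe(o workloadOptions) string {
	return fmt.Sprintf("hostNetwork=%t netperf=%t iperf=%t uperf=%t",
		o.HostNetwork, o.Netperf, o.Iperf, o.Uperf)
}

func main() {
	// Same selection as executeWorkload(nc, s, false, true, false, false),
	// but each value is named at the call site.
	fmt.Println(describe(workloadOptions{Netperf: true}))
}
```

Fields that are left out default to false, so a call site only names the options it turns on.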

jtaleric commented 6 months ago

👀

rsevilla87 commented 6 months ago

alright, it should be ready now

jtaleric commented 6 months ago

hm, just ran into this...

./bin/amd64/k8s-netperf
INFO[2024-02-21 17:30:28] 📒 Reading netperf.yml file.
INFO[2024-02-21 17:30:28] 📒 Reading netperf.yml file - using ConfigV2 Method.
INFO[2024-02-21 17:30:28] Cleaning resources created by k8s-netperf
INFO[2024-02-21 17:30:29] ⏰ Waiting for client-across Deployment to deleted...
INFO[2024-02-21 17:31:00] ⏰ Waiting for server Deployment to deleted...
INFO[2024-02-21 17:31:31] 🔬 prometheus discovered at openshift-monitoring
WARN[2024-02-21 17:31:31] ⚠️  No zone label
WARN[2024-02-21 17:31:31] ⚠️  Single node per zone and/or no zone labels
INFO[2024-02-21 17:31:31] 🚀 Creating service for iperf-service in namespace netperf
INFO[2024-02-21 17:31:31] 🚀 Creating service for uperf-service in namespace netperf
INFO[2024-02-21 17:31:31] 🚀 Creating service for netperf-service in namespace netperf
INFO[2024-02-21 17:31:31] 🚀 Starting Deployment for: client-across in namespace: netperf
INFO[2024-02-21 17:31:31] ⏰ Checking for client-across Pods to become ready...
INFO[2024-02-21 17:31:34] 🚀 Starting Deployment for: server in namespace: netperf
INFO[2024-02-21 17:31:34] ⏰ Checking for server Pods to become ready...
INFO[2024-02-21 17:31:38] 🗒️  Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 17:31:51] 🗒️  Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 17:32:03] 🗒️  Running netperf TCP_STREAM (service false) for 10s
FATA[2024-02-21 17:32:15] 😭 At least one driver needs to be enabled

We probably need to set netperf to true by default.

jtaleric commented 6 months ago

https://github.com/cloud-bulldozer/k8s-netperf/pull/131/files#diff-2f4a29be1c0731286fb79f835c6fab35e1993e5d96990fc51e93953bfeeb8323R67

And I see we default it to true here: https://github.com/cloud-bulldozer/k8s-netperf/pull/131/files#diff-2f4a29be1c0731286fb79f835c6fab35e1993e5d96990fc51e93953bfeeb8323R431

I think this might be the Cobra bug wrt bools...

jtaleric commented 6 months ago

Nope, this is even weirder... we can see it ran some TCP_STREAM tests in the output I shared 😕

rsevilla87 commented 6 months ago

I can't reproduce it:

$ ./bin/amd64/k8s-netperf 
INFO[2024-02-21 23:56:41] 📒 Reading netperf.yml file.
INFO[2024-02-21 23:56:41] 📒 Reading netperf.yml file - using ConfigV2 Method.
INFO[2024-02-21 23:56:41] Cleaning resources created by k8s-netperf
INFO[2024-02-21 23:56:42] ⏰ Waiting for client-across Deployment to deleted...
INFO[2024-02-21 23:56:54] ⏰ Waiting for server Deployment to deleted...
INFO[2024-02-21 23:56:58] 🔬 prometheus discovered at openshift-monitoring
WARN[2024-02-21 23:57:00] ⚠️   Single node per zone and/or no zone labels
INFO[2024-02-21 23:57:01] 🚀 Creating service for iperf-service in namespace netperf
INFO[2024-02-21 23:57:01] 🚀 Creating service for uperf-service in namespace netperf
INFO[2024-02-21 23:57:02] 🚀 Creating service for netperf-service in namespace netperf
INFO[2024-02-21 23:57:02] 🚀 Starting Deployment for: client-across in namespace: netperf
INFO[2024-02-21 23:57:02] ⏰ Checking for client-across Pods to become ready...
INFO[2024-02-21 23:57:05] 🚀 Starting Deployment for: server in namespace: netperf
INFO[2024-02-21 23:57:05] ⏰ Checking for server Pods to become ready...
INFO[2024-02-21 23:57:11] 🗒️   Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 23:57:25] 🗒️   Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 23:57:38] 🗒️   Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 23:57:53] 🗒️   Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 23:58:06] 🗒️   Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 23:58:20] 🗒️   Running netperf TCP_STREAM (service false) for 10s
INFO[2024-02-21 23:58:34] 🗒️   Running netperf UDP_STREAM (service false) for 10s
INFO[2024-02-21 23:58:47] 🗒️   Running netperf UDP_STREAM (service false) for 10s
INFO[2024-02-21 23:59:00] 🗒️   Running netperf UDP_STREAM (service false) for 10s
INFO[2024-02-21 23:59:15] 🗒️   Running netperf TCP_CRR (service false) for 10s
INFO[2024-02-21 23:59:30] 🗒️   Running netperf TCP_CRR (service false) for 10s
INFO[2024-02-21 23:59:45] 🗒️   Running netperf TCP_CRR (service false) for 10s
INFO[2024-02-22 00:00:01] 🗒️   Running netperf TCP_CRR (service true) for 10s
INFO[2024-02-22 00:00:17] 🗒️   Running netperf TCP_CRR (service true) for 10s
INFO[2024-02-22 00:00:32] 🗒️   Running netperf TCP_CRR (service true) for 10s
INFO[2024-02-22 00:00:49] 🗒️   Running netperf TCP_RR (service false) for 10s
INFO[2024-02-22 00:01:02] 🗒️   Running netperf TCP_RR (service false) for 10s
INFO[2024-02-22 00:01:16] 🗒️   Running netperf TCP_RR (service false) for 10s
blablabla
INFO[2024-02-22 00:01:38] Cleaning resources created by k8s-netperf    
INFO[2024-02-22 00:01:39] ⏰ Waiting for client-across Deployment to deleted...
INFO[2024-02-22 00:02:10] ⏰ Waiting for server Deployment to deleted...