cloud-bulldozer / benchmark-operator

The Chuck Norris of cloud benchmarks
Apache License 2.0
282 stars 127 forks source link

zone affinity for uperf clients #811

Closed mukrishn closed 1 year ago

mukrishn commented 1 year ago

Description

Pod affinity to keep server and client within same zone to avoid external variability.

Fixes

rsevilla87 commented 1 year ago

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

I think not since the policy is requiredDuringSchedulingIgnoredDuringExecution, I think we should add this policy as preferredDuringSchedulingIgnoredDuringExecution to prevent single node on AZ test cases to fail

mukrishn commented 1 year ago

This is keeping the server and client in the same zone, so why can't we enforce it by keeping requiredDuring instead of preferredDuring ? at all cases we will have a single zone isn't it.

rsevilla87 commented 1 year ago

This is keeping the server and client in the same zone, so why can't we enforce it by keeping requiredDuring instead of preferredDuring ? at all cases we will have a single zone isn't it.

In bare metal environments by default there's only one zone, for example

mukrishn commented 1 year ago

In bare metal environments by default there's only one zone, for example

Yes so server and client will use the same zone in that case.

jtaleric commented 1 year ago

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

I think not since the policy is requiredDuringSchedulingIgnoredDuringExecution, I think we should add this policy as preferredDuringSchedulingIgnoredDuringExecution to prevent single node on AZ test cases to fail

Yeah, I think we need to move to preferred vs required.

smalleni commented 1 year ago

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

I think not since the policy is requiredDuringSchedulingIgnoredDuringExecution, I think we should add this policy as preferredDuringSchedulingIgnoredDuringExecution to prevent single node on AZ test cases to fail

Yeah, I think we need to move to preferred vs required.

This PR is about requiring both server and client to be in the same AZ, not different. And since we will always have atleast 1 AZ, the required requiredDuringSchedulingIgnoredDuringExecution will not cause failures on single zone clusters.

mukrishn commented 1 year ago

Note: With this affinity we need at least 2 nodes in a single zone. so airflow jobs worker counts have to be raised to 6.

mukrishn commented 1 year ago

@smalleni just confirmed that baremetal nodes are not having this topology.kubernetes.io/zone label, so it might break BM runs. better to be preferred ?

jtaleric commented 1 year ago

@smalleni just confirmed that baremetal nodes are not having this topology.kubernetes.io/zone label, so it might break BM runs. better to be preferred ?

Or we apply the label prior to execution...

I ran into this running these scripts against non-ocp deployments too (EKS and GKE in particular).

stale[bot] commented 1 year ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

mukrishn commented 1 year ago

Or we apply the label prior to execution...

Not sure if adding labels would be an appropriate solution without knowing the placement of machines. Never played with zoning on baremetal node before, happy to investigate further if that is a way.

For time being lets make it preferred ?

rsevilla87 commented 1 year ago

Or we apply the label prior to execution...

Not sure if adding labels would be an appropriate solution without knowing the placement of machines. Never played with zoning on baremetal node before, happy to investigate further if that is a way.

For time being lets make it preferred ?

makes sense to me

codecov-commenter commented 1 year ago

Codecov Report

Merging #811 (10c03cd) into master (0954437) will not change coverage. The diff coverage is n/a.

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@           Coverage Diff           @@
##           master     #811   +/-   ##
=======================================
  Coverage   53.17%   53.17%           
=======================================
  Files           8        8           
  Lines         331      331           
=======================================
  Hits          176      176           
  Misses        155      155           
Flag Coverage Δ
gha 53.17% <ø> (ø)
python-3.9 53.17% <ø> (ø)
unit 53.17% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.