We are currently benchmarking the operator to reflect a real-world situation where tens of jobs come up concurrently.
We observe an issue where the submitter pods each consume 1000m+ CPU. Since there are no resource requests or limits at all on the submitter k8s Job, the submitter pods fill the nodes and easily choke them. Because we also don't control the submitter's nodeSelector, this affects the JobManagers on those nodes as well: they fail to come up because the node is at 100% CPU due to the submitters. We currently work around this with taints and tolerations for the other workloads (see the sketch below), but it would really help to be able to control the submitters' requests, limits and nodeSelector.
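For context, here is a minimal sketch of the workaround we use today and of the kind of knobs we are asking for on the submitter. The `submitter` block and its field names below are purely illustrative assumptions on our side, not an existing API of the operator:

```yaml
# Workaround today: taint the nodes we want to keep free of submitters, e.g.
#   kubectl taint nodes <node> dedicated=flink:NoSchedule
# and add a matching toleration to the workloads we *can* configure
# (JobManagers / TaskManagers), so only they schedule onto those nodes:
tolerations:
  - key: dedicated
    operator: Equal
    value: flink
    effect: NoSchedule

# What we would like instead: equivalent settings on the submitter Job's
# pod spec (hypothetical field names, for illustration only):
submitter:
  nodeSelector:
    workload-type: submitter
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 512Mi
```

With something like this we could cap each submitter at a known CPU budget and pin them to a dedicated node pool, instead of tainting nodes for everything else.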