Mellanox / network-operator

Mellanox Network Operator
Apache License 2.0

Support probe timeout configurations #952

Open ivelichkovich opened 1 month ago

ivelichkovich commented 1 month ago

What would you like to be added:

Support for timeouts on probe configurations, specifically for ofed-driver

Why is this needed:

lsmod can take longer than a second, depending on what is running on the host, so the probe can exceed the Kubernetes default timeoutSeconds of 1.

rollandf commented 1 month ago

The MOFED probe timeouts are already available to be customized via the NicClusterPolicy CR. See here: https://github.com/Mellanox/network-operator/blob/master/api/v1alpha1/nicclusterpolicy_types.go#L90-L94

Please tell me if that meets your requirement.

rollandf commented 1 month ago

@ivelichkovich BTW, I saw you have an interest in an IPAM solution with slicing per node. You can check out https://github.com/Mellanox/nvidia-k8s-ipam, which implements this.

ivelichkovich commented 1 month ago

> @ivelichkovich BTW, I saw you have an interest in an IPAM solution with slicing per node. You can check out https://github.com/Mellanox/nvidia-k8s-ipam, which implements this.

Oh that's awesome, I'll explore that repo!

ivelichkovich commented 1 month ago

> The MOFED probe timeouts are already available to be customized via the NicClusterPolicy CR. See here: https://github.com/Mellanox/network-operator/blob/master/api/v1alpha1/nicclusterpolicy_types.go#L90-L94
>
> Please tell me if that meets your requirement.

These do let you define the probes. However, PodProbeSpec only exposes these fields: https://github.com/Mellanox/network-operator/blob/master/api/v1alpha1/nicclusterpolicy_types.go#L74, so access to failureThreshold and timeoutSeconds would be nice to have. You could perhaps replace PodProbeSpec with the upstream probe object, which would give 1:1 parity without keeping a copy of the struct in code and converting back and forth.
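To make the smaller of the two options concrete, here is a rough sketch of what adding the two fields to PodProbeSpec might look like. It is not the operator's actual code. It assumes the spec currently exposes only initialDelaySeconds and periodSeconds, the `Probe` struct is a local stand-in for the timing fields of the upstream `k8s.io/api/core/v1` Probe type so that the example builds with the standard library only, and `toProbe` is a hypothetical helper name. Unset values fall back to the Kubernetes defaults (timeoutSeconds 1, failureThreshold 3).

```go
package main

import "fmt"

// PodProbeSpec is a hypothetical extended version of the operator's probe spec.
// InitialDelaySeconds and PeriodSeconds are assumed to be the fields exposed
// today; TimeoutSeconds and FailureThreshold are the proposed additions.
type PodProbeSpec struct {
	InitialDelaySeconds int32 `json:"initialDelaySeconds"`
	PeriodSeconds       int32 `json:"periodSeconds"`
	TimeoutSeconds      int32 `json:"timeoutSeconds,omitempty"`
	FailureThreshold    int32 `json:"failureThreshold,omitempty"`
}

// Probe is a local stand-in for the timing fields of the upstream
// k8s.io/api/core/v1 Probe type (assumption: a real change would use that type).
type Probe struct {
	InitialDelaySeconds int32
	PeriodSeconds       int32
	TimeoutSeconds      int32
	FailureThreshold    int32
}

// Kubernetes applies these defaults when the fields are left unset.
const (
	defaultTimeoutSeconds   = 1
	defaultFailureThreshold = 3
)

// toProbe (hypothetical helper) copies the spec into a probe and fills in
// defaults for any optional field that was left at zero or set negative.
func toProbe(s PodProbeSpec) Probe {
	p := Probe{
		InitialDelaySeconds: s.InitialDelaySeconds,
		PeriodSeconds:       s.PeriodSeconds,
		TimeoutSeconds:      s.TimeoutSeconds,
		FailureThreshold:    s.FailureThreshold,
	}
	if p.TimeoutSeconds <= 0 {
		p.TimeoutSeconds = defaultTimeoutSeconds
	}
	if p.FailureThreshold <= 0 {
		p.FailureThreshold = defaultFailureThreshold
	}
	return p
}

func main() {
	// A slow lsmod is given 10 seconds; failureThreshold is left to default.
	spec := PodProbeSpec{InitialDelaySeconds: 10, PeriodSeconds: 30, TimeoutSeconds: 10}
	fmt.Printf("%+v\n", toProbe(spec))
}
```

Using the upstream Probe type directly in the CR would make the conversion step unnecessary, at the cost of also exposing the probe handler fields to users.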