Closed trevormcguire closed 1 week ago
Thank you for opening your first issue here! 🛠
To address the issue where Flytepropeller's plugin collector fails to list objects due to missing group names in the apiVersion
, you can modify the ResourceLevelMonitor
to include the group names when searching for resources. This can be done by updating the collect
function in the plugin collector code.
Here's an example of how you can modify the collect
function to include the group names:
func (r *ResourceLevelMonitor) collect(ctx context.Context) {
list := metav1.PartialObjectMetadataList{
TypeMeta: metav1.TypeMeta{
Kind: r.gvk.Kind,
APIVersion: fmt.Sprintf("%s/%s", r.gvk.Group, r.gvk.Version),
},
}
if err := r.cache.List(ctx, &list); err != nil {
logger.Warnf(ctx, "failed to list objects for %s.%s: %v", r.gvk.Kind, r.gvk.Version, err)
return
}
// ... rest of the function
}
Would you like more details on how to implement this fix?
Sources:
This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.
Mention @runllm to ask follow-up questions. See runllm.com for more!
@trevormcguire would you be willing to submit a PR to correctly set the API version?
@trevormcguire would you be willing to submit a PR to correctly set the API version?
@hamersaw sure, will work on that
Describe the bug
Flytepropeller tracks how many objects there for each resource type (such as TfJob, PyTorchjob, etc). However, the plugin collector go client does not include group names in the apiVerison when searching for these resources (such as kubeflow/v1), so a warning is raised
failed to list objects for
when attempting to list them.https://github.com/flyteorg/flyte/blob/75b33f80dacb737a098b223ad24c2fee49351e31/flytepropeller/pkg/controller/nodes/task/k8s/plugin_collector.go#L52-L61
This seems to have been introduced when we were testing out 1.12 and can be traced back to #5238
Here is an example for SparkApplication.v1beta2
Expected behavior
Resource Manager is able to correctly list custom resources
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?