canonical / katib-operators

Operators for Katib, which is part of Charmed Kubeflow.
Apache License 2.0

katib-controller breaks using OCI image v0.16.0-rc.1 #150

Closed gustavosr98 closed 7 months ago

gustavosr98 commented 11 months ago

Bug Description

katib-controller breaks using OCI image v0.16.0-rc.1

To Reproduce

# juju download katib-controller --channel 0.15/edge
Fetching charm "katib-controller" using "0.15/edge" channel and base "amd64/ubuntu/20.04"
Install the "katib-controller" charm with:
    juju deploy ./katib-controller_564a127.charm

# OCI image ref from https://github.com/canonical/katib-operators/blob/main/charms/katib-controller/metadata.yaml#L17

# juju deploy --trust --debug ./katib-controller_564a127.charm --resource oci-image=$AWS_ECR_URL/kubeflowkatib/katib-controller:v0.16.0-rc.1     --config custom_images='{"default_trial_template": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/mxnet-mnist:v0.16.0-rc.1", "early_stopping__medianstop": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/earlystopping-medianstop:v0.16.0-rc.1", "enas_cpu_template": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/enas-cnn-cifar10-cpu:v0.16.0-rc.1", "metrics_collector_sidecar__stdout": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/file-metrics-collector:v0.16.0-rc.1", "metrics_collector_sidecar__file": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/file-metrics-collector:v0.16.0-rc.1", "metrics_collector_sidecar__tensorflow_event": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/tfevent-metrics-collector:v0.16.0-rc.1", "pytorch_job_template__master": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/pytorch-mnist-cpu:v0.16.0-rc.1", "pytorch_job_template__worker": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/pytorch-mnist-cpu:v0.16.0-rc.1", "suggestion__random": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-hyperopt:v0.16.0-rc.1", "suggestion__tpe": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-hyperopt:v0.16.0-rc.1", "suggestion__grid": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-optuna:v0.16.0-rc.1", "suggestion__hyperband": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-hyperband:v0.16.0-rc.1", "suggestion__bayesianoptimization": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-skopt:v0.16.0-rc.1", "suggestion__cmaes": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-goptuna:v0.16.0-rc.1", "suggestion__sobol": 
"701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-goptuna:v0.16.0-rc.1", "suggestion__multivariate_tpe": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-optuna:v0.16.0-rc.1", "suggestion__enas": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-enas:v0.16.0-rc.1", "suggestion__darts": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-darts:v0.16.0-rc.1", "suggestion__pbt": "701143232170.dkr.ecr.eu-west-1.amazonaws.com/kubeflowkatib/suggestion-pbt:v0.16.0-rc.1", }'
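Note on the command above: the `custom_images` value ends with `, }`, and a trailing comma is not allowed in strict JSON. Whether the charm parses this option as strict JSON, and whether this is related to the failure, is not confirmed anywhere in this thread. The sketch below is only a local sanity check for such a value before passing it to `--config`. The helper name `check_custom_images` and the shortened `example/...` image references are hypothetical and not part of the charm.

```python
import json
from typing import Tuple


def check_custom_images(raw: str) -> Tuple[bool, str]:
    """Return (ok, message) describing whether raw parses as a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        # Strict JSON rejects trailing commas, unquoted keys, and similar slips.
        return False, f"invalid JSON at position {err.pos}: {err.msg}"
    if not isinstance(parsed, dict):
        return False, "expected a JSON object mapping image keys to references"
    return True, f"{len(parsed)} image entries parsed"


# Shortened, hypothetical stand-ins for the value used in the deploy command.
with_trailing_comma = '{"suggestion__pbt": "example/suggestion-pbt:v0.16.0-rc.1", }'
without_trailing_comma = '{"suggestion__pbt": "example/suggestion-pbt:v0.16.0-rc.1"}'

print(check_custom_images(with_trailing_comma))
print(check_custom_images(without_trailing_comma))
```

Running the same check on the full value from the command would show whether the string is well-formed before it reaches the charm.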

Environment

Juju: 2.9.45
EKS: 1.25
Katib: 0.15/edge
OCI images: v0.16.0-rc.1

Relevant Log Output

See the To Reproduce section.

Additional Context

No response

orfeas-k commented 10 months ago

Hi @gustavosr98, do you have any logs for this? I see the note, but I do not see any logs in the To Reproduce section.

NohaIhab commented 10 months ago

Hi @gustavosr98, I see you're deploying the katib-controller charm from channel 0.15/edge with image version v0.16.0-rc.1. This is a combination we don't typically test in our charms. Is there any reason you needed the newer image with the 0.15 charm? I suggest that you use the charm from channel 0.16/stable, which is now released with Kubeflow 1.8.

DnPlas commented 7 months ago

I will mark this issue as stale and close it after receiving no reply from @gustavosr98. The information provided in the previous comment is also helpful and accurate. Feel free to re-open if needed.