determined-ai / determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
https://determined.ai
Apache License 2.0

🤔[question] How to get pod address by experiment #9464

Open fecet opened 2 months ago

fecet commented 2 months ago

Describe your question

Currently, the Python API's client.get_experiment only returns the experiment info from the YAML config. It does not include values that Determined assigns dynamically at runtime, such as the pod address. Is there any way to get these from Python or the CLI?

ioga commented 2 months ago

Hello. On k8s, Determined experiment pods have names with the prefix exp-$EXPERIMENT_ID-trial-$TRIAL_ID: https://github.com/determined-ai/determined/blob/main/master/pkg/tasks/task_trial.go#L106

So you can take the experiment ID, then use kubectl or the k8s APIs to find the pods whose names start with this prefix and read their pod addresses or other k8s-specific data.
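
A minimal sketch of that lookup in Python, assuming kubectl is on the PATH and already pointed at the right cluster. The helper names (`pod_ips_for_experiment`, `fetch_pod_list`) and the `default` namespace are made up for illustration and are not part of Determined; the only thing taken from the reply above is the `exp-$EXPERIMENT_ID-trial-` name prefix. The filter works on the parsed output of `kubectl get pods -o json`:

```python
import json
import subprocess


def pod_ips_for_experiment(pod_list: dict, experiment_id: int) -> dict:
    """Map pod name -> pod IP for pods belonging to one experiment.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    Matching relies on the exp-<id>-trial- name prefix described above.
    """
    # The trailing "-trial-" keeps experiment 1 from also matching experiment 12.
    prefix = f"exp-{experiment_id}-trial-"
    result = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if name.startswith(prefix):
            # podIP is absent until the pod has been scheduled and assigned an IP.
            result[name] = pod.get("status", {}).get("podIP")
    return result


def fetch_pod_list(namespace: str = "default") -> dict:
    """Run kubectl and return the parsed pod list (namespace is an assumption)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Usage would look like `pod_ips_for_experiment(fetch_pod_list("default"), 42)`, swapping in whichever namespace your Determined tasks actually run in. Pods that have not been assigned an IP yet show up with a value of `None`.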