determined-ai / determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
https://determined.ai
Apache License 2.0
2.93k stars 347 forks

feat: continue trial from WebUI for multi-trial experiment #9589

Closed gt2345 closed 1 week ago

gt2345 commented 3 weeks ago

Ticket

MD-437

Description

Allow users to continue multi-trial experiments where applicable. A multi-trial experiment can be continued only if it satisfies the following criteria:

Test Plan

Navigate to the experiment details page of a multi-trial experiment. If the experiment cannot be continued, the Continue Trial button should not be available. For a multi-trial experiment that does show the Continue Trial button, clicking the button should continue the experiment.

https://github.com/determined-ai/determined/assets/40620519/d17325ed-96f1-4170-a9df-c090cc5669a2

Checklist

netlify[bot] commented 3 weeks ago

Deploy Preview for determined-ui ready!

| Name | Link |
|---|---|
| Latest commit | 2082ce58b38f327a4fb57e9f0ed514d137f09ba5 |
| Latest deploy log | https://app.netlify.com/sites/determined-ui/deploys/668d8835afda170009a2c9e8 |
| Deploy Preview | https://deploy-preview-9589--determined-ui.netlify.app |

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 58.69565% with 19 lines in your changes missing coverage. Please review.

Project coverage is 45.21%. Comparing base (c40b861) to head (2082ce5). Report is 42 commits behind head on main.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main    #9589       +/-   ##
==========================================
- Coverage   49.82%   45.21%    -4.61%
==========================================
  Files        1247      923      -324
  Lines      162287   121992    -40295
  Branches     2888     2893        +5
==========================================
- Hits        80855    55160    -25695
+ Misses      81260    66660    -14600
  Partials      172      172
```

| [Flag](https://app.codecov.io/gh/determined-ai/determined/pull/9589/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | Coverage Δ | |
|---|---|---|
| [harness](https://app.codecov.io/gh/determined-ai/determined/pull/9589/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `?` | |
| [web](https://app.codecov.io/gh/determined-ai/determined/pull/9589/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `46.16% <58.69%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files](https://app.codecov.io/gh/determined-ai/determined/pull/9589?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | Coverage Δ | |
|---|---|---|
| [webui/react/src/types.ts](https://app.codecov.io/gh/determined-ai/determined/pull/9589?src=pr&el=tree&filepath=webui%2Freact%2Fsrc%2Ftypes.ts&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#diff-d2VidWkvcmVhY3Qvc3JjL3R5cGVzLnRz) | `99.68% <100.00%> (+<0.01%)` | :arrow_up: |
| [webui/react/src/utils/experiment.ts](https://app.codecov.io/gh/determined-ai/determined/pull/9589?src=pr&el=tree&filepath=webui%2Freact%2Fsrc%2Futils%2Fexperiment.ts&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#diff-d2VidWkvcmVhY3Qvc3JjL3V0aWxzL2V4cGVyaW1lbnQudHM=) | `83.88% <92.30%> (+0.08%)` | :arrow_up: |
| [...ages/ExperimentDetails/ExperimentDetailsHeader.tsx](https://app.codecov.io/gh/determined-ai/determined/pull/9589?src=pr&el=tree&filepath=webui%2Freact%2Fsrc%2Fpages%2FExperimentDetails%2FExperimentDetailsHeader.tsx&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#diff-d2VidWkvcmVhY3Qvc3JjL3BhZ2VzL0V4cGVyaW1lbnREZXRhaWxzL0V4cGVyaW1lbnREZXRhaWxzSGVhZGVyLnRzeA==) | `73.62% <35.71%> (-1.35%)` | :arrow_down: |

... and [324 files with indirect coverage changes](https://app.codecov.io/gh/determined-ai/determined/pull/9589/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai)

gt2345 commented 2 weeks ago

> I think it's working correctly, but let me try to understand it more. It looks like this experiment (before this change) doesn't show the Continue button even though it's a multi-trial experiment with ASHA and cancelled. Do you have an example of how it works right now?

Hi @keita-determined, I'm not sure I completely understand your concern, but multi-trial experiments can only be continued if the searcher is grid or random, so this feature won't be available for ASHA experiments.
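
To make the searcher rule concrete, here is a minimal sketch of the check in TypeScript. It is not the code in this PR: the type names, the `searcherName` field, the searcher name strings, and the `canContinueMultiTrial` helper are all hypothetical. It only encodes the grid/random condition described above, not any other criteria the experiment must meet.

```typescript
// Hypothetical searcher names for illustration; the real union lives in the
// WebUI types and may differ.
type SearcherName = 'single' | 'grid' | 'random' | 'adaptive_asha';

// Hypothetical, minimal view of an experiment: only the field this check needs.
interface ExperimentLike {
  searcherName: SearcherName;
}

// Searchers for which a multi-trial experiment may be continued,
// per the rule stated in this thread (grid or random only).
const CONTINUABLE_SEARCHERS: ReadonlySet<SearcherName> = new Set<SearcherName>([
  'grid',
  'random',
]);

// Returns true when the searcher alone does not rule out continuing.
// Other criteria (for example, the experiment's state) are NOT checked here.
function canContinueMultiTrial(experiment: ExperimentLike): boolean {
  return CONTINUABLE_SEARCHERS.has(experiment.searcherName);
}
```

Under this sketch, an ASHA experiment such as the one in the question above would return `false`, so the Continue Trial button would stay hidden, which matches the behavior described.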