determined-ai / determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
https://determined.ai
Apache License 2.0
3k stars 350 forks source link

fix: allocation csv: gpu_hours -> slot_hours, add resource_pool [DET-10408] #9616

Closed jesse-amano-hpe closed 2 months ago

jesse-amano-hpe commented 3 months ago

Waiting on acceptance or response to https://hpe-aiatscale.atlassian.net/browse/DET-10408?focusedCommentId=32936

Ticket

DET-10408

Description

Renames the gpu_hours column in the allocation report CSV to slot_hours as this is a more accurate description. Adds a resource_pool column to help users identify what slot type(s) these slot-hours apply to.

Test Plan

Run any number of experiments >0 on a cluster that has at least one slot configured. On cluster historical usage tab, make sure the end date is current, and click Download CSV.

image

Choose to group by Allocations and download the CSV

image

Observe slot_hours and resource_pool columns are present in the download

image

Checklist

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 52.88%. Comparing base (000c679) to head (06f0f61). Report is 24 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #9616 +/- ## ========================================== - Coverage 52.88% 52.88% -0.01% ========================================== Files 1255 1255 Lines 153086 153090 +4 Branches 3230 3230 ========================================== - Hits 80965 80964 -1 - Misses 71970 71975 +5 Partials 151 151 ``` | [Flag](https://app.codecov.io/gh/determined-ai/determined/pull/9616/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | Coverage Δ | | |---|---|---| | [backend](https://app.codecov.io/gh/determined-ai/determined/pull/9616/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `44.00% <0.00%> (-0.01%)` | :arrow_down: | | [harness](https://app.codecov.io/gh/determined-ai/determined/pull/9616/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `72.76% <ø> (ø)` | | | [web](https://app.codecov.io/gh/determined-ai/determined/pull/9616/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `51.30% <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/determined-ai/determined/pull/9616?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | Coverage Δ | | |---|---|---| | [master/internal/core.go](https://app.codecov.io/gh/determined-ai/determined/pull/9616?src=pr&el=tree&filepath=master%2Finternal%2Fcore.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#diff-bWFzdGVyL2ludGVybmFsL2NvcmUuZ28=) | `6.48% <0.00%> (-0.04%)` | :arrow_down: | ... and [3 files with indirect coverage changes](https://app.codecov.io/gh/determined-ai/determined/pull/9616/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai)
netlify[bot] commented 3 months ago

Deploy Preview for determined-ui canceled.

Name Link
Latest commit 06f0f613de76c02079a905b97fc946e7d139d568
Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/669172ab9825630007fbfdac
jesse-amano-hpe commented 2 months ago

CE/SE question would be if this could break downstream integrations on that column name but since it's not accurate that's okay? alternative would be to make this a pure addition and avoid removing the "gpu_hours" column in the same release.

We discussed on Slack with @hkumar92 ; I'll add mention of this to release notes but no further mitigation should be needed.