determined-ai / determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
https://determined.ai
Apache License 2.0
3k stars 350 forks source link

feat: switch default rsa_key_size to 2048. #9612

Closed ioga closed 3 months ago

ioga commented 3 months ago

Description

It seems minimum recommended RSA key size on RHEL are 2048: https://access.redhat.com/articles/3642912 similar problem came up in the past for an openshift customer: https://hpe-aiatscale.atlassian.net/browse/DET-5983

I have a suspicion we're seeing it again for a fedora user: https://determined-community.slack.com/archives/CV3MTNZ6U/p1720179622244169

we elected not to do it in the past over performance concerns: https://github.com/determined-ai/determined/pull/2966#issuecomment-921397606 but now we GenerateKey once per experiment, not per trial.

This'll slow down new experiment and shell creation by ~150 ms because the key generation takes longer.

Test Plan

  1. det shell start on a cluster with the default settings
  2. Verify the bit strength is 2048: openssl rsa -text -in /run/determined/ssh/id_rsa | head -n1

Checklist

netlify[bot] commented 3 months ago

Deploy Preview for determined-ui canceled.

Name Link
Latest commit 14ea8b016288be81168b1a92f3c6394896b5dbaa
Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/6688327dfc4a0d00085a6392
codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 52.88%. Comparing base (0a57cde) to head (14ea8b0).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #9612 +/- ## ========================================== - Coverage 52.89% 52.88% -0.01% ========================================== Files 1255 1255 Lines 153086 153086 Branches 3230 3229 -1 ========================================== - Hits 80971 80964 -7 - Misses 71964 71971 +7 Partials 151 151 ``` | [Flag](https://app.codecov.io/gh/determined-ai/determined/pull/9612/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | Coverage Δ | | |---|---|---| | [backend](https://app.codecov.io/gh/determined-ai/determined/pull/9612/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `44.00% <100.00%> (-0.02%)` | :arrow_down: | | [harness](https://app.codecov.io/gh/determined-ai/determined/pull/9612/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `72.76% <ø> (ø)` | | | [web](https://app.codecov.io/gh/determined-ai/determined/pull/9612/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | `51.30% <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/determined-ai/determined/pull/9612?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai) | Coverage Δ | | |---|---|---| | [master/internal/config/config.go](https://app.codecov.io/gh/determined-ai/determined/pull/9612?src=pr&el=tree&filepath=master%2Finternal%2Fconfig%2Fconfig.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai#diff-bWFzdGVyL2ludGVybmFsL2NvbmZpZy9jb25maWcuZ28=) | `71.37% <100.00%> (ø)` | | ... and [2 files with indirect coverage changes](https://app.codecov.io/gh/determined-ai/determined/pull/9612/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=determined-ai)
ioga commented 3 months ago

discussed offline, and we don't want to take the performance hit. we need a better way to generate the keys (in the background, reuse keys, ...) to alleviate the performance issues.