broadinstitute / monorepo

Compendium of tools for the Imaging Platform

Server Configuration Evaluation Plan #42

Open shntnu opened 5 months ago


Server Configuration Evaluation Plan

Objective

To determine which operating system, NixOS or Ubuntu, is the better fit for our new servers, comparing them on ease of maintenance, reproducibility, and overall efficiency.

Hypothesis

NixOS will be easier to maintain than Ubuntu, even for individuals with only Ubuntu experience.

Advantages of NixOS

NixOS offers reproducibility through setups defined in code, avoids dependency conflicts, and lets experienced users deploy standard tech stacks quickly. Optimized configurations can be reused across multiple environments, declarative configuration enhances security, and environment setup is highly customizable and flexible.

Disadvantages of NixOS

NixOS has a steep learning curve: it requires understanding a unique configuration language and its concepts, may need custom solutions for specific requirements, and is less familiar to most IT professionals than alternatives like Ubuntu. Its smaller community means fewer troubleshooting resources, managing it effectively takes specialized skills and knowledge, and initial configuration can be time-consuming for new users.

After visiting the data center on Oct 17, 2024, we noted that we would depend entirely on BITS to let us in. This may pose a significant hurdle: Ubuntu can be supported entirely by BITS, but with NixOS we would need to troubleshoot ourselves, and any delay in getting access would delay research.

Testing and Evaluation Plan

Setup Period (3 weeks)

We will spend 3 weeks configuring the servers to a usable state for Ank and Alán. During this time, we will focus on meeting the support benchmark, which requires the following tools and libraries to be installed and operational:

OS: awscli, GCP CLI, coreutils, gawk, git, gnused, gnutar, podman/docker, (neo)vim, Emacs, VS Code Server

R: Base libraries (stats, utils, etc.), tidyverse, Bioconductor, Seurat, data.table, RSQLite, arrow, jsonlite, rhdf5, caret, limma, edgeR, DESeq2, fgsea, ggplot2

Python: PyTorch, RDKit, scikit-learn, JupyterLab, Poetry, conda, Polars, CuPy

Julia: DifferentialEquations

CUDA: Support for the CUDA landscape as per NVIDIA documentation
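The OS part of the support benchmark could be checked mechanically rather than by hand. What follows is only a minimal sketch, not something the plan prescribes: it assumes each tool exposes an executable on `PATH`, and the executable names (`aws`, `gcloud`, `nvim`, and so on) are assumptions that may differ on a given install. VS Code Server is left out because its executable name depends on how it is installed.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed executable names for the OS-level tools in the benchmark.
# Each entry lists acceptable alternatives; one match is enough.
# Note: finding `sed` or `tar` on PATH does not prove it is the GNU variant.
OS_TOOLS: Dict[str, List[str]] = {
    "awscli": ["aws"],
    "GCP CLI": ["gcloud"],
    "coreutils": ["ls"],
    "gawk": ["gawk"],
    "git": ["git"],
    "gnused": ["sed"],
    "gnutar": ["tar"],
    "podman/docker": ["podman", "docker"],
    "(neo)vim": ["nvim", "vim"],
    "Emacs": ["emacs"],
}


def missing_tools(
    tools: Dict[str, List[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the benchmark entries with no matching executable on PATH."""
    return [
        name
        for name, candidates in tools.items()
        if not any(which(candidate) for candidate in candidates)
    ]


if __name__ == "__main__":
    missing = missing_tools(OS_TOOLS)
    if missing:
        print("support benchmark (OS tools): missing", missing)
    else:
        print("support benchmark (OS tools): all found")
```

The `which` parameter exists only so the lookup can be swapped out in tests. The R, Python, Julia, and CUDA entries would need their own checks (for example, importing each library), which this sketch does not attempt.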

Onboarding Period (3 weeks)

Following the setup, there will be a 3-week onboarding period during which the C-S lab team will be fully committed to testing NixOS and making it work for them. The user experience benchmark will include setting up personal development environments, confirming that developer tools can be installed, testing packages in isolated shells, creating environments with specific CUDA versions, installing specific package versions, and understanding escape hatches such as creating an Ubuntu container.
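One item in the user experience benchmark, testing packages in isolated shells, can be illustrated with `nix-shell -p <packages> --run <command>`, which runs a command in a temporary shell that has the listed packages available. The helper below is a hypothetical sketch for trying this from a script. The function names are invented for illustration, and the package names passed in are assumed to be valid nixpkgs attribute names.

```python
import shlex
import subprocess
from typing import List


def nix_shell_argv(packages: List[str], command: str) -> List[str]:
    """Build the argument list for `nix-shell -p <packages> --run <command>`."""
    if not packages:
        raise ValueError("at least one package is required")
    return ["nix-shell", "-p", *packages, "--run", command]


def try_in_isolated_shell(packages: List[str], command: str) -> bool:
    """Run `command` in a temporary shell with `packages`; True on exit code 0.

    Requires Nix to be installed; nothing is added to the user's profile.
    """
    argv = nix_shell_argv(packages, command)
    print("running:", shlex.join(argv))
    return subprocess.run(argv).returncode == 0
```

For example, `try_in_isolated_shell(["git"], "git --version")` would report whether git can be fetched and run in a throwaway shell. Pinning a specific package version or CUDA version takes more than this (for instance, a pinned nixpkgs revision) and is outside the scope of the sketch.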

Voting and Reversion Plan

After the onboarding period, the team will vote on whether to continue with NixOS. If both the support and user experience benchmarks pass, we will proceed with NixOS. If not, we will revert the servers to Ubuntu, transitioning one at a time.