CliMA / ClimaAtmos.jl

ClimaAtmos.jl is a library for building atmospheric circulation models that is designed from the outset to leverage data assimilation and machine learning tools. We welcome contributions!
Apache License 2.0

Run `ClimaAtmos` on `Google Cloud` #3026

Open sriharshakandala opened 6 months ago

sriharshakandala commented 6 months ago

The Climate Modeling Alliance

Software Design Issue 📜

Purpose

Set up an automated build system to deploy ClimaAtmos.jl on Google Cloud.

Cost/Benefits/Risks

Benefits:

People and Personnel

Components

Results and Deliverables

Task Breakdown And Schedule

SDI Revision Log

CC

@tapios @simonbyrne @cmbengue

### Tasks
- [x] Install/configure and run `ClimaAtmos.jl` on a single GPU on GCP
- [x] Install/configure SLURM for single node and multi-node GCP configurations
- [x] Set up the environment for the CliMA software stack
- [ ] Compile strong scaling data on H100 GPUs
- [ ] Add curated documentation to ClimaDocs

Sbozzolo commented 6 months ago

Would it be possible to have more details on this SDI?

I am interested in potential synergies to improve the resilience of our CI infrastructure.

charleskawczynski commented 6 months ago

> I am interested in potential synergies to improve the resilience of our CI infrastructure.

We're still in the process of adding details. I think we'll want to keep this decoupled, to start with, from the Central CI infra, as the central configuration is more static than what we'll be doing on GCP.

Sbozzolo commented 6 months ago

> > I am interested in potential synergies to improve the resilience of our CI infrastructure.
>
> We're still in the process of adding details. I think we'll want to keep this decoupled, to start with, from the Central CI infra, as the central configuration is more static than what we'll be doing on GCP.

Okay, thank you! I am particularly interested in systems to check/ensure that the fastest transport protocol is used for MPI communication with GPUs, which is something that needs to be configured and checked on every machine and with different combinations of libraries. If you build something to check that on GCP, I think the same solution might be adapted more generally. So far, I have had to resort to manual labor.
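As a rough illustration of what one piece of such a check could look like, here is a minimal sketch that assumes the MPI implementation is Open MPI and that `ompi_info` is on the `PATH`. It only reads the build-time `mpi_built_with_cuda_support` flag that `ompi_info --parsable --all` reports. It does not verify which transport is actually selected at runtime, so it covers only part of the problem. The function names are hypothetical and not part of any CliMA package.

```python
import shutil
import subprocess
from typing import Optional

# Parameter name as it appears in `ompi_info --parsable --all` output (Open MPI only).
CUDA_PARAM = "mpi_built_with_cuda_support:value:"


def parse_cuda_support(ompi_info_output: str) -> Optional[bool]:
    """Return True/False if the CUDA build flag is reported, None if it is absent."""
    for line in ompi_info_output.splitlines():
        if CUDA_PARAM in line:
            # The value is the last colon-separated field, e.g. "...:value:true".
            value = line.rsplit(":", 1)[-1].strip().lower()
            return value == "true"
    return None


def check_local_open_mpi() -> Optional[bool]:
    """Query the local Open MPI install; return None if ompi_info is not on PATH."""
    if shutil.which("ompi_info") is None:
        return None
    result = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cuda_support(result.stdout)
```

Calling `check_local_open_mpi()` on each machine (or in a CI job) would give `True`, `False`, or `None` when the flag or the tool is missing. A real check would also need to confirm the runtime transport, which this sketch does not attempt.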

Also, I set up and configured spack when we first got clima. Eventually, we decided to not use it because Scott installed the libraries we needed systemwide with the correct build options.

I am attaching the environment I set up at the time in case you find it useful. (This is several months old, so it might not be relevant.)

spack.yml.txt

Sbozzolo commented 5 months ago

Leaving this question for when the time comes:

How are you planning on handling larger input files (such as the ones in ClimaArtifacts)?