N8-CIR-Bede / documentation

Documentation for the N8CIR Bede Tier 2 HPC facility
https://bede-documentation.readthedocs.io/en/latest/

RHEL8 upgrade #73

Closed: bodgerer closed this 6 months ago

bodgerer commented 2 years ago

The user documentation needs to be updated for the RHEL 8 upgrade. The process will broadly be:

1) Users test the RHEL8 image
2) Upgrade both login nodes to RHEL8
3) Migrate compute nodes from RHEL7 to RHEL8 as load permits

Changes made so far:

Significant points of note:

ptheywood commented 2 years ago

Just a few points to clarify:

  1. Are there any (rough) timeframes for the various stages of migration that would be worth documenting?
  2. When the login nodes have been migrated to RHEL 8, will the gpu/infer partitions refer to RHEL 7 or RHEL 8? (I presume 7)
  3. When the migration to RHEL 8 has been completed for all compute nodes, will the gpu/infer partitions then become RHEL 8? Will the gpu8/infer8 partitions then be removed (after some duration), or will they persist alongside the gpu/infer partitions?

We probably want to:

bodgerer commented 2 years ago

Hi Peter,

Many thanks for taking a look at this : )

I don't have rough timeframes, or at least not to put on the website. I figured we'd take a view at each stage and advertise via the mailing list, giving appropriate notice.

Your questions about the gpu/infer partitions are good ones. For (2), they'd still need to refer to RHEL7, as the queue would still be full of RHEL7 jobs. For (3), it'd be nice to end the process with gpu/infer referring to RHEL8. I've been thinking about this and we could do it in a couple of ways:

Method A:

a) Users change from using gpu/infer to gpu8/infer8 in their submit scripts (see the sketch below)
b) Wait until all nodes have been migrated to RHEL8 and the gpu/infer queue is empty
c) We add the gpu/infer queues to the RHEL8 equipment (as well as gpu8/infer8)
d) Users are advised to change from using gpu8/infer8 back to gpu/infer
e) New jobs submitted to gpu8/infer8 are automatically redirected to gpu/infer
f) Retire gpu8/infer8 once empty.
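For illustration, step (a) would be a one-line change in a typical submit script, along these lines (the account name and resource requests here are placeholders, not a prescribed setup):

```bash
#!/bin/bash
# Hypothetical submit script; only the partition line changes for step (a)
#SBATCH --account=<project>      # placeholder project code
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# Before: #SBATCH --partition=gpu    (RHEL7)
#SBATCH --partition=gpu8             # RHEL8 partition, per step (a)

nvidia-smi
```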

Method B (needs testing):

a) We retire the gpu8/infer8 partitions now and they take no further part in this.
b) We configure Slurm so that there's a "rhel7" label on the RHEL7 nodes and a "rhel8" label on the RHEL8 nodes, via its "features" setting (see the sketch after this list).
c) We make jobs submitted from a RHEL7 node run only on RHEL7 nodes, and jobs submitted from a RHEL8 node run only on RHEL8 nodes, by setting the environment variable SBATCH_CONSTRAINT to rhel7 or rhel8 at login, at the system level.
d) A "rhel7" constraint is added to existing jobs.
e) We add the RHEL8 testing nodes to gpu/infer once we're sure everyone has logged out and in again to pick up the new variable.
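A minimal sketch of what (b)-(d) could look like; the node names, file path and exact slurm.conf layout are assumptions for illustration, not Bede's actual config:

```
# (b) slurm.conf fragment: tag each node with an OS feature
NodeName=gpu[001-012] Features=rhel7 ...
NodeName=gpu[013-014],infer003 Features=rhel8 ...

# (c) /etc/profile.d/slurm-os-constraint.sh (hypothetical filename):
#     pin submissions to the submitting node's own OS release.
#     sbatch reads SBATCH_CONSTRAINT as if --constraint were given.
if grep -q 'release 8' /etc/redhat-release; then
    export SBATCH_CONSTRAINT=rhel8
else
    export SBATCH_CONSTRAINT=rhel7
fi

# (d) add the rhel7 constraint to jobs already pending in the queue
squeue -h -t PD -o '%i' | xargs -r -I{} scontrol update JobId={} Features=rhel7
```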

Method B has the advantages that there's no change required to submission scripts, we avoid users hitting the "Shellshock" problem mentioned in my original email, and there's nothing to tidy up once we retire the RHEL7 environment.

There is a risk that some people are already setting SBATCH_CONSTRAINT themselves, but it doesn't seem likely to me.
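A rough way to gauge that risk (assuming root access, and assuming home directories live under /users, which is a guess here):

```bash
# List users whose shell startup files already set SBATCH_CONSTRAINT
grep -ls 'SBATCH_CONSTRAINT' /users/*/.bashrc /users/*/.bash_profile 2>/dev/null
```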

Which do you think is better?

Best,

Mark

bodgerer commented 2 years ago

Hi Peter,

I've tested method B above, and it seems to work well.

If you login to Bede now, you'll see that the rhel8 nodes (gpu013, gpu014, infer003) are all in the main gpu/infer queues (as well as the gpu8/infer8 queues which we need to retire).

gpu/infer jobs submitted from a rhel7 node will only run on the rhel7 nodes; likewise, jobs submitted from a rhel8 node will only run on the rhel8 nodes.

Hopefully this will make the documentation somewhat simpler?

Best,

Mark

ptheywood commented 2 years ago

Yep that works as described for me with some trivial test jobs, and should make the migration a little simpler.
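For anyone else wanting to verify this, a trivial test job along these lines shows where it lands (the account name is a placeholder):

```bash
#!/bin/bash
#SBATCH --account=<project>   # placeholder project code
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:01:00

# Report which node and OS release the job actually ran on
hostname
cat /etc/redhat-release
```

Submitted from a RHEL 7 login node this should report a 7.x release; from a RHEL 8 login node, an 8.x release.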

Users may still need to change their job submission scripts, however, for different module names (e.g. mvapich2/2.3.5 vs mvapich2/2.3.5-2, or singularity not currently being available on RHEL8), but that should be easy enough to highlight in the docs and it should be an obvious failure.
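For example, checking what an image actually provides is just the usual module query (which image carries which version suffix is best confirmed on the node itself):

```bash
# From a login node of the relevant OS image:
module avail mvapich2      # e.g. mvapich2/2.3.5 on one image, mvapich2/2.3.5-2 on the other
module avail singularity   # expected to list nothing on RHEL8 for now
```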

I'll get this added to the docs.

bodgerer commented 2 years ago

Spoke over Zoom about various things but, in summary:

Thanks!

ptheywood commented 2 years ago

Now that #67 is merged and live on RTD, most software pages should list which module loads are (currently) available for RHEL 8 and RHEL 7 (although I may have missed some).

ptheywood commented 2 years ago

Once login node migration has occurred (provisionally 2022-02-10/11) #119 can be merged to update the migration status.

I have intentionally not listed the sizes of the RHEL 8 / RHEL 7 partitions, to avoid stale information in the documentation, as this will be a moving target (initially a 7:25 split for gpu and 2:2 for infer, I think). Commands are provided to check from the cluster.
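For reference, the kind of check the docs point at, using the rhel7/rhel8 feature tags from earlier in this thread:

```bash
# Show each node in the gpu and infer partitions with its feature tags
sinfo -N -p gpu,infer -o '%N %P %f'
```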

The remaining steps for the RHEL migration documentation after this will be: