N8-CIR-Bede / documentation

Documentation for the N8CIR Bede Tier 2 HPC facility
https://bede-documentation.readthedocs.io/en/latest/

Grace-Hopper documentation #185

Closed: ptheywood closed this issue 5 months ago

ptheywood commented 7 months ago

Initial Grace-Hopper documentation has been added by #183.

As software availability differs between the Grace-Hopper (aarch64) nodes and the Power (ppc64le) nodes, the documentation needs expanding to cover this.

This could be done by:

  1. Use the existing software pages, clarifying which architecture each package is available for, with per-architecture instructions where required (e.g. PyTorch via a container rather than conda).
  2. Split the software section into two per-architecture sections (with redirects from the current URI to the ppc64le URI, and cross-links from each common page).

The trade-off is duplication versus how users are likely to find the docs, i.e. are users going to look for the software first, or for the architecture and then the software?

Notable inclusions / changes needed (include content added in #183 to the usage pilot section):

ptheywood commented 7 months ago

Modules available as of 2024-02-28:

module avail --long 
- Package -----------------------------+- Versions -+- Last mod. ------
/opt/software/builder/modules/infrastructure:
slurm/dflt                               default     2021/06/14  9:55:05
tools/1.2                                            2021/06/23 13:53:17
user                                                 2021/06/23 14:27:09

/opt/software/builder/modules/developers/tools:
nsight-compute/2023.3                    default     2024/02/01 16:10:08
nsight-systems/2023.4.1                  default     2024/02/01 16:18:13

/opt/software/builder/modules/developers/compilers:
cuda/11.7.0                                          2024/01/30 13:03:28
cuda/11.7.1                                          2024/01/30 13:03:28
cuda/11.8.0                                          2024/02/01 15:50:12
cuda/12.1.1                                          2024/02/01 15:50:12
cuda/12.2.2                                          2024/02/01 15:50:12
cuda/12.3.2                              default     2024/01/19 16:38:11
gcc/12.2                                 default     2024/02/01 13:47:21
gcc/13.2                                             2024/01/30 16:11:37
gcc/native                                           2024/01/25 16:01:48
nvhpc/24.1                               default     2024/02/05 12:02:19

Edit: the following were in the process of being installed as of 2024-03-07 (via email):

- openmpi/4.1.6
- openblas/0.3.26
- fftw/3.3.10
- hdf5
- netcdf (maybe?)

As of 2024-03-11

/opt/software/builder/modules/developers/libraries:
boost/1.84.0                             default     2024/03/07 16:13:27
fftw/3.3.10                              default     2024/03/08  9:34:26
hdf5/1.10.11                             default     2024/03/08 10:40:13
openblas/0.3.26                          default     2024/03/07 14:44:14
openblas/0.3.26omp                                   2024/03/07 15:39:20
openmpi/4.1.6                            default     2024/03/07 16:35:31
ptheywood commented 7 months ago

I've made a start on testing two separate ways we could present the architecture/partition-specific software differences. Both are very much rough work, currently just to gauge which feels better to use and how much effort each approach would require:

tabs (gh-tabs branch)

Using the sphinx-tabs extension, code blocks etc. can be made per architecture. This works well, but needs some CSS tweaks.


This does not work as well for longer-form text differences in my opinion, although it can be used.
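A minimal sketch of per-architecture group tabs, assuming sphinx-tabs is enabled in `conf.py` via `extensions = ["sphinx_tabs.tabs"]`. The `cuda/12.3.2` module comes from the aarch64 module listing above; the ppc64le command is illustrative only. Tabs created with `group-tab` that share a label switch together across the page, so a reader who picks "aarch64" once sees the aarch64 variant in every tab set.

```rst
.. tabs::

   .. group-tab:: ppc64le

      .. code-block:: bash

         # Illustrative only: check `module avail` on the Power nodes
         module load cuda

   .. group-tab:: aarch64

      .. code-block:: bash

         # Default CUDA module on the Grace-Hopper nodes as of 2024-02-28
         module load cuda/12.3.2
```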

Software packages that are ppc64le-only or aarch64-only would need to be marked as such; note blocks are one option for this.


A warning, info or some other block could be used instead. Currently most software pages would require one of these, so it might be nice to find a way of automating this via Sphinx variables and an include with replacements (which should be doable).
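One way that automation could look, as a sketch: each page defines a substitution and then includes a shared note. The file name `/common/arch-only-note.rst` and the substitution name `|arch_only|` are hypothetical. Substitution definitions apply to the whole document in reStructuredText, so a definition made in the page is picked up by the included text.

```rst
.. In the software page (hypothetical):

.. |arch_only| replace:: ppc64le

.. include:: /common/arch-only-note.rst

.. Contents of /common/arch-only-note.rst (hypothetical):

.. note::

   This software is currently only available on the |arch_only| nodes.
```

The shared file would probably need adding to `exclude_patterns` in `conf.py` (or giving a non-source suffix), otherwise Sphinx will also try to build it as a standalone page, where `|arch_only|` is undefined.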

split directories (gh-sections branch)

Splitting the software section into architecture-specific subdirectories is also an option.

Software available on both architectures would need its pages duplicated and modified as appropriate. This duplication can be reduced via includes and replacements, to lower the maintenance burden.
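A sketch of how the duplication might be reduced, assuming a hypothetical `example-app` that is available on both architectures. Each per-architecture page keeps its own title and architecture-specific details, and pulls the shared description from a common include. Substitutions can carry small differences such as module names, but they are not expanded inside literal or code blocks, so commands that differ per architecture would still live in the per-architecture pages.

```rst
.. Hypothetical layout:
     software/ppc64le/example-app.rst
     software/aarch64/example-app.rst
     common/software/example-app.rst (shared text, excluded from the build)

.. In software/aarch64/example-app.rst:

Example App
===========

.. |example_app_module| replace:: ``example-app/1.0``

.. include:: /common/software/example-app.rst

On the Grace-Hopper (aarch64) nodes, load the module with:

.. code-block:: bash

   module load example-app/1.0

.. Contents of /common/software/example-app.rst:

Example App is provided as the |example_app_module| module.
```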


Cross-references to the other architecture might be nice, and might be possible to (semi-)automate.
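A manual cross-link is straightforward with the `:doc:` role, which accepts paths relative to the current document and so does not depend on labels. This sketch assumes the hypothetical `example-app` pages living in sibling `ppc64le` and `aarch64` directories.

```rst
.. In software/aarch64/example-app.rst (hypothetical):

.. seealso::

   :doc:`../ppc64le/example-app`, for using this software on the Power (ppc64le) nodes.
```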

This option works better for software that is architecture-specific (e.g. IBM XML and apptainer/singularity currently), but some users might end up on the page for the wrong architecture (e.g. via a search engine or a bookmarked link).

Tabs could still be useful in places to shorten content (e.g. usage).

Labels for cross-referencing are one downside, as they need modifying manually rather than being relative within the software directory.
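To illustrate the label problem: Sphinx labels are global to the whole project, so the same page in each architecture directory needs a distinct, manually chosen label, typically with the architecture in its name (Sphinx warns about duplicate labels otherwise). The label names below are hypothetical.

```rst
.. In software/ppc64le/example-app.rst:

.. _software-ppc64le-example-app:

Example App
===========

.. In software/aarch64/example-app.rst:

.. _software-aarch64-example-app:

Example App
===========

.. Elsewhere, references must name the architecture explicitly:

See :ref:`software-aarch64-example-app` for the Grace-Hopper version.
```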


Currently, given the significant difference in available modules on each architecture, I'm leaning towards the split software sections approach, if the duplication of content can be reduced enough.


ptheywood commented 7 months ago

If we go for the single-page-per-software approach with tabs, we could add an admonition to the top of each page clarifying which architectures the software is available for. As the sphinx-book-theme is based on the pydata-sphinx-theme, sidebar admonitions are supported by default.

For example, something along the lines of the following (the content is purely illustrative, not the true state of things):


.. admonition:: Partition availability
    :class: sidebar note

    .. list-table::
      :header-rows: 1

      * - Architecture
        - Available
      * - ppc64le
        - yes
      * - aarch64
        - yes

.. admonition:: <architecture> only
    :class: sidebar warning

    This software is only available on <architecture>.

This might require #186; I've not tested this with the older version of the theme currently in use.

The contents of the admonition could be standardised and stored in an include using replacements. However, there's no way to conditionally include sections of text based on a per-page variable (the only and ifconfig directives are project-wide).

E.g.

.. |available_ppc64le| replace:: Yes
.. |available_aarch64| replace:: No

.. include:: /common/software-admonitions.rst

.. Contents of /common/software-admonitions.rst:

.. admonition:: Partition availability
    :class: sidebar note

    :ppc64le: |available_ppc64le|
    :aarch64: |available_aarch64|


This is not as useful as I'd hoped, given the lack of per-page conditional logic. We could potentially write a Sphinx extension, but that feels like overkill.