E4S-Project / e4s

[docs]: incorrect documentation for Paratools E4S deployment on Perlmutter #88

Status: Closed (shahzebsiddiqui closed this issue 2 years ago)

shahzebsiddiqui commented 2 years ago

Please describe the issue

This page needs to be updated: https://e4s.readthedocs.io/en/latest/deployment.html#perlmutter

The modules shown below are what is available since we hid a few mvapich2 modules:

 ~/ module use /global/cfs/cdirs/m3896/shared/modulefiles
 ~/ module av

------------------------------------------ /global/cfs/cdirs/m3896/shared/modulefiles ------------------------------------------
   mvapich2/3.0a

Also, I noticed that the e4s/22.05/mvapich2 and e4s/22.05/PrgEnv-gnu modules are not accessible to everyone. We should either remove them from the documentation or fix the permissions.

 ~/ ls -l /global/cfs/cdirs/m3896/shared/modulefiles/e4s/*
/global/cfs/cdirs/m3896/shared/modulefiles/e4s/22.05:
ls: cannot access '/global/cfs/cdirs/m3896/shared/modulefiles/e4s/22.05/mvapich2-3.0a.lua': Permission denied
ls: cannot access '/global/cfs/cdirs/m3896/shared/modulefiles/e4s/22.05/PrgEnv-gnu.lua': Permission denied
total 0
-????????? ? ? ? ?            ? mvapich2-3.0a.lua
-????????? ? ? ? ?            ? PrgEnv-gnu.lua

/global/cfs/cdirs/m3896/shared/modulefiles/e4s/mvapich2:
total 1
-rw-rw---- 1 lpeyrala m3896 3329 Jun 17 10:12 22.05.lua.bak

/global/cfs/cdirs/m3896/shared/modulefiles/e4s/PrgEnv-gnu:
total 1
-rw-rw---- 1 lpeyrala m3896 3045 Jun 17 10:12 22.05.lua.bak

I suspect @eugeneswalker has a umask of 007, which would cause this issue. Changing it to umask 002 should address the problem. I think you want g+w so that members of m3896 can write to the directory, while everyone else still has o+rx permission.
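For reference, here is a minimal sketch of how the two umask values affect newly created files. The scratch directory and file names are made up for illustration, and stat -c assumes GNU coreutils, so treat this as an illustration rather than the exact steps used on Perlmutter.

```shell
#!/bin/sh
# Sketch: compare the modes that new files get under umask 007 and umask 002.
# The scratch directory and file names below are hypothetical.
workdir=$(mktemp -d)

# umask 007 masks out every "other" bit, so a new file ends up 660 (rw-rw----).
( umask 007; : > "$workdir/made_with_007.lua" )

# umask 002 masks out only other-write, so a new file ends up 664 (rw-rw-r--).
( umask 002; : > "$workdir/made_with_002.lua" )

# Read back the octal modes (GNU stat syntax).
mode_007=$(stat -c '%a' "$workdir/made_with_007.lua")
mode_002=$(stat -c '%a' "$workdir/made_with_002.lua")
echo "umask 007 -> $mode_007"
echo "umask 002 -> $mode_002"

rm -rf "$workdir"
```

Files that already exist keep their old mode after the umask is changed, so they would still need a chmod o+r (and o+rx on directories).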

eugeneswalker commented 2 years ago

I am looking at this now. I will resolve the issue and post an update here.

eugeneswalker commented 2 years ago

@shahzebsiddiqui

Can you please verify that you were using these commands on Perlmutter and NOT on Cori?

I can reproduce what you report if I run these commands on Cori, but these modules are not for Cori.

This is what happens on Perlmutter:

$perlmutter> module use /global/cfs/cdirs/m3896/shared/modulefiles
$perlmutter> module avail e4s

------------------- /global/cfs/cdirs/m3896/shared/modulefiles ----------------------
   e4s/22.05/mvapich2-3.0a    e4s/22.05/PrgEnv-gnu

Furthermore, the permissions look OK:

$perlmutter> ls -ld /global/cfs/cdirs/m3896/shared/modulefiles
drwxrwsr-x 4 sameer m3896 4096 Aug 18 07:37 /global/cfs/cdirs/m3896/shared/modulefiles

$perlmutter> ls -ld /global/cfs/cdirs/m3896/shared/modulefiles/e4s
drwxrwsr-x 4 sameer m3896 4096 Aug 18 07:31 /global/cfs/cdirs/m3896/shared/modulefiles/e4s

$perlmutter> ls -ld /global/cfs/cdirs/m3896/shared/modulefiles/e4s/22.05
drwxrwsr-x 3 lpeyrala m3896 16384 Aug 18 07:38 /global/cfs/cdirs/m3896/shared/modulefiles/e4s/22.05

$perlmutter> ls -l /global/cfs/cdirs/m3896/shared/modulefiles/e4s/22.05
total 1
-rw-rw-r-- 1 sameer m3896 1602 Aug 12 11:07 mvapich2-3.0a.lua
-rw-rw-r-- 1 sameer m3896 1104 Jun 17 10:35 PrgEnv-gnu.lua

shahzebsiddiqui commented 2 years ago

That's because it is using your user account, and you are part of the Unix group while I am not. Please make sure that all directories and files are world-readable.
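One way to check this without switching accounts is to walk the path and look at the "other" permission bits of every component. The sketch below assumes GNU stat; check_world_access is a hypothetical helper name, and the demo tree only mimics the layout of the module directory.

```shell
#!/bin/sh
# Hypothetical helper: report each path component that "other" users cannot
# read (files) or read and traverse (directories).
# $1: path to check; $2: optional directory at which to stop (default /).
check_world_access() {
    path=$1
    top=${2:-/}
    status=0
    while :; do
        mode=$(stat -c '%a' "$path") || return 2
        other=${mode#"${mode%?}"}      # last octal digit holds the "other" bits
        if [ -d "$path" ]; then need=5; else need=4; fi   # r-x for dirs, r-- for files
        if [ $((other & need)) -ne "$need" ]; then
            echo "not world-accessible: $path (mode $mode)"
            status=1
        fi
        case $path in "$top"|/|.) break ;; esac
        path=$(dirname "$path")
    done
    return "$status"
}

# Demo on a scratch tree shaped like the module directory (names are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/e4s/22.05"
: > "$demo/e4s/22.05/PrgEnv-gnu.lua"
chmod 755 "$demo" "$demo/e4s" "$demo/e4s/22.05"

chmod 660 "$demo/e4s/22.05/PrgEnv-gnu.lua"     # what umask 007 would produce
before=$(check_world_access "$demo/e4s/22.05/PrgEnv-gnu.lua" "$demo" || true)

chmod 664 "$demo/e4s/22.05/PrgEnv-gnu.lua"     # world-readable
after=$(check_world_access "$demo/e4s/22.05/PrgEnv-gnu.lua" "$demo" || true)

echo "before: ${before:-ok}"
echo "after:  ${after:-ok}"
rm -rf "$demo"
```

On the real system the same function could be pointed at a module file under /global/cfs/cdirs/m3896/shared/modulefiles without the second argument, so that every parent directory is checked as well.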

eugeneswalker commented 2 years ago

Here I am trying as the e4s user, which is not part of the group that owns our module directory.

e4s:login34> groups
e4s m3503 spackecp

e4s:login34> module use /global/cfs/cdirs/m3896/shared/modulefiles

e4s:login34> module avail e4s

------------- /global/cfs/cdirs/m3896/shared/modulefiles -----------------
   e4s/22.05/mvapich2-3.0a    e4s/22.05/PrgEnv-gnu (D)

Can you confirm you are able to see the modules now from Perlmutter, not Cori?

eugeneswalker commented 2 years ago

We have received confirmation from multiple other users, not in our group, that they are able to see the module files.

Closing this as resolved.

shahzebsiddiqui commented 2 years ago

Reopening this issue. It's not solved yet. The scope of this is to fix the documentation.

The documentation has the following:

[Screenshot: Screen Shot 2022-08-19 at 2 26 18 PM]

However, we have the following:

 ~/ module use /global/cfs/cdirs/m3896/shared/modulefiles
 ~/ module av

----------------------------------------------------------------------------------- /global/cfs/cdirs/m3896/shared/modulefiles -----------------------------------------------------------------------------------
   e4s/22.05/mvapich2-3.0a    e4s/22.05/PrgEnv-gnu (D)    mvapich2/3.0a

There is no e4s/22.05/mvapich2 module to load, but there is e4s/22.05/mvapich2-3.0a, which most likely points to a different stack.

[Screenshot: Screen Shot 2022-08-19 at 2 27 08 PM]
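A mismatch like this can be caught mechanically by comparing the module names in the documentation against the names that are actually available. Below is a rough sketch; missing_modules is a hypothetical helper, and the available list is hard-coded from the module av output above rather than queried live.

```shell
#!/bin/sh
# Hypothetical helper: print documented module names that are not in the
# available list.
# $1: file with one available module name per line; remaining args: documented names.
missing_modules() {
    avail_file=$1
    shift
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string
        grep -qxF -- "$name" "$avail_file" || echo "$name"
    done
}

avail_list=$(mktemp)
# Names copied from the module av output in this comment.
printf '%s\n' e4s/22.05/mvapich2-3.0a e4s/22.05/PrgEnv-gnu mvapich2/3.0a > "$avail_list"

# The two names the documentation is reported to mention.
missing=$(missing_modules "$avail_list" e4s/22.05/mvapich2 e4s/22.05/PrgEnv-gnu)
echo "documented but not available: $missing"
rm -f "$avail_list"
```

On a system with Lmod, the available list could probably be generated with something like module -t avail 2>&1, since Lmod normally writes its listing to stderr. The terse output also contains directory header lines, which this exact-match check simply never matches.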

I think the documentation should show an updated output of module av, including the full paths, since the generated modules may differ:

 ~/ module load e4s/22.05/mvapich2-3.0a

Lmod is automatically replacing "cray-mpich/8.1.17" with "mvapich2/3.0a".

 ~/ module av

---------------------------------- /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.05/mvapich2-3.0a-slurm/spack/share/spack/lmod/cray-sles15-x86_64/mvapich2/3.0a-es35auw/Core -----------------------------------
   adios/1.13.1                darshan-runtime/3.3.1          hpx/1.7.1                    (D)    omega-h/9.34.1              py-petsc4py/3.17.1                    sundials/6.2.0
   adios2/2.8.0-cuda80         datatransferkit/3.1-rc3        hypre/2.24.0                        openpmd-api/0.14.4          py-warpx/22.05-dims2                  tasmanian/7.7-openmp
   adios2/2.8.0         (D)    dyninst/12.1.0-openmp          kokkos-kernels/3.6.00-cuda80        papyrus/1.0.2               py-warpx/22.05-dims3                  tau/2.31.1-cuda
   amrex/22.05                 faodel/1.2108.1                lammps/20220107-openmp              parsec/3.0.2012             py-warpx/22.05-dimsRZ          (D)    trilinos/13.0.1
   arborx/1.2                  fortrilinos/2.0.0              libquo/1.3.1                        petsc/3.17.1-cuda80         scr/3.0rc2                            veloc/1.5
   axom/0.6.1-openmp           globalarrays/5.8               mercury/2.1.0                       petsc/3.17.1         (D)    slate/2021.05.02-cuda80-openmp
   butterflypack/2.1.1         hdf5/1.10.7                    metall/0.20                         precice/2.4.0               slate/2021.05.02-openmp        (D)
   cabana/0.4.0                heffte/2.2.0-cuda80            mfem/4.4.0                          pumi/2.2.7                  slepc/3.17.1-cuda80
   caliper/2.7.0-cuda80        heffte/2.2.0            (D)    nccmp/1.9.0.1                       py-cinemasci/1.7.0          slepc/3.17.1                   (D)
   caliper/2.7.0        (D)    hpx/1.7.1-cuda80               nco/5.0.1                           py-libensemble/0.9.1        strumpack/6.3.1-openmp

---------------------------------- /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.05/mvapich2-3.0a-slurm/spack/share/spack/lmod/cray-sles15-x86_64/openmpi/4.1.3-gw3a4bv/Core -----------------------------------
   gptune/3.0.0

--------------------------------------------- /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.05/mvapich2-3.0a-slurm/spack/share/spack/lmod/cray-sles15-x86_64/Core ----------------------------------------------
   aml/0.1.0       charliecloud/0.26        flux-core/0.38.0           (D)    gotcha/1.0.3                        magma/2.6.2-cuda80           papi/6.0.0.1-cuda      raja/0.14.0-cuda80-openmp
   archer/2.0.0    cmake/3.23.1             gasnet/2022.3.0                   kokkos-kernels/3.6.00-openmp (D)    mpark-variant/1.4.0          pdt/3.25.1             superlu/5.3.0
   argobots/1.1    darshan-util/3.3.1       ginkgo/1.4.0-cuda80-openmp        kokkos/3.6.00-openmp                mvapich2/3.0a       (L,D)    plasma/21.8.29         swig/4.0.2-fortran
   bolt/2.0        flit/2.1.0               ginkgo/1.4.0-openmp        (D)    legion/21.03.0-cuda80-cuda          nrm/0.1.0                    py-jupyterhub/1.4.1    umap/2.1.0
   chai/2.4.0      flux-core/0.38.0-cuda    gmp/6.2.1                         legion/21.03.0               (D)    nvhpc/22.3                   qthreads/1.16          zfp/0.5.5-cuda80

eugeneswalker commented 2 years ago

Ahh, I see. Please see this PR and if you approve, let us merge it.