aws-samples / aws-parallelcluster-post-install-scripts

Scripts to customize AWS ParallelCluster
MIT No Attribution

Pyxis runtime path cannot be on /fsx #20

Open verdimrc opened 7 months ago

verdimrc commented 7 months ago

The Pyxis runtime path cannot be on /fsx; otherwise, running a Docker image directly on multiple nodes fails with the errors below.

# NOTE: below works fine for -N1.
$ srun -N2 --container-image=alpine grep PRETTY /etc/os-release
...
slurmstepd: error: pyxis:     Can't find a SQUASHFS superblock on /fsx/pyxis/1000/385.0.squashfs
slurmstepd: error: pyxis:     Wrong filesystem or filesystem is corrupted!
slurmstepd: error: pyxis:     Failed to read existing filesystem - will not overwrite - ABORTING!
slurmstepd: error: pyxis:     To force Mksquashfs to write to this block device or file use -noappend
...
srun: error: p4de-st-p4de-1: task 0: Exited with exit code 1
...
slurmstepd: error: pyxis:     [ERROR] No such file or directory: /fsx/pyxis/1000/385.0.squashfs
...
srun: error: p4de-st-p4de-2: task 1: Exited with exit code 1
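
In the log above, both tasks refer to the same file, /fsx/pyxis/1000/385.0.squashfs, so one plausible reading is that the nodes collide when they write to a runtime path on the shared filesystem. As a minimal sketch of a workaround, the snippet below builds a plugstack line that points Pyxis at a node-local directory and refuses anything under /fsx. The plugin location, the /run/pyxis directory, and the helper name pyxis_plugstack_line are assumptions for illustration, not taken from this repository's script; runtime_path is the Pyxis plugin argument as I understand it, so check it against the Pyxis documentation for your version.

```shell
#!/usr/bin/env bash
# Sketch only: emit a Pyxis plugstack line with a node-local runtime path.
# ASSUMPTIONS (not from the original report): the plugin location and
# the /run/pyxis directory; adjust both to match the actual install.

PYXIS_SO="/usr/local/lib/slurm/spank_pyxis.so"  # assumed plugin location
SHARED_MOUNT="/fsx"                             # shared filesystem named in the report

# Hypothetical helper: print the plugstack line for the given runtime path,
# or fail if that path is the shared mount or lives underneath it.
pyxis_plugstack_line() {
    local runtime_path="$1"
    # Append "/" so that "/fsx" itself matches but "/fsxscratch" does not.
    case "${runtime_path}/" in
        "${SHARED_MOUNT}"/*)
            echo "refusing shared runtime path: ${runtime_path}" >&2
            return 1
            ;;
    esac
    echo "required ${PYXIS_SO} runtime_path=${runtime_path}"
}

# Example: a node-local directory is accepted and the line is printed.
pyxis_plugstack_line /run/pyxis
```

The printed line would go into the Slurm plugstack configuration that loads Pyxis on each compute node; where that file lives depends on how the cluster was set up.
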

codeknight03 commented 2 months ago

Neither of the MRs mentioned above fixes this issue. The "Can't find a SQUASHFS superblock" errors are still raised when using this script. Are you open to an MR?