aws-samples / aws-parallelcluster-post-install-scripts

Scripts to customize AWS ParallelCluster
MIT No Attribution

Pyxis #1

Closed lipovsek-aws closed 1 year ago

lipovsek-aws commented 1 year ago

Added enroot and pyxis. Noting a few next steps here:

  1. Figure out a clever way to reference the correct enroot.conf version (the script name in ParallelCluster is not consistent with the one here). I'm leaning towards arguments (or deriving it from predefined variables, but pcluster downloads this as a tmpfile; what about brute-forcing it with envsubst?). It's not a blocker for now. The only good idea I have so far is CI that publishes this artifact for branches and releases.
  2. Move to envsubst for more flexible templating.
  3. Move away from positional arguments to something like getopts to enable more parameters (a custom enroot.conf template, running other necessary scripts (for example, a custom enroot.conf might need some filesystem operations), ...).
  4. Move the README instructions away from referencing main to referencing releases.
  5. Consider "abstracting away" enroot.conf: provide higher-level arguments and automate all templating and other operations (such as creating directories and setting correct ownership).
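
Steps 2 and 3 could look roughly like the sketch below. It is only an illustration under assumptions: the option letters, the variable names (ENROOT_CONF_TEMPLATE, ENROOT_DATA_DIR, POST_HOOK), the default values, and the function names are all hypothetical and not part of the current script. It relies on the bash builtin getopts and on envsubst from GNU gettext.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: option letters, variable names, and defaults are
# illustrative assumptions, not what the post-install script uses today.

ENROOT_CONF_TEMPLATE="enroot.conf.tmpl"  # assumed template file name
ENROOT_DATA_DIR="/scratch/enroot"        # assumed default directory
POST_HOOK=""                             # optional extra script to run

# Parse named options instead of positional arguments.
parse_args() {
  local opt
  OPTIND=1  # reset so the function can be called more than once
  while getopts ":t:d:s:" opt; do
    case "$opt" in
      t) ENROOT_CONF_TEMPLATE="$OPTARG" ;;
      d) ENROOT_DATA_DIR="$OPTARG" ;;
      s) POST_HOOK="$OPTARG" ;;
      :) echo "option -$OPTARG requires a value" >&2; return 2 ;;
      \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
    esac
  done
}

# Render a template file ($1) to stdout. Only ${ENROOT_DATA_DIR} is
# substituted; any other $VARIABLE text in the template is left untouched.
render_template() {
  ENROOT_DATA_DIR="$ENROOT_DATA_DIR" envsubst '${ENROOT_DATA_DIR}' < "$1"
}
```

A script built on this would call `parse_args "$@"` and then something like `render_template "$ENROOT_CONF_TEMPLATE" > /etc/enroot/enroot.conf` (the destination path here is also an assumption). Restricting envsubst to an explicit variable list keeps any other dollar-sign expressions in the template intact.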

lipovsek-aws commented 1 year ago

When merging, we have to manually change enroot.conf to reference main instead of the pyxis branch. Leaving it pointing at pyxis for now for testing.

lipovsek-aws commented 1 year ago

@sean-smith testing is done; please review and let me know if you have any concerns. I'll merge when you're done.

wdykas commented 1 year ago

sbatch scripts don't work with --container-image sruns for me. The symlink on /opt/slurm/etc/plugstack.conf.d.pyxis.conf is broken on compute nodes.
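
One way to confirm a dangling symlink on a node is a check like the sketch below. The helper name is hypothetical; it uses only the standard `-L` and `-e` shell tests.

```shell
# Hypothetical helper: succeeds (exit 0) only if $1 is a symlink whose
# target does not exist.
is_broken_symlink() {
  # -L is true for a symlink itself; -e follows the link and is false
  # when the target is missing.
  [ -L "$1" ] && [ ! -e "$1" ]
}
```

Run on a compute node with the pyxis plugstack path as the argument, for example `is_broken_symlink "$path" && echo "broken: $path"`.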

lipovsek-aws commented 1 year ago

sbatch scripts don't work with --container-image sruns for me. The symlink on /opt/slurm/etc/plugstack.conf.d.pyxis.conf is broken on compute nodes.

@wdykas here are my examples

[ec2-user@ip-182-168-1-190 ~]$ srun grep PRETTY /etc/os-release
PRETTY_NAME="Amazon Linux 2"
[ec2-user@ip-182-168-1-190 ~]$ srun --container-image=alpine grep PRETTY /etc/os-release
pyxis: importing docker image: alpine
pyxis: imported docker image: alpine
PRETTY_NAME="Alpine Linux v3.17"
[ec2-user@ip-182-168-1-190 ~]$ sbatch --container-image=alpine --wrap "grep PRETTY /etc/os-release"
Submitted batch job 3
[ec2-user@ip-182-168-1-190 ~]$ cat slurm-3.out 
pyxis: imported docker image: alpine
PRETTY_NAME="Alpine Linux v3.17"

I have previously installed Docker alongside enroot+pyxis without any issues, but only with Docker installed first, never after.