chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Requesting launcher flag for CHPL_LAUNCHER_ACCOUNT #22798

Open mstrout opened 1 year ago

mstrout commented 1 year ago

I am running some tutorial example code on Frontier. :)

I was getting an srun error indicating I needed to specify an account with -A. When I ran the executable with --help, I got the following:

./hello6-taskpar-dist -nl 2 -v --help
LAUNCHER FLAGS:
===============
  --generate-sbatch-script  : generate an sbatch script and exit
  --walltime <HH:MM:SS>     : specify a wallclock time limit
                              (or use $CHPL_LAUNCHER_WALLTIME)
  --nodelist <nodelist>     : specify a nodelist to use
                              (or use $CHPL_LAUNCHER_NODELIST)
  --partition <partition>   : specify a partition to use
                              (or use $CHPL_LAUNCHER_PARTITION)
  --exclude <nodes>         : specify node(s) to exclude
                              (or use $CHPL_LAUNCHER_EXCLUDE)
  --dry-run                 : just print system launcher command, don't run it
...

Looking through the documentation, I found the launcher section at https://chapel-lang.org/docs/usingchapel/launcher.html#readme-launcher, and that is where I found the CHPL_LAUNCHER_ACCOUNT environment variable. I propose we create a launcher flag for it so that this setting is exposed with --help. @ronawho or @e-kayrakli or anyone else, what are your thoughts on this?

kwaters4 commented 1 year ago

There are quite a few extra options that are not currently captured by the CHPL_LAUNCHER for Slurm or other job schedulers. The difficulty is that they can be site-specific. Some examples I can think of are accounts (as stated in the issue), queue names, memory, and job time.

It can be tricky to make them environment variables if your environments are saved in modules that are shared across the system. Most of these options are put into a submission script that the user configures on the host platform. Using the Chapel launcher is convenient, but it can quickly become obsolete. I would be interested in hearing what other people's experiences are on shared resources.

mppf commented 1 year ago

One of the things I have been hoping for here is that we improve our Chapel Launchers implementation to use Python and to be simple enough to write that site system administrators and/or developers at particular sites can create custom ones. IMO one of the big challenges here is that it's hard to test the various elements of these launchers (since, well, they can be site-specific); so I would expect such an effort will take some time in order to shake out problems when changing how it works.
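To make the idea concrete, here is a minimal sketch of what a table-driven launcher in Python might look like. It is an illustration only, not how the current launchers work: the function names and the table are hypothetical, the binary name is a placeholder, and the only things taken from this thread are the CHPL_LAUNCHER_* variable names. The srun long options used are standard Slurm ones.

```python
import os

# Hypothetical table mapping the CHPL_LAUNCHER_* variables named in this
# thread to srun long options. A site could edit this table for its scheduler.
ENV_TO_SRUN = {
    "CHPL_LAUNCHER_ACCOUNT": "--account",
    "CHPL_LAUNCHER_PARTITION": "--partition",
    "CHPL_LAUNCHER_WALLTIME": "--time",
    "CHPL_LAUNCHER_NODELIST": "--nodelist",
    "CHPL_LAUNCHER_EXCLUDE": "--exclude",
}


def build_srun_command(binary, num_locales, env=None):
    """Return the srun argument list for running `binary` on `num_locales` nodes."""
    env = os.environ if env is None else env
    cmd = ["srun", f"--nodes={num_locales}"]
    for var, option in ENV_TO_SRUN.items():
        value = env.get(var)
        if value:  # skip variables that are unset or empty
            cmd.append(f"{option}={value}")
    cmd += [binary, "-nl", str(num_locales)]
    return cmd


def help_lines():
    """List every setting the launcher honors, generated from the same table."""
    return [f"  ${var} -> srun {option}=<value>" for var, option in ENV_TO_SRUN.items()]


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(" ".join(build_srun_command("./hello", 2)))
    print("\n".join(help_lines()))
```

Because the help text is generated from the same table as the command line, a setting such as the account could not be supported without also showing up in the help output, which is the gap this issue started from.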

See also #11818.

e-kayrakli commented 1 year ago

While I am not opposed to the idea, adding baked-in arguments to the generated executable can take away names from users' config variables, or could be a source of confusion. We could choose a flag such as --launcher-partition, which can't be a valid variable name. But then you could have --launcherPartition as a user flag. I admit that it is a bit contrived, as the name shouldn't occur in user code that often. But I am wary of trying to guess what a user could or couldn't do.
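The naming argument can be checked mechanically. The sketch below assumes Chapel identifiers start with a letter or underscore and continue with letters, digits, underscores, or `$`; the helper name is hypothetical and the regular expression is a simplification for illustration, not the compiler's actual rule.

```python
import re

# Simplified, assumed pattern for a Chapel identifier (ASCII only).
CHAPEL_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def could_be_config_name(flag):
    """True if `--<name>` could collide with a user config variable called <name>."""
    name = flag[2:] if flag.startswith("--") else flag
    return CHAPEL_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for flag in ("--launcher-partition", "--launcherPartition"):
        print(flag, could_be_config_name(flag))
```

A hyphenated flag cannot collide with a config variable, while a camelCase one can, which is why the hyphenated spelling is the safer choice for a baked-in flag.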

My worries are compounded by the fact that if we add a flag for it, we should definitely add a flag for CHPL_LAUNCHER_PARTITION as well, since that's something I use far more often. There may be other environment variables that meet the same bar for being added as a flag, too.

As an alternative to baked-in arguments (not that you're particular about it), we can try to do something with private config (potentially requiring some fixes, see https://github.com/chapel-lang/chapel/issues/22297). As a sketch, imagine having a private config const launcherPartition in ChplConfig or some standard module. We could set that at execution time with --ChplConfig.launcherPartition=, which is more appealing to me than adding a baked-in flag. I generally would like us to move some of our internal knobs to use this strategy, as well.

bradcray commented 1 year ago

The following issue is (vaguely) related to this one: https://github.com/chapel-lang/chapel/issues/20777

One other thing that I think we've discussed at times is a general launcher "pass-through" flag that would permit additional flags / arguments to be passed to slurm/pbs/whatever if we don't happen to have an environment variable or flag defined for that behavior, such that one could do, say, --native-launcher-flags="-A MyAccountName" or --slurm-flags="-A MyAccountName" and have that add -A MyAccountName to the srun command. Maybe there's a reason this doesn't make sense, but if so, I can never seem to remember it, so I keep mentioning it.
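A rough sketch of how such a pass-through could behave, written in Python purely for illustration. The flag name --native-launcher-flags is the one floated above and does not exist today; the function names and the binary name are likewise hypothetical.

```python
import shlex

# Proposed flag name from the comment above; not an existing launcher flag.
PASSTHROUGH_FLAG = "--native-launcher-flags"


def extract_passthrough(argv):
    """Split argv into (scheduler flags, remaining program arguments)."""
    prefix = PASSTHROUGH_FLAG + "="
    extra, rest = [], []
    for arg in argv:
        if arg.startswith(prefix):
            # shlex.split honors shell-style quoting inside the flag's value.
            extra += shlex.split(arg[len(prefix):])
        else:
            rest.append(arg)
    return extra, rest


def build_command(binary, argv):
    """Place pass-through flags right after srun, ahead of the program."""
    extra, rest = extract_passthrough(argv)
    return ["srun", *extra, binary, *rest]


if __name__ == "__main__":
    args = ["-nl", "2", "--native-launcher-flags=-A MyAccountName"]
    print(build_command("./hello", args))
```

The launcher only forwards the text and never interprets it, so one flag would cover accounts, queues, memory limits, and any other site-specific option without a new environment variable for each.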