Open dpmatthews opened 1 year ago
Thanks for raising this. I think a solution that isn't tied to user configuration but is instead driven by the workflow itself would be ideal (assuming I have understood). Our cylc7 suite utilises optional config keys to run for different purposes. Some configurations would utilise the root-dir targets in different ways.
Note my post on the above discourse topic - we actually need workflow-specific symlink dirs in general, not just on remote hosts.
However, it is fundamentally uncool for the workflow itself to specify where it should be installed (or symlinked) to. The workflow is what gets installed; where to, should be determined externally somehow.
I think it makes sense to specify where individual workflows should go, if necessary, in the (user's) global config file.
...uncool for the workflow itself to specify where it should be installed (or symlinked) to...
I suspect you thought I meant wanting continued support of root-dir=*=<somedir>? - which I do not.
I have perhaps not explained very well. Our usecase isn't for the suite to decide where it is itself installed. By "root-dir targets", I mean the 'root-dir' mechanism by which target directories were specified in cylc7.
In particular, I mean specifying work, share, share/cycle and log:
root-dir{share/cycle}=*=$SCRATCH
root-dir{share}=*=$SCRATCH
root-dir{work}=*=$SCRATCH
Sometimes (like in our suite), running the suite for say trials (trial opt key) would mean generating huge quantities of data. This isn't a characteristic of who runs it but of running 'trial' (opt key) for this particular workflow. That is, I think there is a place for the user global config file in this for sure, but I think that could be to override what is specified in the workflow itself (if defined), share/cycle, share, work, log etc.
That is, I think there is a place for the user global config file in this for sure, but I think that could be to override what is specified in the workflow itself (if defined), share/cycle, share, work, log etc.
By "in the workflow itself" I think you mean rose-suite.conf with Cylc 7? From a Cylc perspective, rose suite-run was "external" although the Rose config file was stored with the workflow source.
In Cylc 8, that functionality is handled by cylc install, which is configured via global.cylc.
[UPDATE:] with the proviso that remote (job platform) symlinks are created at run time, not install time, when the remote gets initialized.
The biggest problem with this is where to define it.
We don't want to store this in the workflow configuration (flow.cylc or suite.rc) because this is an installation option so would require us to load the workflow configuration at install time which we wouldn't want to do.
Options:
install.cylc / cylc-install.toml / pyproject.toml / flow.py.
rose-suite.conf file.
Options 3-4 may require a new pre-install plugin type.
Is there a desire to also configure the remote installation symlink targets in this way or would this just be for local installation?
I have a similar use case to @cpelley. I have an hourly cycling workflow that makes large amounts of transient data (kept on disk for up to 24 hours).
Option 1 is an OK workaround during the cylc7 to cylc8 upgrade, but long term it runs the risk of using up the disk quota by accidentally omitting the command line option.
Option 3 or 4 would be my preference as it seems a lot more explicit than option 2. However it's done, it would ideally be something that can be overridden at different sites for portable workflows.
With regards to remote installation symlink targets, I don't have a case right now but I imagine any platform could have the same disk usage/quota issues.
Is there a desire to also configure the remote installation symlink targets in this way or would this just be for local installation?
This issue was specifically about remote installation but a solution covering localhost as well would be preferable.
Note, remote installation could be configured in the workflow config because it happens at runtime after the config has been processed (we already have the .cylcignore file (a sidecar file) for configuring local installation, but the [scheduler][install] section for configuring remote installation). However, it might make more sense to co-locate these.
Option 3 or 4 would be my preference as it seems a lot more explicit than option 2. However it's done, it would ideally be something that can be overridden at different sites for portable workflows.
I think option 2 (by user global config) is the right way to do it.
Finally (3 and 4) keeping installation configuration in the workflow source directory (even in a special file) is fundamentally kinda wrong
it could be done centrally (site global config) for workflows that adhere to name/path conventions (but that isn't enough)
this is very similar, in principle, to platforms config, but finer-grained. And that's global config.
I realise how rose suites are being version controlled in the future is up for discussion, but under the current working practices where workflows are continually copied and renamed I can see this being problematic. Eventually someone will choose a name that will break the conventions.
Something equivalent to 'platforms', where the workflow selects from a set of centrally configured options, sounds like a good way to go.
Eventually someone will choose a name that will break the conventions.
I agree, but that's what the user global config is for - it allows you to define your own conventions (even down to individual workflows) if you want. I was just pointing out that central config is possible, if users need or want to conform to whatever conventions that imposes (but that doesn't preclude use of user global config as well)
I realise how rose suites are being version controlled in the future is up for discussion,
Note that Cylc is also used at sites that don't have Rose.
However, we are retaining rose-suite.conf support in Cylc via a plugin, so we could potentially allow additional installation config in that as well, since technically that file already amounts to keeping installation config in the source directory (which I'm arguing we should move away from, more generally).
Something equivalent to 'platforms', where the workflow selects from a set of centrally configured options, sounds like a good way to go.
This wouldn't be portable between sites.
keeping installation configuration in the workflow source directory (even in a special file) is fundamentally kinda wrong
Agreed! Configuring installation options in the workflow is just wrong which is why it was purposefully dropped from Cylc 8 (this was not an accidental omission)! This is a user-specific installation option used to work around site-configured filesystem limits, it is not a property of a workflow and is not portable between sites. Even at one site one user might not need or want to install in the same way as another.
Because this is working around a user's filesystem allocation we considered it a user configuration problem. I.E. if you don't have enough allocation to run workflows, then configure symlink dirs [for all of your workflows] to a filesystem where you do have enough space.
Eventually someone will choose a name that will break the conventions.
I realise how rose suites are being version controlled in the future is up for discussion
The option (2) mentioned above isn't necessarily related to Rosie, version control or even workflow names, but the way that users manage their working copies. E.G. we could choose to work like this:
~/cylc-src/
    project-a/
        workflow-1/
            .svn
        workflow-2/
            .git
    project-b/
        workflow-3/
            .git
Or even like this:
# ~/.cylc/flow/global.cylc
[install]
    sources = ~/project-a, ~/project-b, ~/roses, ~/cylc-src

~/project-a/
    workflow-1/
        .svn
    workflow-2/
        .git
~/project-b/
    workflow-3/
        .git
So option (2) could look something like this:
[install]
    [[sources]]
        [[[~/project-a]]]
            [[[[symlink dirs]]]]  # override site-defaults just for this project
                work = /big/volume
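To make the idea of per-source overrides concrete, here is a minimal Python sketch of how such a lookup might resolve. Everything in it is an assumption for illustration: the `SITE_DEFAULTS` and `SOURCE_OVERRIDES` tables, the paths, and the `symlink_dirs_for` helper are hypothetical and are not part of Cylc.

```python
from pathlib import PurePosixPath

# Hypothetical data mirroring the proposed [install][[sources]] layout.
# Neither the keys nor the paths are real Cylc configuration.
SITE_DEFAULTS = {"work": "/home/volume"}
SOURCE_OVERRIDES = {
    "/home/user/project-a": {"work": "/big/volume"},
}


def symlink_dirs_for(source: str) -> dict:
    """Return symlink dirs for a workflow source, most specific override last."""
    path = PurePosixPath(source)
    result = dict(SITE_DEFAULTS)
    # Apply overrides from shortest to longest prefix so deeper entries win.
    for prefix in sorted(SOURCE_OVERRIDES, key=len):
        prefix_path = PurePosixPath(prefix)
        if path == prefix_path or prefix_path in path.parents:
            result.update(SOURCE_OVERRIDES[prefix])
    return result


print(symlink_dirs_for("/home/user/project-a/workflow-1"))
print(symlink_dirs_for("/home/user/project-b/workflow-3"))
```

Under these assumptions, a workflow under project-a picks up the override while one under project-b falls back to the site default.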
Conceptually I see the argument for wanting to keep the workflow installation settings separate, but as someone who develops/maintains workflows for other users I think it will just cause problems.
Really I want my instructions to my users to be.
If I know there's a problem with running the workflow using the default configuration at my site I want to be able to handle that for the user automatically. The more instructions I have to give about things that need configuring outside the suite or about following conventions, the more opportunities there are for something to go wrong.
cylc/cylc-rose#237 proposes to allow environment variables defined in rose-suite.conf to influence the global config. This should provide a solution to workflow specific symlink dirs for some users.
(The plan is to also support an alternative solution based on workflow name)
A bit of a recap.
Primarily:
rose-suite.conf visible to Cylc global config (see previous comment from @dpmatthews). Symlinking can then be controlled by Jinja2 code in the site global config file, according to user-set variables.
This will probably be sufficient for existing sites that used rose suite-run (and run dir symlinks) with Cylc 7.
However, for completeness, we may want to support or document other solutions as well, because:
rose-suite.conf for this will seem very weird to those who don't otherwise use Rose.
Other ideas:
symlink.cylc that doesn't come with other baggage (and then, again, Jinja2 in global config).
global.cylc ]
Note 2. and 3. the naming convention could be based on source workflow name or parent directory name, or install-dir name.
My first cut at 2.:
import sys

from cylc.flow.scheduler_cli import get_option_parser as play_opt_parser
from cylc.flow.scripts.install import get_option_parser as install_opt_parser


def get_workflow_name():
    """Parse a command line like 'cylc install' or 'cylc play'."""
    # Guard against being loaded with no sub-command on the command line.
    command = sys.argv[1] if len(sys.argv) > 1 else None
    if command == 'install':
        opts, args = install_opt_parser().parse_args(sys.argv[2:])
        # An explicit --workflow-name wins over the positional argument.
        return opts.workflow_name or args[0]
    if command == 'play':
        opts, args = play_opt_parser().parse_args(sys.argv[2:])
        return args[0]
    # Any other command: no workflow name can be inferred.
    return "dunno"
and in global.cylc:
#!Jinja2
{% from "get_workflow_name_cli" import get_workflow_name %}
{% set NAME = get_workflow_name() %}
[install]
    [[symlink dirs]]
        [[[localhost]]]
{% if NAME.startswith("proj_a") %}
            run = /tmp/ProjectA/$USER
{% elif NAME.startswith("proj_b") %}
            run = /tmp/ProjectB/$USER
{% else %}
            run = /tmp/$USER/
{% endif %}
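The prefix rules in that template can also be expressed as plain Python, which makes a naming convention easy to check before it goes into a global config. This is only a sketch: `RULES`, `DEFAULT` and `run_symlink_target` are hypothetical names, and the user is passed in explicitly rather than left as `$USER` for later expansion.

```python
# Hypothetical rule table mirroring the Jinja2 branches above:
# (workflow-name prefix, run-directory template).
RULES = [
    ("proj_a", "/tmp/ProjectA/{user}"),
    ("proj_b", "/tmp/ProjectB/{user}"),
]
DEFAULT = "/tmp/{user}/"


def run_symlink_target(name: str, user: str) -> str:
    """Return the run symlink target for a workflow name, first match wins."""
    for prefix, template in RULES:
        if name.startswith(prefix):
            return template.format(user=user)
    # No convention matched: fall back to the catch-all location.
    return DEFAULT.format(user=user)


print(run_symlink_target("proj_a/trial", "alice"))
print(run_symlink_target("something-else", "alice"))
```

A table like this keeps the convention in one place, so adding a project is a one-line change rather than another template branch.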
And for 3. (via @oliver-sanders ):
# setup.cfg
[options.entry_points]
cylc.flow.pre_install = main:main

# main.py
import os
from pathlib import Path

def main(path, ...):
    # Skip sources that already live under ~/cylc-run.
    if path.is_relative_to(Path('~/cylc-run').expanduser()):
        return
    # Environment values must be strings.
    os.environ['CYLC_PROJECT'] = str(path.parent)

# global.cylc
{% from "os" import environ %}
{% if environ.get('CYLC_PROJECT') == 'foo' %}
    # ...
{% endif %}
NB: the installation plugins can access the workflow ID derived from the --workflow-name.
What if my project's workflows can use one of multiple disks depending on my mood? We would arbitrarily have to add some specifier which otherwise isn't important? This is an actual case where one project I work on has space on two different lustre disks, and I may use one or the other depending on how much space is available on each (or how heavily loaded that disk is going to be when I'm running an experiment).
Could you make a system wide configuration which can specify variables which if set, will be passed through on setup? Then systems admins can say we will accept variables "A, B, C" and those will be used to determine other items in the global.cylc jinja2?
What if my projects workflows can use one of multiple disks depending on my mood?
The rose-suite.conf (or equivalent symlink.cylc) solution handles that, because it is workflow specific.
And so do the other solutions, because users can add to or override site config with their own global.cylc and so use their own naming conventions (right down to individual workflows if necessary) to determine symlinking.
I have a similar issue: we have a workflow where, at cylc7, the directories were symlinked in the optional rose configs, because different options require different data retention and volumes, and different "teams" have different requirements.
How would this be handled via a global.cylc? I can see the global.cylc getting very complicated and we would need to "pass" these around with our workflows when we provide them to the production teams.
I understand the desire to get rid of the information from the rose configs but currently do not see a solution that will not result in production teams and even my team having a complicated global.cylc and will include the need for -O specific settings.
I may have misread Hilary's post above in the "cut for " as it looks like it could be there (I need to go read the underlying code) but I am not sure how it is then applied in the global.cylc.
Can't we use a pre-configure plugin to affect global.cylc? Or does this break with remote hosts? E.g.
import optparse
import tomllib
import typing as T
from pathlib import Path


def pre_configure(srcdir: Path=None, opts: optparse.Values=None, rundir: Path=None) -> T.Dict:
    """
    Reads file srcdir/configvars.toml and adds its contents to the Jinja2 environment
    for both suite and global configurations under namespace 'configvars'
    """
    with open(srcdir / 'configvars.toml', 'rb') as f:
        config = tomllib.load(f)
    return {
        'template_variables': {'configvars': config},
        'templating_detected': 'jinja2'
    }
with global.cylc as
#!jinja2
[install]
    [[symlink dirs]]
        [[[localhost]]]
            share = /scratch/$PROJECT/$USER/{{configvars.foo}}
and configvars.toml as
foo = "bar"
It doesn't look like cylc-rose is using the template_vars value in its pre-configure plugin at the moment.
How would this be handled via a global.cylc? I can see the global.cylc getting very complicated and we would need to "pass" these around with our workflows when we provide them to the production teams.
The aim would be to agree a set of options (configurable via environment variables) that meet the needs of users at our site. These would be configured centrally.
Can't we use a pre-configure plugin to affect global.cylc? Or does this break with remote hosts.
Actually this doesn't work - it requires the config file to be in the same directory as global.cylc, as the plugin only knows the path to the file currently being parsed.
Another idea would be to add custom environment variables to the list sent in SSH commands, like CYLC_VERSION currently is:
[platforms]
    [[localhost]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK
resulting in commands like
ssh -oBatchMode=yes -oConnectTimeout=10 -n gadi-login-01 \
env CYLC_VERSION=8.3.0.dev PROJECT=dp9 LUSTRE_DISK="/g/data1" bash --login -c \
'exec "$0" "$@"' /scratch/hc46/saw562/conda-dev/bin/cylc \
play --debug test-global/run2 --host=localhost
The server then has the variables available when setting up symlinks.
This seems like a minor change - just adding a new config option to be expanded in construct_ssh_command().
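A minimal sketch of what that expansion might look like, assuming the new option yields a list of variable names. The `env_prefix` helper is hypothetical and is not the real `construct_ssh_command()`; it only shows how the `env NAME=value` words could be built with safe quoting.

```python
import shlex


def env_prefix(names, environ, always=None):
    """Build the 'env NAME=value ...' words for the listed variables.

    names:   variable names to forward (e.g. from the proposed config option)
    environ: mapping to read values from (e.g. os.environ)
    always:  variables that are always sent, such as CYLC_VERSION
    """
    words = ["env"]
    for name, value in (always or {}).items():
        words.append(f"{name}={shlex.quote(value)}")
    for name in names:
        if name in environ:  # skip variables that are not set locally
            words.append(f"{name}={shlex.quote(environ[name])}")
    return words


local_env = {"PROJECT": "dp9", "LUSTRE_DISK": "/g/data1"}
print(" ".join(env_prefix(["PROJECT", "LUSTRE_DISK"], local_env,
                          always={"CYLC_VERSION": "8.3.0.dev"})))
```

Skipping unset variables, rather than forwarding empty values, is a design choice made here for the sketch; the thread does not say which behaviour would be preferred.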
Another idea would be to add custom environment variables to the list sent in SSH commands, like CYLC_VERSION currently is:
Yeah I was considering this myself recently for other reasons. It could be a good idea.
https://github.com/cylc/cylc-rose/issues/237 proposes to allow environment variables defined in rose-suite.conf to influence the global config. This should provide a solution to workflow specific symlink dirs for some users.
-- https://github.com/cylc/cylc-flow/issues/5418#issuecomment-1639860905
This functionality was released with cylc-rose version 1.4.0. You can use the [env] section in the rose-suite.conf file to set environment variables which will be made available to the global.cylc file when loaded.
See https://github.com/cylc/cylc-rose/issues/237 for more details.
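To illustrate the general idea (not how cylc-rose actually implements it), the sketch below reads an `[env]` section from INI-style text and returns the variables it defines. It uses Python's `configparser` as a stand-in parser, which assumes every setting sits under a section header; `env_from_conf` and the `SCRATCH_DISK` variable are hypothetical.

```python
import configparser


def env_from_conf(text: str) -> dict:
    """Return the variables defined in the [env] section of INI-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(text)
    if not parser.has_section("env"):
        return {}
    return dict(parser["env"])


conf_text = """
[env]
SCRATCH_DISK=/big/volume
"""
variables = env_from_conf(conf_text)
# A global config could then branch on such a variable, falling back to a
# default when the workflow does not set it.
print(variables.get("SCRATCH_DISK", "/default/volume"))
```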
At present, if you want different workflows to use different symlink setups on a remote platform the only way to achieve this is to create separate platforms which refer to different install targets. Some users would like the ability to configure the symlinks on a per workflow basis. This used to be possible with Cylc 7 via the Rose "root-dir" setting.