Closed Dooruk closed 1 month ago
I found references to swell_static_files
in the following tasks:
GenerateBClimatology
(but it seems to be unused?)GenerateBClimatologyByLinking
(already configured)GetGeosRestart
-- copy files from swell_static_files
directory to target directoryPrepGeosRunDir
-- copy files from swell_static_files
directory to target directoryStageJedi
-- Replace corresponding key in Jedi templateFrom what I can tell, in GenerateBClimatologyByLinking
, the current behavior is that both swell_static_files_main
and swell_static_files_user
are searched, with preference given to swell_static_files_user
and then, any additional files not present in the user directory are linked from swell_static_files_main
.
Do we want similar behavior for the other tasks as well? I.e., We effectively pull in every file from swell_static_files_main
and then treat any files present in swell_static_files_user
effectively as optional overrides?
One key issue with this behavior is that there is no way to force swell to not copy over files from swell_static_files_main
that it otherwise would. For example, if your swell_static_files_user
contains everything in swell_static_files_main
except a few files that you want excluded, those files will be included anyway --- there's no way to specify that certain files need to be skipped. In practice, that may not be a problem, but I wanted to flag it and give you a chance to weigh in.
Note that an insidious version of this is if you have the same files but the naming and/or directory structure of the files changes between swell_static_files
and swell_static_files_user
. Then, you may end up with multiple files/directories that, in practice, contain duplicates of the same data.
Personally, my vote would be to just define a task-specific static_files
parameter with some default value that could be overridden and then force the user to provide a complete set of required static files even if it means copying. That's more work on the user in exchange for clarity of behavior --- you always know exactly where any given swell environment got its files. Whereas, under the current rules, a user may be surprised to see "extra" static files that were pulled in from the global location. I.e., This would mean changing the behavior of GenerateBClimatologyByLinking
to something like:
swell_static_files = self.config.swell_static_files()
swell_static_files_user = self.config.swell_static_files_user(None)
if swell_static_files_user is not None:
swell_static_files = swell_static_files_user
# [...]
source_paths = [os.path.join(swell_static_files, ...)]
link_all_files...(self.logger, source_paths, target_path)
Alternatively, maybe the point is that, if you want to provide a complete replacement, you just set the swell_static_files
parameter (as an override?) and swell_static_files_user
is specifically for this "file merging" behavior.
Let me know what you think.
Personally, my vote would be to just define a task-specific
static_files
parameter with some default value that could be overridden and then force the user to provide a complete set of required static files even if it means copying. That's more work on the user in exchange for clarity of behavior --- you always know exactly where any given swell environment got its files.
I thought about this and read your thoughts. I think users should choose only one of these options (user or global). As you mentioned, combining sources make it very confusing and hard to follow which files/configurations are being used. Also, in a way, users would get familiar with how to use SWELL and where files are coming from (given that we should provide documentation on that 😄). I personally learn the most when the code fails!
Alternatively, maybe the point is that, if you want to provide a complete replacement, you just set the
swell_static_files
parameter (as an override?) andswell_static_files_user
is specifically for this "file merging" behavior.
I think user still should have full control of their local static folder and make their changes there. That's how Ricardo and I have been operating.
This is a separate concern but during the SWELL development meeting there was a brief discussion on version control with the swell_static_files
. Sounds like there is no straightforward solution to that since it requires handling many different large file formats. My suggestion for now is that we can name the static files as the SWELL version and keep the previous static folders up to 4-5 most recent versions.
I thought about this and read your thoughts. I think users should choose only one of these options (user or global). As you mentioned, combining sources make it very confusing and hard to follow which files/configurations are being used. Also, in a way, users would get familiar with how to use SWELL and where files are coming from (given that we should provide documentation on that 😄). I personally learn the most when the code fails!
Got it. In that case, I think the correct implementation is not to have a separate swell_static_files_user
, but rather to make swell_static_files
an override-able configuration option (with an associated questionary question) that has the current swell_static_files
value as the default. That will be a minor breaking change to GenerateBClimatologyByLinking
, but shouldn't affect any of the other tasks.
This is a separate concern but during the SWELL development meeting there was a brief discussion on version control with the swell_static_files. Sounds like there is no straightforward solution to that since it requires handling many different large file formats. My suggestion for now is that we can name the static files as the SWELL version and keep the previous static folders up to 4-5 most recent versions.
Agreed that we should figure this out, and also that it should be a separate issue. I'll open that now.
@Dooruk Sorry, one more question: Is there a use case where users would use swell_static_files_user
for some tasks but swell_static_files
for other tasks? Or should users really be using the same swell_static_files
for all tasks within a single swell run?
If the latter, this becomes trivial because it already works --- we just drop swell_static_files_user
altogether and just make people use the swell_static_files
. If you need to change the static files, you just set the swell_static_files
override.
If there are situations where, within a single swell run, you might want to use swell_static_files
for some tasks but swell_static_files_user
for other tasks, then we really do need a swell_static_files_user
. That's also not hard to do.
The more I think about it the more indecisive I get, but let me share my thoughts.
Short answer, yes an option to pick which tasks one would like to use their own folder would be useful, but I just worry this complicates usage even further. R2D2 does some of what you are proposing elegantly with the fetch order, but does not specify tasks.
In the case of GenerateBClimatology
and StageJedi
, one might be using SWELL in coupled mode and might want geos_atmosphere
files from the shared static folder while using geos_marine
files from their local folder.
In the case of GetGeosRestart
and PrepGeosRunDir
users should be able to point to their static folder with GEOS experiment settings and restarts. Actually, that should allow for external folders too not just the static folder, perhaps I can rewrite that task at some point.
There would be more tasks using this in the future.
Understood, thanks! For now, I'll implement swell_static_files_user
for all tasks that currently support swell_static_files
, with this logic (pseudocode):
swell_static_files = Config.swell_static_files()
swell_static_files_user = Config.swell_static_files_user()
if swell_static_files_user:
swell_static_files = swell_static_files_user
do_stuff(swell_static_files)
I'm not thrilled about this design for multiple reasons, but pending large scale overhaul of how we do configs (per #314), this is probably the easiest thing to do.
Currently,
swell_static_files_user
is only used ingenerate_b_climatology_by_linking
task. For others tasks and CI tests, the generic location for static files is/discover/nobackup/projects/gmao/advda/SwellStaticFiles
. However, we could give users ability to useswell_static_files_user
for their own static files as an alternative to the generic location.