Closed by samsrabin 6 months ago
Reproduced on cime master. I asked USG about the error message:
I am sporadically getting an error message when running bash scripts on derecho. The message doesn't seem to have any ill effects but is causing a test that looks for the keyword "error" in my output to fail. I think it is coming from the module load command, but I'm having trouble figuring out what triggers it. The message is:
/bin/bash: module: line 1: syntax error: unexpected end of file
/bin/bash: error importing function definition for `module'
Brian Vanderwende:
Yeah, this is a known bug we're working on with PBS, in which jobs submitted with -V sometimes don't properly forward bash shell function definitions. Unfortunately there isn't any workaround aside from (a) not using -V or (b) manually redefining the module function at the start of your job. Using shell init flags to change behavior is irrelevant because PBS imposes the (broken) imported shell definitions after shell init.
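For readers who want to see where this message comes from, here is a minimal sketch that provokes the same bash warning outside of PBS. It relies on the fact that bash passes exported functions to child shells through environment variables named `BASH_FUNC_<name>%%`. The truncated function body is an assumption chosen purely for illustration; the thread does not say how PBS actually damages the definition.

```shell
#!/usr/bin/env bash
# Bash exports a function through an environment variable named
# BASH_FUNC_<name>%% whose value starts with "() {".
# The value below is deliberately cut short (no closing brace) to mimic a
# definition that did not arrive intact. ASSUMPTION: this is only one way
# to trigger the warning, not a confirmed description of the PBS bug.
broken_def='() {  echo "placeholder module body"'

# Start a child bash with the damaged definition and capture its stderr.
# The child still runs its command; the import failure is only a warning.
import_stderr=$(env "BASH_FUNC_module%%=${broken_def}" bash -c 'true' 2>&1)

printf '%s\n' "$import_stderr"
```

The captured text should include bash's "error importing function definition" warning, which is the line that trips a keyword search for "error" in job output.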
Thanks for looking into this, Jim. It's probably overkill, but could da_no_data_mod.sh be rewritten in Python to avoid this issue?
If you would like to try to rewrite that script in Python, you are welcome to. I just tried removing the -V flag for PBS and that seems to work (https://github.com/jedwards4b/ccs_config_cesm/tree/pbs_V_removed), but it may have other side effects and will need more testing.
Haha, thought you might say that! I say let's hope removing the -V option works. Are you planning to do the testing yourself? If not, I can give it a shot with the standard CTSM test suite (aux_clm) and let you know how it goes.
So the "-V" flag sends all of the environment variables to the batch job. It looks like the da_no_data_mod.sh script doesn't use any environment variables (other than one it extracts with xmlquery, which is fine). I would think that the only environment variables needed would be in the env*.xml files and handled similarly, or that the CIME case logic would get whatever it needs from the case before using it. So it might actually be good to run this way, to make sure things are set where they need to be rather than assumed from the batch system.
So it seems likely that removing "-V" should work. But the change is universal for all PBS systems, which means it could potentially break on a different system while working on Derecho. So yes, this is something that should go through quite a bit of testing...
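One rough way to check locally whether a script silently depends on the submitting shell's environment is to run it once with the inherited environment and once with an emptied one. The sketch below assumes a hypothetical stand-in script and a hypothetical variable named CASE_SETTING; neither comes from CIME. Note that `env -i` only approximates a job submitted without -V, since PBS still sets its own variables for the job.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a job script; this is NOT da_no_data_mod.sh.
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
echo "CASE_SETTING=${CASE_SETTING:-unset}"
EOF

# Hypothetical variable that exists only in the submitting shell.
export CASE_SETTING=from_login_shell

# Inherits the caller's environment, roughly what a job sees with -V.
with_env=$(bash "$job_script")

# env -i starts from an empty environment; only PATH is passed explicitly.
without_env=$(env -i PATH="$PATH" bash "$job_script")

echo "inherited: $with_env"
echo "clean:     $without_env"
rm -f "$job_script"
```

If the two runs differ, the script is picking something up from the caller's environment that would have to come from the case (for example via xmlquery) once -V is removed.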
@ekluzek If you look at the PR, you will see that I only made this change for NCAR systems.
Unfortunately this doesn't happen every time. The end of the logfile says something like this:
Resubmitting usually fixes it, but sometimes it takes a few tries.
See, e.g., this log file: