easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0
148 stars 201 forks source link

Syntax error when specifying a modtclfooter that contains 'puts stdout XXX' #3020

Open casparvl opened 4 years ago

casparvl commented 4 years ago

Issue

I'm trying to create a Gaussian EasyConfig which is inspired by https://github.com/easybuilders/easybuild-easyconfigs/pull/7553/files

However, rather than setting things like GAUSS_EXEDIR and GAUSS_BSDDIR explicitly in the EasyConfig, I'd prefer to follow the Gaussian installation instructions as closely as possible: they tell me to source (for bash) $env(EBROOTGAUSSIAN)/g16/bsd/g16.profile. (yes, I realize and accept that the downside of sourcing is that a module unload cannot undo it)

Our old-fashioned, manually created modules would contain something of this nature:

if { [module-info mode] == "load" } {
  if { [ module-info shelltype ] == "csh" } {
    puts stdout "source $env(EBROOTGAUSSIAN)/g16/bsd/g16.login ;"
  } else {
    puts stdout ". $env(EBROOTGAUSSIAN)/g16/bsd/g16.profile ;"
  }
}

I've tried to recreate the same by specifying this as modtclfooter. EasyBuild now generates a perfectly valid module file (which in fact, I can module load just fine), but the EasyBuild installation procedure crashes during the check of the module file:

== 2019-09-20 18:21:58,646 build_log.py:163 ERROR EasyBuild crashed with an error (at ?:124 in __init__): Changing environment as dictated by module failed: invalid syntax (<string>, line 1) (stdout: source /home/casparl/.local/easybuild/RedHatEnterpriseServer7/2019/software/Gaussian/g16.b01/g16/bsd/g16.profile ;
import os
os.environ['EBVERSIONGAUSSIAN'] = 'g16.b01'
os.environ['LD_LIBRARY_PATH'] = '/home/casparl/.local/easybuild/RedHatEnterpriseServer7/2019/software/Gaussian/g16.b01/bsd:/home/casparl/.local/easybuild/RedHatEnterpriseServer7/2019/software/Gaussian/g16.b01/.'
os.environ['GAUSS_SCRDIR'] = '/scratch-shared/casparl/eb-1zSQIG'
...

What seems to be happening (looking at the 'import os' command there) is that EasyBuild is trying to change the environment in a Python interpreter context. Indeed, when I replace the puts stdout by:

puts stdout "import os"

EasyBuild quite happily passes the 'sanity' check on the modulefile - I expect because in that case, it actually evaluates

import os
import os
os.environ['EBVERSIONGAUSSIAN'] = 'g16.b01'

in a Python context, which is perfectly valid. Of course, I don't have to tell you that the modulefile will actually now be invalid syntax, since it will evaluate the 'import os' in a bash shell. It's quite funny that EasyBuild crashes on a perfectly valid module syntax, but succeeds on a perfectly invalid one =)

Potential workaround Here's where I'm stuck. The generated module file is perfectly valid tcl syntax. But, EasyBuild crashes on it. I'd like to be able to workaround this by convincing EasyBuild to 'skip' the step in which it tries to change the environment 'as dictated by the modulefile', so that it doesn't run into this error of evaluating tcl syntax in a Python interpreter... Is that somehow possible?

Potential solution Ooof, even more difficult. As a first dirty fix, I think EasyBuild should maybe simply not try to evaluate the mod***footer part. An improved, more complicated fix could somehow maybe evaluate this part strictly as tcl/lua syntax. I wouldn't know where to start in either case so I'm leaving this one up to the experts if you don't mind :)

boegel commented 4 years ago

@casparvl Please try again with the just released EasyBuild v4.0.0.

I may have (accidentally) fixed the problem you're reporting here in #3008, by skipping source statements that are emitted when loading modules (in EasyBuild).

So, when EasyBuild does a test load of the generated module, it will filter out the source statement you've added via modtclfooter from the output of lmod python load, and the load test should pass. When a user loads the module using module load however, it should work as expected (incl. the source).

(I'll move this issue to the easybuild-framework repo, since it's better located there).

casparvl commented 4 years ago

Haha, doesn't this just fit perfectly with your 'this is the best release of EasyBuild yet'? This release even contains bug fixes for bugs you weren't yet aware of xD

I'll see if I can update our EasyBuild installation, give this a try, and let you know the result :)

boegel commented 4 years ago

@casparvl Well, what can I say, I don't say "best release ever" lightly... ;)

casparvl commented 4 years ago

The sanity check problem is resolved with EB-4.0.0! (Gaussian is not working yet, but that's on me to fix the EasyConfig =)).

One little detail: it is resolved when I call source <somefile>, but not when I do . <somefile>. I'm guessing you might not want to auto-replace periods, so it may not be something that can be resolved, but just FYI.

casparvl commented 4 years ago

Ok, reopening this...

It seems that EasyBuild now very specifically allows me to do things like

puts stdout "source <something>"

However, any other puts stdout will still create the aforementioned error. Now, we can have a discussion on whether puts stdout source <something> in a module even makes sense (it's not an environment change that can be undone - which is kind of the point of modules in the first place), but that's a different discussion.

Can't/shouldn't EasyBuild just filter all puts <something> lines from the modulefile before checking the syntax? Or, an improved solution would be that anything that reads puts stdout is actually evaluated in a shell, rather than in the Python interpreter - though I'm not sure if that shell would then have the right environment if it is invoked from within a Python interpreter...

N.B. On the sourcing discussion: I am tempted to keep sourcing, so that if Gaussian makes changes to their .profile, I don't have to make changes to my EasyConfigs in order to reflect them. I'm sure I/my colleagues would not check every time if Gaussian made changes and we would therefore run the risk of deviating from the run environment that Gaussian intended, which to me is a much bigger problem than an irreversible environment change.

boegel commented 4 years ago

@casparvl Which other puts statements are you using?

Keep in mind that all of those will be executed by bash, since the modules tool basically takes the stdout output that is produced by loading the module and executes it as shell statements...

While source is easy to filter out, filtering out other stuff is difficult since genuine export statements (via os.environ[...] = ...) are there too, etc.

casparvl commented 4 years ago

Sourcing the g16.login and g16.profile files only properly works if the g16root and PATH have been set correctly. Our old (manually created) modules would do:

      puts stdout "export g16root=$root ;"
      puts stdout "export PATH=\$PATH:$root/g16 ;"
      puts stdout "export PATH=\$PATH:$root/g16/bsd ;"
      puts stdout ". $root/g16/bsd/g16.profile ;"

In the end, I worked around the issue with some rather ugly postinstallcmds that alter the .profile so that these exports are now prepended there...

# First, remove g16 as directory name from .profile and .login files
postinstallcmds = ["sed -i 's/\/g16//g' %(installdir)s/bsd/g16.profile;"]
postinstallcmds += ["sed -i 's/\/g16//g' %(installdir)s/bsd/g16.login;"]
# Then, set the g16root, and the PATH correctly by prepending
postinstallcmds += ['echo -e "g16root=%(installdir)s\nPATH=\"\$PATH:%(installdir)s:%(installdir)s/bsd\"\n$(cat %(installdir)s/bsd/g16.profile)" > %(installdir)s/bsd/g16.profile']
postinstallcmds += ['echo -e "g16root=%(installdir)s\nsetenv PATH \"\${PATH}:%(installdir)s:%(installdir)s/bsd\"\n$(cat %(installdir)s/bsd/g16.login)" > %(installdir)s/bsd/g16.login']

In that way, our Gaussian installation now works. As said, it is a bit ugly to mess with those files before sourcing them though. In addition, from a fundamental point of view, it is a bit ugly that EasyBuild fails a sanity check on a module file for which the syntax is perfectly valid...

I'm not sure I understand why other puts commands would be more difficult to filter out though. I mean, can't you just parse the full syntax (which includes all of these os.environ statements) and simply remove anything that starts with puts? It means that those parts are not validated as part of the sanity check, but in my opinion that is better than having the sanity check fail on them.

akesandgren commented 4 years ago

FYI, for gaussian itself the only thing you need in the module is (assuming the easyconfig uses easyblock=Tarball) wthat this generates.

start_dir = '..'

postinstallcmds = [
    'mkdir %(installdir)s/bin',
    'ln -s ../g%(version_major)s/g%(version_major)s ../g%(version_major)s/ghelp %(installdir)s/bin',
]

modextravars = {
    'g%(version_major)sroot': '%(installdir)s',
    'G%(version_major)sBASIS': '%(installdir)s/g%(version_major)s/basis',
}

modextrapaths = {
    'GAUSS_EXEDIR': ['g%(version_major)s', 'g%(version_major)s/bsd'],
    'PATH': ['g%(version_major)s/bsd', 'g%(version_major)s'],
    'LD_LIBRARY_PATH': 'g%(version_major)s',
}

And the only reason we make a specific bin directory is that we put a helper script in there to assist users in fixing their gaussian input file to match the actual allocation on the cluster.

(We don't use linda so that might change what you really need)

xdelaruelle commented 10 months ago

Initial issue may have been avoided by not producing the output when module tool is called to generate python code. Sourcing these shell scripts is only valid for sh or csh, so the following code will help Easybuild not to find non-python in the generated output:

if {[module-info mode] eq {load}} {
  switch -- [module-info shelltype] {
    csh {puts stdout "source $env(EBROOTGAUSSIAN)/g16/bsd/g16.login;"}
    sh  {puts stdout ". $env(EBROOTGAUSSIAN)/g16/bsd/g16.profile ;"}
  }
}

In addition, please note a new modulefile command has been introduced for such use case: source-sh. It was introduced in EnvironmentModules 4.6 and it is also available starting version 8.6 of Lmod (it is named source_sh if one use Lua modulefile syntax).

source-sh sh $env(EBROOTGAUSSIAN)/g16/bsd/g16.profile

This modulefile command executes the shell and script defined as argument, compares the environment changes produces by the script (environment variables, alias, function, chdir, completion) then generates the corresponding modulefile command to perform such changes when loading the modulefile.

It is a big improvement compared to puts stdout method as changes:

This last point helps to produce python code when Easybuild evaluates this modulefile.