Closed dotsdl closed 7 years ago
I may poke at this this weekend, since I'm gearing up to use GromacsWrapper extensively again.
As we just discussed, definitely a step in the right direction, I'd very much like to see something like this. It's going to be a bit of a hassle to convert existing cfg files but a user only has to do this once (and maybe we can even cook up a script to do this when the time comes).
For the name collisions I would just raise an error and have the user make the names unique within one version. (We can later think of other ways to make it work, e.g. allowing classes to carry "tags" around so that you can provide an additional kwarg to select the version of mdrun that you configured. There's still the problem you already mentioned, that the API is suddenly defined in the configuration file, but on the other hand this is already true because the pre- and postfixes of Gromacs commands are fully user configurable....)
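The "raise an error on collision" rule could be sketched roughly as follows. This is only an illustration: `ToolNameCollision` and `register_tools` are hypothetical names, not part of GromacsWrapper.

```python
class ToolNameCollision(ValueError):
    """Raised when two configured tools would map to the same name."""


def register_tools(entries):
    """Build a name -> command mapping for ONE configured version.

    `entries` is an iterable of (name, command) pairs. A repeated name
    is treated as a configuration error instead of being overwritten.
    """
    registry = {}
    for name, command in entries:
        if name in registry:
            raise ToolNameCollision(
                "tool name %r is configured twice (%r and %r); "
                "names must be unique within one version"
                % (name, registry[name], command)
            )
        registry[name] = command
    return registry
```

With this, configuring `mdrun` once via `gmx` and once via `gmx_mpi` under the same name fails early with a clear message, and the user has to rename one of them.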
I really like the idea of doing
import gromacs
and not having any gromacs tools loaded (which currently incurs a significant performance penalty because each tool is run to check that it's available and to extract its doc string). When import gromacs is cheap, it's easier to use the other functionality such as the different file format parsers.
Ultimately, we could also look for gromacs tools in the environment (say grompp), parse the version string, and then use it to heuristically find a suitable configuration entry, something like
gromacs.use('auto')
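A minimal sketch of what the 'auto' heuristic might do internally. Everything here is an assumption: `parse_version` and `pick_entry` are hypothetical helpers, and the exact text that a Gromacs tool prints for its version is not taken from this thread, so the parser simply looks for the first X.Y or X.Y.Z number.

```python
import re


def parse_version(text):
    """Return the first 'X.Y' or 'X.Y.Z' number found in `text`, or None."""
    match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", text)
    return match.group(1) if match else None


def pick_entry(version, configured):
    """Pick the configured version that best matches `version`.

    An exact match wins; otherwise fall back to the first configured
    entry (in sorted order) that shares the major release number.
    """
    if version is None:
        return None
    if version in configured:
        return version
    major = version.split(".")[0]
    for candidate in sorted(configured):
        if candidate.split(".")[0] == major:
            return candidate
    return None
```

For example, a detected 5.1.2 would fall back to a configured 5.1.1 entry if no exact match exists, while an unknown major release yields `None` so the caller can report that nothing suitable is configured.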
Edited: I did not quite understand what you were getting at with the mixed version/mixed mpi/serial use case.
I don't really understand why you would want to have a mix of different gromacs versions, but isn't this already possible by changing which configuration file you load? Just having one for each Gromacs version? You can just do as @orbeckst suggested and just make the load file come after import.
I agree there is the issue of the API instability of GromacsWrapper tool names. They change based on which Gromacs version is used.
Just FYI, the jupyter projects have a similar set of issues with configurations, especially generating the initial default configurations. They use traitlets which I've been using in one of my own projects. It could be one option to get away from trying to have custom templates. It also supports aliasing traits (basically typed class properties).
tl;dr: Need to make a decision how much the user installation of Gromacs (names) should be reflected in GW tool names.
Let me comment on the points you edited out because they're good points:
> I think the user should only have to specify the version, prefix command (gmx), suffix,
I forgot about the single/double precision suffix. In 4.x this was appended to the tool name, as in mdrun_d or grompp_d. How is this done in 5.x?
Only specifying the driver command and suffix would be easier:
versions:
  4.6.5:
    serial:
      driver: ""
    serial_double:
      driver: ""
      suffix: "_d"
    mpi:
      driver: ""
      suffix: "_mpi"
    mpi_double:
      driver: ""
      suffix: "_d_mpi"
  5.1.1:
    serial:
      driver: "gmx"
    mpi:
      driver: "gmx_mpi"
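To make the driver/suffix idea concrete, here is a rough sketch of how such entries could be turned into a command line. The nested dict mirrors the YAML above (loaded by whatever parser ends up being used); `command_for` is a hypothetical helper, and appending the suffix to the driver for 5.x is an assumption, since how 5.x names its precision variants is exactly the open question here.

```python
# Same structure as the YAML proposal, written as a plain dict.
CONFIG = {
    "4.6.5": {
        "serial": {"driver": ""},
        "serial_double": {"driver": "", "suffix": "_d"},
        "mpi": {"driver": "", "suffix": "_mpi"},
        "mpi_double": {"driver": "", "suffix": "_d_mpi"},
    },
    "5.1.1": {
        "serial": {"driver": "gmx"},
        "mpi": {"driver": "gmx_mpi"},
    },
}


def command_for(version, flavor, tool, config=CONFIG):
    """Return the argv prefix needed to run `tool` for a version/flavor."""
    entry = config[version][flavor]
    driver = entry.get("driver", "")
    suffix = entry.get("suffix", "")
    if driver:
        # 5.x style: one driver executable, tool name as first argument.
        # ASSUMPTION: a suffix, if given, belongs to the driver name.
        return [driver + suffix, tool]
    # 4.x style: every tool is its own executable, suffix appended.
    return [tool + suffix]
```

So `command_for("4.6.5", "mpi_double", "mdrun")` yields the single executable `mdrun_d_mpi`, while `command_for("5.1.1", "mpi", "grompp")` yields `gmx_mpi` followed by `grompp`.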
The translation to class names for 4.x is straightforward: if I have g_something and g_something_d_mpi then they become tools.G_something and tools.G_something_d_mpi (never mind that _mpi only makes sense for mdrun IIRC...).
However, I am then at a loss how we translate this to class names for 5.x: Is the driver command always gmx or gmx_suffix? If so, we could extract the suffix and generate tool classes in the same fashion as for 4.x: gmx something becomes tools.Something (and tools.G_something with #46) and gmx_mpi something would become tools.Something_mpi or gmx_d something would be tools.Something_d (and tools.G_something_d).
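The naming rule described above could look roughly like this. It is only a sketch under the stated assumption that a 5.x driver is always gmx or gmx plus a suffix; `class_name` is a hypothetical helper, not an existing GromacsWrapper function.

```python
def class_name(tool, driver=""):
    """Derive a tool class name from an executable or driver/tool pair.

    4.x (no driver): the executable name is used as is, capitalised.
    5.x (driver given): whatever follows 'gmx' in the driver name is
    treated as the suffix and appended to the tool name.
    """
    suffix = ""
    if driver:
        if not driver.startswith("gmx"):
            raise ValueError("cannot extract a suffix from driver %r" % driver)
        suffix = driver[len("gmx"):]
    name = tool + suffix
    return name[0].upper() + name[1:]
```

A driver that does not start with gmx raises an error here rather than guessing a name, which keeps the question of non-standard drivers visible instead of hiding it.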
It makes your scripts dependent on your local installation and it would be nice to be able to say we're abstracting this at least a little bit. But on the other hand, GW's primary purpose is just to wrap the local tools... so maybe we should just do that and take anything else as a bonus?
> and if to use a separate command for mdrun.
Some people want to use mdrun and mdrun_d in the same workflow, or perhaps mdrun and mdrun_mpi (I am just using the 4.x naming here) so we should be able to accommodate this.
> We should have a list of tool names for each version and generate a configuration file that way.
You're right that we could just generate the base tool names for most versions of Gromacs: ls ...gromacs/bin and gmx help commands.
> I don't really understand why you would want to have a mix of different gromacs versions, but isn't this already possible by changing which configuration file you load? Just having one for each Gromacs version?
The discussion was not about mixing, say, 4.6.5 with 5.1.1 but rather about making, say, serial and mpi versions of the tools available in the same GW session. (Btw, currently it's really inconvenient to use different Gromacs versions because at the moment the only thing you can do is rename your ~/.gromacswrapper.cfg file... Organizing the configurations in a more structured and hierarchical fashion will make it a lot easier to do such switches; the INI format is not hierarchical and makes it very clumsy to represent something like version -> 4.6.5 -> tools...)
On my system, each Gromacs environment is loaded with "module load gromacs-xxx". From there I imagine I can move the right config (cfg, templates, scripts, ...) into my home directory. A config file per version could be enough, but why not have multiple versions in one cfg? Ultimately GW would read the section corresponding to your gromacs-xxx.
Just one remark: on my system there are 2 gmx_mpi builds, one for the AVX processors and another for the AVX2 processors.
@pslacerda suggested that by default we should just be autodetecting the standard gromacs tools (https://github.com/Becksteinlab/GromacsWrapper/pull/55#issuecomment-222567986) if the user sourced GMXRC or if the GMXRC was provided in the cfg file.
This would be as simple as
For Gromacs 5: parse the output of gmx help commands:
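import subprocess
# universal_newlines=True makes communicate() return text rather than bytes on Python 3
gmx = subprocess.Popen(['gmx', 'help', 'commands'], stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       universal_newlines=True)
# communicate() returns (stdout, stderr), in that order
stdout, stderr = gmx.communicate()
# keep indented lines that have text within their first 22 columns; the first word is the tool name
tools = [line.split()[0] for line in stdout.split('\n')
         if (line.strip() and line[0] == " " and line[:22].strip())]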
For Gromacs 4 look for executable files in the Gromacs BINDIR (but note that this can produce unintended results when Gromacs is installed into a standard bin directory... suddenly you have /usr/bin/* in your GromacsWrapper name space... not the end of the world but messy):
import glob
import os
# keep only regular files in GMXBIN that are executable by the current user
tools = [os.path.basename(fpath) for fpath in glob.glob(os.path.join(os.environ['GMXBIN'], "*"))
         if os.path.isfile(fpath) and os.access(fpath, os.X_OK)]
Hi committers!
If we autodetect the tools we could reduce the size of config files and simplify the code. YAML is a neat and structured format but probably requires an external package. We can also use ini files as usual where the section titles indicate the nesting:
[Gromacs]
logfile = /path/to/logfile
GMXRC = /defaut/GMXRC
[Gromacs/5.0]
GMXRC = /usr/local/gromacs5/bin/GMXRC
[Gromacs/4.7]
GMXRC = /usr/local/gromacs4/bin/GMXRC
extra = /path/g_extra /path/g_other
Then 5.0 and 4.7 will override the defaults if the user chooses them specifically. The same strategy can also be applied in YAML files. It would also be very nice to try to load the config file from the current and parent directories before attempting $HOME.
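The override behaviour of the nested section titles could be sketched with the standard library's configparser as below. The `settings_for` helper and the sample paths are illustrative assumptions only, not existing GromacsWrapper code.

```python
import configparser

# Illustrative sample in the "Section/version" style proposed above.
INI_TEXT = """
[Gromacs]
logfile = /path/to/logfile
GMXRC = /default/GMXRC

[Gromacs/5.0]
GMXRC = /usr/local/gromacs5/bin/GMXRC
"""


def settings_for(parser, version=None, base="Gromacs"):
    """Merge the base section with the version-specific one, if present."""
    settings = dict(parser[base])
    section = "%s/%s" % (base, version)
    if version is not None and parser.has_section(section):
        # Values from the version section override the defaults.
        settings.update(parser[section])
    return settings


parser = configparser.ConfigParser()
parser.optionxform = str  # keep option names such as GMXRC case-sensitive
parser.read_string(INI_TEXT)
```

Asking for version "5.0" returns the 5.0 GMXRC together with the shared logfile, while an unconfigured version simply falls back to the defaults in the base section.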
There is now also a helper function for retrieving the output of a process:
import subprocess
stdout = subprocess.check_output(['gmx', 'help', 'commands'])
pyyaml is easily installed together with everything else, so I am not too worried. And yaml just has better support for logical data structures. I dislike parsing information from e.g. headers; I would rather use the right tool for the job.
I like a simplified cfg file and doing the auto-detection by default.
Btw, if traitlets seem to work better as suggested by @whitead in https://github.com/Becksteinlab/GromacsWrapper/issues/49#issuecomment-207570275 then we could try them instead of yaml. I just don't have any experience with them. What would a mock-up of a configuration look like with traitlets?
Just another way: maybe multiple versions can be defined as different groups and the user chooses the appropriate one using the API, defaulting to the first:
grp4.7 = grompp trjconv g_rms
grp4.7mpi = mdrun_mpi
grp4.7mpi_base = ; not needed
grp4.7mpi_suffix = _mpi
grp5.0 = grompp trjconv rms mdrun
grp5.0_base = gmx
grp5.0d = grompp trjconv rms mdrun
grp5.0d_base = gmx_d
; or grp5.0d_base = gmx
; and grp5.0d_suffix = _d
grp5.0mpi = mdrun
grp5.0mpi_base = gmx_mpi
gromacs.use_tool_group('grp5.0mpi')
And the default can also be set with group = grp5.0mpi. Of course the same approach can be taken using YAML instead of INI, as you wrote before. Anyway, we need to think about querying the Gromacs 5 commands automatically
versions:
  5.1.1:
    serial:
      base: "gmx"
    mpi:
      base: "gmx_mpi"
, for which I don't have an equivalent in plain INI, except maybe:
grp5.0 =
grp5.0_base = gmx
But if we put all standard Gromacs 4 commands inside a list variable in the code and just test for their presence, we can also remove the need to enumerate them all in the configuration file, leaving it for custom commands.
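Testing a built-in list for presence might look roughly like this. The list below is deliberately abbreviated and only illustrative, and `detect_tools` is a hypothetical helper; `shutil.which` searches the current PATH, so GMXRC would need to be sourced first.

```python
import shutil

# Abbreviated, illustrative list; a real one would cover all standard tools.
STANDARD_GROMACS4_TOOLS = ["grompp", "mdrun", "trjconv", "editconf", "g_rms"]


def detect_tools(candidates=STANDARD_GROMACS4_TOOLS, suffix="", which=shutil.which):
    """Return the candidate executables (with suffix) that can be found.

    `which` can be swapped out for testing; by default the current PATH
    is searched.
    """
    return [name + suffix for name in candidates
            if which(name + suffix) is not None]
```

Custom tools such as the extra entries in the INI example would then be the only thing left to list in the configuration file.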
I've been looking at Configurable objects with traitlets.config and this seems worthwhile thinking about. As I understand it at the moment, the cfg file would then be a Python script that just sets a bunch of attributes on one big Gromacs class, e.g.
c.Gromacs.paths.configdir = "~/.gromacswrapper"
c.Gromacs.paths.configfile = "~/.gromacswrapper.cfg"
c.Gromacs.paths.templatesdir = "${configdir}/templates"
# ...
c.Gromacs.release = "5.1.2"
c.Gromacs.GMXRC = "/usr/local/bin/GMXRC"
c.Gromacs.tools = ["gmx:mdrun", "gmx:grompp", "gmx:editconf", ...]
c.Gromacs.groups = ["tools"]
c.Gromacs.logging.filename = "gromacs.log"
c.Gromacs.logging.loglevel.console = "INFO"
c.Gromacs.logging.loglevel.file = "DEBUG"
This would configure a class named Gromacs that could be used as a base class for most other classes.
One could perhaps implement multiple releases with
c.Gromacs.releases['serial_5.1.2'].release = "5.1.2"
c.Gromacs.releases['serial_5.1.2'].GMXRC = "/opt/packages/gromacs/versions/5.1.2/serial"
c.Gromacs.releases['mpi_5.1.2'].release = "5.1.2"
c.Gromacs.releases['mpi_5.1.2'].GMXRC = "/opt/packages/gromacs/versions/5.1.2/mpi/gnu"
c.Gromacs.releases['serial_4.6.6'].release = "4.6.6"
c.Gromacs.releases['serial_4.6.6'].GMXRC = "/opt/packages/gromacs/versions/4.6.6/serial"
There are probably better ways to organize things...
By the way, the config file can also be JSON.
This issue has become pretty low priority: since autodetecting works so well, the config file can be absent or very small.
I am just going to close this with a wont-fix until someone else resurrects it.
To make it possible to support multiple versions of gromacs installed on a single machine, as well as to make it easier to support custom user environments, we should switch from an INI-style config to a YAML configuration.
A portion of this config would go a long way in solving #48 and #26. For example, the schema for the tools available from various versions of gromacs could look like:
We would then probably require specifying which version one plans to use on import. So, one can do:
This does not solve the problem where the same name is present multiple times within the same config, such as for mdrun in 5.1.1 above. At present the same class names are built regardless of the version used, so this would result in an exception. Unless we build class names with clear namespace differences (such as using the base given in the config), we can't get around this problem. Although possible, having a library that changes its API depending on the config is probably not the best of ideas.