choderalab / ambermini

A stripped-down set of just antechamber, sqm, and tleap.

How to handle AmberMini Development going forward #38

Open Lnaden opened 8 years ago

Lnaden commented 8 years ago

#37 showed that the current way of managing AmberMini is not sustainable, and we will need a new strategy.

Based on feedback from #37, it should be possible to compile the parts of AmberTools that we want, create wrappers for the executables we want, and ship that. However, the tests are less simple to bundle this way for the following reasons:

  1. Inconsistent test structure, likely due to years of AmberTools dev and old tests never being updated.
  2. Reliance on the $AMBERHOME environment variable to locate the file structure.
  3. Reliance on the config.h file created by the ./configure command during a standard AmberTools install.

I think we can use the above scheme and address these issues with the following items (not a 1-to-1 mapping with the list above):

  1. Create a compile script that builds only the parts we want; this would be the main script to update for new versions of AmberMini.
  2. Create wrappers for the executables we need, giving us an OS-agnostic platform; the wrappers would also handle run-time assignment of $AMBERHOME (see the sketch after this list).
  3. Carry the config.h file from the compiled AmberTools into the test directory when updating AmberMini, and have all tests reference it.
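
As a rough illustration of item 2, here is a minimal sketch of such a wrapper in Python; the <prefix>/bin layout and the teLeap binary name are assumptions for illustration, and a real implementation could just as easily be a shell script that does the same thing with $0:

#!/usr/bin/env python
# Hypothetical tleap wrapper: assumes it is installed as <prefix>/bin/tleap
# and that the underlying executable lives at <prefix>/bin/teLeap.
import os
import sys

prefix = os.path.dirname(os.path.dirname(os.path.abspath(sys.argv[0])))
binary = os.path.join(prefix, 'bin', 'teLeap')

# Set AMBERHOME only for the child process, leaving the caller's
# environment untouched.
env = dict(os.environ, AMBERHOME=prefix)
os.execve(binary, [binary] + sys.argv[1:], env)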

This is not something that has to be fixed "right now", but it is something that should be addressed by AmberTools 17, or whatever the next major version is.

Other options are welcome (such as coordinating an official AmberLite)

hainm commented 8 years ago
  1. Inconsistent test structure, likely due to years of AmberTools dev and old tests never being updated.

Just FYI: @dacase and I had a brief chat, and we agreed that it's a good idea to move the tests folder into the corresponding package, just like we did for cpptraj, parmed, or pytraj.

We very much welcome new ideas for organizing AmberTools for better distribution with different aims.

jchodera commented 8 years ago

Awesome!

Maybe it's worth Levi and me swinging by to visit someday to chat about possibilities?

hainm commented 8 years ago

Yeah, agreed. From Dave's comment: https://github.com/choderalab/ambermini/pull/37#issuecomment-255828285

hainm commented 8 years ago

Per the AMBERHOME discussion,

I'm bringing over @swails' comment from https://github.com/choderalab/ambermini/pull/37#issuecomment-255774559

The main idea behind ambermini is that it provides back-end engines for doing some small molecule parametrization tasks. It's an implementation detail, which is why eliminating the need for setting AMBERHOME was considered important.

Users of the tools depending on ambermini should be considered ignorant of Amber altogether, so setting the variable AMBERHOME is confusing in this context (and difficult to do "correctly" in an automated capacity where installation locations may change). That's why I bothered to make an environment variable-free scheme to run these tools (using wrapper scripts that set the variable for the underlying program based on the path of $0).

But it may be hard to convince Amber developers that an AMBERHOME-less execution mode is worth supporting.

It's still not clear to me why it's not a good idea to set AMBERHOME with ambermini. For example, if a user wants to use a Python script, they can do:

import os, sys
os.environ['AMBERHOME'] = sys.prefix  # point AMBERHOME at the Python/conda prefix
os.system('tleap')

If using bash:

export AMBERHOME=`python -c 'import sys; print(sys.prefix)'`
[do something]

Maybe it's not good for non-conda users?

Tagging @dacase so you can follow this thread.

swails commented 8 years ago

You are coming at this from the perspective of an Amber user writing one-off scripts -- I'd argue that the target audience of ambermini is non-Amber users. The purpose is for it to be a back-end dependency that gets used behind the scenes. An API that requires setting environment variables is going to be fragile. Having a library silently change an environment variable that you may wind up depending on is a potential source of nasty bugs and frustration, and that likelihood only grows as the complexity of the consumer application increases. (I have simple real-world examples where this would cause subtle and unexpected errors: using MMPBSA.py's API, which depends on AMBERHOME, alongside a tool that uses an ambermini package that silently changes it.)
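
As a hypothetical illustration of that failure mode (the function names and paths below are made up; the real case involved MMPBSA.py's API):

import os

def setup_bundled_ambermini():
    # Hypothetical consumer-library helper that silently repoints
    # AMBERHOME at its bundled ambermini tree.
    os.environ['AMBERHOME'] = '/opt/myapp/ambermini'

def amber_bin(tool):
    # Hypothetical MMPBSA.py-style lookup that trusts AMBERHOME to point
    # at a full Amber installation.
    return os.path.join(os.environ['AMBERHOME'], 'bin', tool)

os.environ['AMBERHOME'] = '/opt/amber16'  # the user's real Amber install
setup_bundled_ambermini()                 # silently clobbers the variable
print(amber_bin('sander'))                # now points into the stripped-down tree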

Back-end services should generally try to avoid requirements that bleed into the global namespace (like environment variables). It's an obvious source of not only obscure bugs, but security vulnerabilities.

So while setting AMBERHOME works fine for users of Amber, relying on these environment variables when trying to build out some of Amber's tools to work as general back-end services will frustrate attempts to use it in complex (and clever) ways.

But Amber development has limited resources, which needs to be taken into account when allocating resources for tasks like these.

tl;dr These AMBERHOME workarounds are fine if it's a temporary band-aid to bridge the gap until the Amber components can be completely replaced. If we want to supply these Amber components as general back-end tools, the AMBERHOME requirement really should be eliminated, and in a better way than how I did it here.

hainm commented 8 years ago

You are coming at this from the perspective of an Amber user using one-off scripts -- I'd argue that the target audience of ambermini are non-Amber users

Ah, you're right. (Just thinking about whether I'd need to use $CHARMMHOME or $GROMACSHOME :D)

hainm commented 8 years ago

@ all:

So I just gave building the whole of AmberTools16 a try with Python 3.5, conda, and the Docker image from the conda-forge folks:

conda install -c hainm ambertools=16

# 66 MB, 1 hour build on travis and circleci
# I included only bin/, dat/, lib/ folders
$ cat run_docker_build.sh
#!/usr/bin/env bash

# PLEASE NOTE: This script was adapted from output generated by conda-smithy.

# tar -xf A.tar.bz2
AMBER16=`pwd`/amber16
ambertools_bz2file=/opt/conda/conda-bld/linux-64/ambertools*bz2
# cp -rf recipe-prebuild $AMBER16/

FEEDSTOCK_ROOT=$(cd "$(dirname "$0")/.."; pwd;)

docker info

config=$(cat <<CONDARC

channels:
 - defaults # As we need conda-build

 - conda-forge

conda-build:
 root-dir: /feedstock_root/build_artefacts

show_channel_urls: true

CONDARC
)

cat << EOF | docker run -i \
                        -v ${AMBER16}:/amber16 \
                        -v ${FEEDSTOCK_ROOT}:/feedstock_root \
                        -a stdin -a stdout -a stderr \
                        condaforge/linux-anvil \
                        bash || exit $?

export PYTHONUNBUFFERED=1

echo "$config" > ~/.condarc
# A lock sometimes occurs with incomplete builds. The lock file is stored in build_artefacts.
conda clean --lock

conda update --yes --all
conda install --yes conda-build
conda info

yum -y install csh flex wget perl

# Embarking on 1 case(s).
# conda build /feedstock_root/test-docker/amber16/recipe-prebuild --quiet || exit 1
cd /amber16/
conda build recipe-prebuild --quiet || exit 1
cp $ambertools_bz2file /feedstock_root/test-docker/
EOF

hainm commented 8 years ago

Oops: for the large AmberTools tar file, I used Git LFS.

dacase commented 8 years ago

On Fri, Oct 28, 2016, Jason Swails wrote:

Back-end services should generally try to avoid requirements that bleed into the global namespace (like environment variables).

I think we all agree on what behavior we'd like to get rid of, but I'm still uncertain about what behavior we want in its place. Should we:

a. hard-wire the location of libraries into the executables at install time?

or

b. make use of "$0" (or similar) to figure out at run time where we are in the filesystem?

or

c. some other mechanism?

...dac

swails commented 8 years ago

I suspect that b makes the most sense, although a might be the easiest to implement. The typical workflow of installation is something along the lines of

./configure --prefix=/path/to/desired/install/location <options>
make all # makes a local copy in a local build directory
make test # should work on the locally-built copy
make install # moves libraries, headers, programs, and data files to lib, include, bin, and share

The local copy should still "work", since you want to be able to isolate your test environment (without having to go through the trouble of setting up some kind of VM or container to do it in). This rules out hard-coding it through some sort of preprocessor macro set by configure (which is how I'd implement a). The only thing left that really makes sense that I can think of is to go based on argv[0] or something of that ilk (option b).
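
For option b, the pattern might look something like the Python sketch below (the directory names and the build-tree layout are illustrative assumptions): resolve the data directory relative to the executable's own location, so that both the locally-built copy and the installed copy work without AMBERHOME.

import os
import sys

def find_datadir():
    # Resolve relative to this executable (argv[0]) instead of AMBERHOME.
    exe_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    # Try an installed layout (<prefix>/bin -> <prefix>/dat) first, then a
    # hypothetical build-tree layout one level deeper.
    for rel in (os.path.join('..', 'dat'), os.path.join('..', '..', 'dat')):
        candidate = os.path.normpath(os.path.join(exe_dir, rel))
        if os.path.isdir(candidate):
            return candidate
    raise RuntimeError('could not locate the Amber data directory')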

It might help to enumerate the programs that depend on $AMBERHOME to begin with, just to get a feel for how big of a job this would end up being. To get the list started:

  1. tleap
  2. xleap
  3. MMPBSA.py
  4. antechamber
  5. related antechamber programs, like bondtype, parmchk2, etc.
  6. nab (I think?)
  7. charmmlipid2amber.py
  8. update_amber

hainm commented 8 years ago

9. cpinutil.py