flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0
84 stars 39 forks source link

add Autotools support to flux-sched project #115

Closed lipari closed 8 years ago

lipari commented 8 years ago

Replaced the Makefiles with Makefile.am's and added the Autotool support. Rather than clone the associated files from flux-core, the effort started from scratch using more current conventions. You can invoke autoreconf -i the first time and then autoreconf each time after that. An autogen.sh script was added to resemble the flux-core system. autogen.sh wraps autoreconf -i and does some cleanup.

Used the Autotool version available on our existing TOSS 2 systems. The plan is to update the tools to the TOSS 3 versions, once we abandon development on TOSS 2.

flux-sched is based on the flux-core library. flux-sched can be built against an installed flux-core package. However, flux-sched can be built against any side-installed flux-core by setting the PKG_CONFIG_PATH environment variable to the location of the side-installed lib/pkgconfig directory.

flux-sched requires additional utilities from flux-core that are not include in the flux-core library. Rather than copy/paste these extra utilities into flux-sched, we opted to point to the flux-core source files and developed a mechanism for identifying where those extra source files can be found.

The configure script offers one option and it is the means of identifying the location of the source to the utilities.

Addresses: https://github.com/flux-framework/flux-sched/issues/56 and https://github.com/flux-framework/flux-sched/issues/83

lipari commented 8 years ago

This is a preliminary PR for review. There were some minor changes to source code to that appeared under -Werror. If desired, I can pull those changes into a separate commit. Also, I will enable the silent mode once the dust settles.

grondo commented 8 years ago

Separating some of this work into other commits, if easy, would be helpful in a review. Thanks! I'm going to take a closer look, but this represents a heroic effort on your part. :+1:

grondo commented 8 years ago

Some quick comments: These are all really minor.

automake: warning: possible forward-incompatibility.
automake: At least a source file is in a subdirectory, but the 'subdir-objects'
automake: automake option hasn't been enabled.  For now, the corresponding output
automake: object file(s) will be placed in the top-level directory.  However,
automake: this behaviour will change in future Automake versions: they will
automake: unconditionally cause object files to be placed in the same subdirectory
automake: of the corresponding sources.
automake: You are advised to start using 'subdir-objects' option throughout your
automake: project, to avoid future incompatibilities.
garlick commented 8 years ago

Yeah this is great @lipari.

Note we just dumped META in pr flux-framework/flux-core#526. You might check out that change as it increases safety in the release generation process IMHO.

dongahn commented 8 years ago

Thanks @lipari!

@grondo makes good points. I will appreciate have break a single commit to smaller topical commits for review.

One quick comment. I think you can use EXTRA_DIST rule to "make dist" pick up some testing scripts and testing data files that are not automatically picked up. E.g., you can specify this rule in t/Makefile.am to distribute sharness scripts and testing data.

grondo commented 8 years ago

One quick comment. I think you can use EXTRA_DIST rule to "make dist" pick up some testing scripts and testing data files that are not automatically picked up. E.g., you can specify this rule in t/Makefile.am to distribute sharness scripts and testing data.

make distcheck is a good way to ensure dist tarball is properly formed. It will test make dist, build from the resulting tarball using a VPATH build (a good check for builddir vs srcdir), run make check and then make clean to ensure the builddir is properly cleaned.

lipari commented 8 years ago

@grondo makes good points. I will appreciate have break a single commit to smaller topical commits for review.

I'm conflicted on this. While it would be easy to divide the commits into separate packages, e.g. the rdl, resrc, sched, and simulator, none of these intermediate commits would compile until the last one. That seems wrong to me in that it prevents the use of git bisect. Unfortunately, this is a wholesale revamp of the flux-sched project as opposed to independent changes to existing functionality as typical PR's contain.

grondo commented 8 years ago

@lipari: Good point that splitting some of this out would break intermediate builds. (I was only suggesting that, when possible, small logical commits are easier for review). All you could have done was maybe convert to autotools in one commit without -Werror, update code to fix errors in compile, then add -Werror. I'm not sure the work is worth it in this case.

garlick commented 8 years ago

That's a good point although it could at least be broken up functionally like

Then at least you don't have to sift through all the deletions to find the new content in a review.

lipari commented 8 years ago

@lipari: Good point that splitting some of this out would break intermediate builds. (I was only suggesting that, when possible, small logical commits are easier for review). All you could have done was maybe convert to autotools in one commit without -Werror, update code to fix errors in compile, then add -Werror. I'm not sure the work is worth it in this case.

Ok, understood. And though I neglected to say this, PR 116 contained only the fixes to the source code that will eliminate the errors. The new push of this branch will not include any code mods.

@garlick, I will follow the plan you proposed.

garlick commented 8 years ago

Another thought is if you're bringing in chunks wholesale from another source (such as INSTALL, lua.m4, etc, add them in separate commits and cite the source in the commit message. That helps reviewers tell the difference between new work, work imported wholesale, and work imported then modified (in the latter case, modifications should be in commit(s) following an import commit).

dongahn commented 8 years ago

make distcheck is a good way to ensure dist tarball is properly formed. It will test make dist, build from the resulting tarball using a VPATH build (a good check for builddir vs srcdir), run make check and then make clean to ensure the builddir is properly cleaned.

Yes, good point. We should probably have distcheck as part of our CI as well. For this pass, though, I am okay if we just make sure all autotool targets like this are correct. Then this CI change can come later, which I expect to be pretty minor.

lipari commented 8 years ago

Ok, just re-pushed the branch that addresses the suggestions:

I experimented with adding the subdir-objects as an option to AM_INIT_AUTOMAKE, but it resulted in automake/libtool trying to deposit intermediate objects under the /usr/lib64/flux-core/build/src directories. So, I will revisit this another day, if/when it becomes a problem.

I experimented with replicating the PACKAGE_VERSION and PACKAGE_NAME system from flux-core described above in PR 526, but I will defer that work to the subsequent effort to create a flux-sched distribution.

There are still some issues to work out (e.g., Issue 545) when building for an installed flux on TOSS 3, so there will be another pass at this once the issues are resolved.

garlick commented 8 years ago

Very nice improvement in the structure of the PR!

Could you get away with just configuring czmq not zeromq? Maybe just a one liner in your configure.ac, e.g.

PKG_CHECK_MODULES([CZMQ], [libczmq >= 3.0.0])

I don't think you're using zeromq directly, correct?

lipari commented 8 years ago

@garlick, you are correct. Just pushed a rebase that replaces the X_AC_ZEROMQ macro with your suggested PKG_CHECK_MODULES([CZMQ]). I removed the no-longer-necessary m4/x_ac_zeromq.m4 as well.

garlick commented 8 years ago

On AC_INIT, we updated the one we were using in flux-core to:

AC_INIT([flux-core],
        m4_esyscmd([git describe --always | awk '/.*/ {printf "%s",$1; exit}']))

--always was added so that it generates something even if there are no tags in the repo. Maybe that was what was preventing you from using it? See flux-framework/flux-core@167676689d52863f4cf5f1e77d47baaa895bd182

lipari commented 8 years ago

Maybe that was what was preventing you from using it?

It was indeed. Just pushed a new configure.ac with the suggested AC_INIT arguments.

garlick commented 8 years ago

running autogen, I hit this:

simulator/Makefile.am:72: `%'-style pattern rules are a GNU make extension

Are these make convenience targets to start flux and load modules even needed? They seem vaguely inappropriate in automakefiles.

I'd suggest moving this block to its own m4 file (defined as a macro), as this may be useful to steal for other framework projects that depend on core. It would be nice to avoid hardwiring library paths and library names here, but I'm not sure what the right approach is.

#
# Use the specified path to flux-core if it has been given.
# Otherwise, look to the installed locations.
#
AC_ARG_WITH([flux-core],
    [AS_HELP_STRING([--with-flux-core=[path_to_src]],
        [specify location of flux-core project])],
    [FLUX_BUILDDIR=${withval} FLUX_SRCDIR=${FLUX_BUILDDIR}
     FLUX_CORE_CFLAGS="-I${FLUX_SRCDIR} -I${FLUX_SRCDIR}/src/include"
     FLUX_CORE_LIBS="-L${FLUX_BUILDDIR}/src/common/.libs -lflux-core \
      -L${FLUX_BUILDDIR}/src/modules/libjsc/.libs -lflux-jsc \
      -L${FLUX_BUILDDIR}/src/modules/kvs/.libs -lflux-kvs"
     FLUX="${FLUX_BUILDDIR}/src/cmd/flux"
     AC_SUBST(FLUX_CORE_CFLAGS)
     AC_SUBST(FLUX_CORE_LIBS)
    ],
    [PKG_CHECK_MODULES([FLUX_CORE], [flux-core],
      [FLUX_BUILDDIR=/usr/lib64/flux-core/build
       FLUX_SRCDIR=/usr/lib64/flux-core/build
       FLUX=/usr/bin/flux
      ],
      AC_MSG_ERROR([No installed flux package or --with-flux-core specified]))
    ]
)

AC_SUBST(FLUX_BUILDDIR)
AC_SUBST(FLUX_SRCDIR)
AC_SUBST(FLUX)
garlick commented 8 years ago

I think setting AC_CONFIG_AUX_DIR([config]) will prevent autom4te.cache, config.h, etc from being created at the root level, cluttering things up a bit.

lipari commented 8 years ago

I think setting AC_CONFIG_AUX_DIR([config]) will prevent autom4te.cache, config.h, etc from being created at the root level, cluttering things up a bit.

I don't follow you. The AC_CONFIG_AUX_DIR([config]) is the same thing used in flux-core's configure.ac. Furthermore, after running flux-sched's autogen.sh script, I see a root directory populated with (mostly) the same contents as flux-core. The config/ directory is not present initially, but autoreconf -i creates and populates it.

garlick commented 8 years ago

Sorry, mainly I wanted to call your attention to the auto* products left at the root level which might be tidier to move to config. I think maybe the AC_CONFIG_AUX_DIR setting is only part of the solution; you can see the flux-core autogen.sh script does some tidying:

echo "Running aclocal ... "
aclocal -I config
echo "Running libtoolize ... "
libtoolize --automake --copy
echo "Running autoheader ... "
autoheader
echo "Running automake ... "
touch ChangeLog
automake --copy --add-missing
echo "Running autoconf ... "
autoconf
echo "Cleaning up ..."
mv aclocal.m4 config/
rm -rf autom4te.cache
echo "Now run ./configure."
lipari commented 8 years ago

Understood. When I created the flux-sched version of autogen.sh to wrap autoreconf -i, I "borrowed" the same clean-up measures from flux-core's autogen.sh:

echo "Running autoreconf -i ... "
autoreconf -i
echo "Cleaning up ..."
mv aclocal.m4 config/
rm -rf autom4te.cache
echo "Now run ./configure."

Note that flux-sched's .travis.yml executes autoreconf -i directly, not autogen.sh.

garlick commented 8 years ago

I think I had myself confused back there, sorry! Never mind.

lipari commented 8 years ago

Ok. Latest push has cleaned up Makefile.am files and the flux-core macros moved into a new ax_flux_core.m4 file. Thanks for all your suggestions, BTW.

garlick commented 8 years ago

This "else" leg that runs if the user doesn't configure --with-flux-core=srcdir really shouldn't hardwire these paths

  [PKG_CHECK_MODULES([FLUX_CORE], [flux-core],
       [FLUX_BUILDDIR=/usr/lib64/flux-core/build
        FLUX_SRCDIR=/usr/lib64/flux-core/build
        FLUX=/usr/bin/flux
       ],

I think you could find FLUX with AC_CHECK_PROG.

I'm not too sure what to do about the others.

lipari commented 8 years ago

I'm not too sure what to do about the others.

I just pushed an update with AC_CHECK_PROG(FLUX) replacing the hard coded FLUX assignment. I'm inclined to defer action on the others until a decision is made on how to address Issue 545

garlick commented 8 years ago

This is looking pretty nice. DId you miss removing the start and dump targets from rdl/Makefile.am or were those left in on purpose?

lipari commented 8 years ago

DId you miss removing the start and dump targets from rdl/Makefile.am or were those left in on purpose?

Ah, that was an oversight. Rebased branch just pushed.

garlick commented 8 years ago

Why does configure run twice here? (once when I run it outright, again when I run make)?

$ module load flux-core 
$ ./autogen.sh
Running autoreconf -i ... 
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
configure.ac:24: installing `config/compile'
configure.ac:15: installing `config/config.guess'
configure.ac:15: installing `config/config.sub'
configure.ac:11: installing `config/install-sh'
configure.ac:11: installing `config/missing'
rdl/Makefile.am: installing `config/depcomp'
Cleaning up ...
Now run ./configure.
$ ./configure
[snip]
config.status: executing depfiles commands
config.status: executing libtool commands

  flux-sched version 9e8b56a
  Prefix...........: /usr/local
  Debug Build......: 
  C Compiler.......: gcc -std=gnu99
  CFLAGS...........: -g -O2
  CPPFLAGS.......... 
  FLUX.............: flux
  FLUX_BUILDDIR....: /usr/lib64/flux-core/build
  FLUX_CORE_CFLAGS.: -I/opt/flux-core/include  
  FLUX_CORE_LIBS...: -L/opt/flux-core/lib/flux -ljsc -llive -lmodctl -lmrpc -lkvs -lbarrier -lflux-core  
  FLUX_SRCDIR......: /usr/lib64/flux-core/build
  LDFLAGS..........: 
  LIBS.............: -lxml2 -luuid -lhwloc 
  Linker...........: /usr/bin/ld -m elf_x86_64

$  make
CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /home/garlick/proj/flux-sched/config/missing --run aclocal-1.11 -I m4
 cd . && /bin/sh /home/garlick/proj/flux-sched/config/missing --run automake-1.11 --gnu
CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /home/garlick/proj/flux-sched/config/missing --run autoconf
/bin/sh ./config.status --recheck
running CONFIG_SHELL=/bin/sh /bin/sh ./configure --no-create --no-recursion
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
[snip]
garlick commented 8 years ago

Incidentally, building against the installed flux-core/0.1.0, the build fails due to the hardwired FLUX_SRCDIR (or FLUX_BUILDDIR) as that should be /opt/flux-core/build on this system (TOSS 2/RHEL6)

$ make
make  all-recursive
make[1]: Entering directory `/home/garlick/proj/flux-sched'
Making all in rdl
make[2]: Entering directory `/home/garlick/proj/flux-sched/rdl'
Making all in .
make[3]: Entering directory `/home/garlick/proj/flux-sched/rdl'
  CC     cpuset_la-lua-cpuset.lo
  CC     cpuset_la-cpuset-str.lo
  CCLD   cpuset.la
  CC     rdl.lo
rdl.c:39:36: error: src/common/liblsd/list.h: No such file or directory
rdl.c:40:39: error: src/bindings/lua/json-lua.h: No such file or directory
rdl.c:41:41: error: src/common/libutil/xzmalloc.h: No such file or directory
garlick commented 8 years ago

You are doing some nice work here @lipari. I am leaving for the day but I'll try to keep on top of it over the weekend if you continue to work on it, or if not there's always next week.

lipari commented 8 years ago

@garlick, it turns out that using the autoreconf script, aclocal.m4 needs to stay in the root directory. The configure script re-run was caused by the autogen.sh script moving aclocal.m4 it into the config directory. I thought of messing with environment variables to get around this, but I decided it does not introduce that much clutter that it bothers me. Perhaps someone else will know a clever way to move it and not trigger the configure script to re-run. In thinking more about the hard coded, installed source code locations, I decided to add another option to configure (via the ax_flux_core.m4 macro). --with-flux-install will allow you provide the location of the installed source tree, whether it is on TOSS 2 or 3. The downside to this is that now you have to provide an option to configure. That makes sense because you either need to tell it the location of the installed source tree or your cloned flux-core repo. Admittedly, there might be a clever way to automate this someday and find the associated external source files automagically. But for now, this will work. Rebased branch was pushed. And, I edited the PR description above to reflect these latest changes.

garlick commented 8 years ago

it turns out that using the autoreconf script, aclocal.m4 needs to stay in the root directory.

Good catch.

This is turning out nice, but --with-flux-core and --with-flux-install seem off. It seems like an installed flux-core should be required, and found in the usual way (without options), and the sources that have to be found in the installed build directory might require an option like --with-flux-core-build temporarily until we can find a way for installed flux-core to provide the path, provide those sources another way, or eliminate the dependency.

lipari commented 8 years ago

It seems like an installed flux-core should be required, and found in the usual way (without options), and the sources that have to be found in the installed build directory might require an option like --with-flux-core-build temporarily until we can find a way for installed flux-core to provide the path, provide those sources another way, or eliminate the dependency.

That's actually what happens with the latest changes. When you specify --with-flux-install, it uses the PKG_CONFIG macro to get the CFLAGS (albeit null) and LIBs for flux-core. But that is insufficient for flux-sched which needs those other utility source files. And because we haven't come up with a rule on where to find those across TOSS distros, --with-flux-install is how to convey the location.

Admittedly, it is a short term solution until we wean flux-sched from those extra utilities. Then --with-flux-install can go away and just plain configure can use the installed flux-core libraries.

garlick commented 8 years ago

I guess then my comment boils down to

lipari commented 8 years ago

A brief recap of the two scenarios... The first scenario is building flux-sched against a cloned repo of flux-core. This is the way we have been building flux-sched since the beginning and required the .build link to point to where the flux-core tree can be found. This scenario will continue to be required whether or not an official flux-core package is installed on the machine.

The second scenario builds against an installed flux-core package. PKG_CONFIG macro and AC_CHECK_PROG will identify the flux-core libraries and the flux command, but flux-sched needs to pull in utilities not included in libflux-core. At the moment, there is no package that contains a library of these utilities and there is no rule for finding these source files. So, the developer needs to specify to configure where they reside on his/her machine.

The goal is to minimize the necessary options to configure. But for each of these scenarios above, a path needs to be supplied. And, unless we hard code the location of the source tree to the utilities, running configure alone with no options to build against the installed flux-core package will fail without the knowledge of where the source to the utilities is.

I'm fine with renaming the --with options to configure along the lines you propose. However, I don't see how we can avoid having to specify either the location of the cloned flux-core repo or the location of the source to the utilities.

garlick commented 8 years ago

This scenario will continue to be required whether or not an official flux-core package is installed on the machine.

Oh, I had assumed this would not be required anymore. Even if flux-sched is to be built against flux-core master in travis-ci or during development, it could still be built from a side installed location (e.g. ~/local with PKG_CONFIG_PATH set to point there, see travis-dep-builder.sh). If that's not done then it's hard to restrict sched to only the parts of flux-core that are exported. That's the path we want to be on IMHO.

garlick commented 8 years ago

Couple thoughts - appreciate your take on this @grondo:

grondo commented 8 years ago

since build is installed by the spec file, sched cannot be built from a side installed, unpackaged core. Should the build directory be created by make install?

I can see what you mean, but ewww, no. For now what is the problem with pointing flux-sched to the builddir from which you did make install? I thought this was temporary anyway. I guess you are considering a case where a site isn't using packaging, and only does make install?

Could we dispense with the --with-flux-core-builddir option if the installed build location were made available as a broker attribute? (Somehow that feels wrong but I cant't say exactly why so maybe not?)

That feels really wrong, but it does have some nice properties. If nothing else is wrong with it though, it doesn't seem like we should assign a broker attribute to some temporary workaround... I could be just being a stick in the mud though.

It sure would be interesting though to require flux-framework projects to be built under a flux instance.. kind of like that old flux get idea...

grondo commented 8 years ago

I don't see such a problem with requiring ./configure --with-flux-core-builddir= for now. This will be a pain point until we can pull common code into flux-sched, export interfaces from flux-core, or find some other solution. Since it will bug people we'll be more likely to look for solutions. Once we have a solution, we can just remove the --with-flux-core-builddir option from the flux-sched build and put it behind us.

lipari commented 8 years ago

FWIW, the change I'm currently testing eliminates the --with-flux-core= option as Jim suggested. Developers will have the option of:

garlick commented 8 years ago

Heh, I thought you might have an opinion :-)

I guess you are considering a case where a site isn't using packaging, and only does make install

I was urging @lipari to build flux-sched in travis by checking out the flux core master, and installing it to $HOME/local like the other dependencies.

lipari commented 8 years ago

Ok, the latest push successfully instructs travis to build and install the flux-core master branch and then build flux-sched against the installed library. It copies the all the flux-core source files into a local directory and passes this directory as the argument to configure --with-flux-core-builddir

The lua tests are currently commented out. I will next integrate the changes @grondo has provided in PR 554

garlick commented 8 years ago

It would be good to split out into standalone commits:

garlick commented 8 years ago

Looking good. I did just hit this in make check. I assume you meant you had commented out the failing tests in your working copy?

make[3]: Entering directory `/home/garlick/proj/flux-sched/rdl/test'
loading RDL: /home/garlick/proj/flux-sched/rdl/RDL.lua:30: module 'flux.hostlist' not found:
        no field package.preload['flux.hostlist']
        no file '/home/garlick/proj/flux-sched/rdl/flux/hostlist.lua'
        no file '/home/garlick/proj/flux-sched/../flux-core/../share/lua/5.1/flux/hostlist.lua'
        no file './flux/hostlist.lua'
        no file '/usr/share/lua/5.1/flux/hostlist.lua'
        no file '/usr/share/lua/5.1/flux/hostlist/init.lua'
        no file '/usr/lib64/lua/5.1/flux/hostlist.lua'
        no file '/usr/lib64/lua/5.1/flux/hostlist/init.lua'
        no file '/home/garlick/proj/flux-sched/rdl/.libs/flux/hostlist.so'
        no file '/home/garlick/proj/flux-sched/../flux-core/../lib64/lua/5.1/flux/hostlist.so'
        no file './flux/hostlist.so'
        no file '/usr/lib64/lua/5.1/flux/hostlist.so'
        no file '/usr/lib64/lua/5.1/loadall.so'
        no file '/home/garlick/proj/flux-sched/rdl/.libs/flux.so'
        no file '/home/garlick/proj/flux-sched/../flux-core/../lib64/lua/5.1/flux.so'
        no file './flux.so'
        no file '/usr/lib64/lua/5.1/flux.so'
        no file '/usr/lib64/lua/5.1/loadall.so'
lt-trdl: rdllib_open: No such file or directory
FAIL: trdl
garlick commented 8 years ago

Never mind! I reconfigured against the installed builddir and it works fine. (I had configured against a checked out source dir)

lipari commented 8 years ago

Ok. I am going to be taking another pass that the LUA paths, so things will change slightly...

lipari commented 8 years ago

It would be good to split out into standalone commits: remove old makefiles update travis.yml

Done.

garlick commented 8 years ago

Just ran a quick test with a "side installed" flux-core and things seem to be good at least to get sched built and checked!

# build flux-core
BUILDDIR=$HOME/local/lib/flux-core/build
./configure --prefix=$HOME/local
make
make install
mkdir -p $BUILDDIR
make clean
cp -r . $BUILDDIR

And then

# build flux-sched
export LD_LIBRARY_PATH=$HOME/local/lib
export CPPFLAGS=-I$HOME/local/include
export LDFLAGS=-L$HOME/local/lib
export PATH=$HOME/local/bin:$PATH

export PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig
export BUILDDIR=$HOME/local/lib/flux-core/build

./configure --with-flux-core-builddir=$BUILDDIR --prefix=$HOME/local
make
make check

There are some issues with the install target which I'll comment on shortly.

garlick commented 8 years ago

Doing a make install after building sched with --prefix=$HOME/local I note:

(*) We might need to add this directory to FLUX_EXEC_PATH