flux-framework / flux-sched

Fluxion Graph-based Scheduler

Questions that may arise in creating the first sched RPM package #154

Closed dongahn closed 7 years ago

dongahn commented 8 years ago

I talked with Mark offline this morning. He suggested creating an issue to track the questions and problems I come across while creating the first sched RPM and a sched release for /opt on our systems, and I thought that was a good idea.

dongahn commented 8 years ago

It doesn't look like the current sched master builds against the flux-core 0.2.0 rpm installed under /opt on our systems like cab...

cab668{dahn}83: ./configure --prefix=/nfs/tmp2/dahn/FLUX-20160421/inst
cab668{dahn}84: make

<CUT>

make[2]: Entering directory `/nfs/tmp2/dahn/FLUX-20160421/flux-sched/sched'
  CC     sched_la-sched.lo
  CC     sched_la-rs2rank.lo
  CC     sched_la-rsreader.lo
  CC     sched_la-plugin.lo
plugin.c: In function 'lsmod_cb':
plugin.c:248: error: 'FLUX_MODSTATE_RUNNING' undeclared (first use in this function)
plugin.c:248: error: (Each undeclared identifier is reported only once
plugin.c:248: error: for each function it appears in.)
plugin.c:248: error: too many arguments to function 'flux_modlist_append'
make[2]: *** [sched_la-plugin.lo] Error 1
make[2]: Leaving directory `/nfs/tmp2/dahn/FLUX-20160421/flux-sched/sched'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/nfs/tmp2/dahn/FLUX-20160421/flux-sched'
make: *** [all] Error 2

grondo commented 8 years ago

Ah yes, that was added in flux-core/d3f1cd31f8745a67944b8bd916c6f6c736346231, which was applied after the 0.2.0 tag. For build testing you might have to first install a copy of current flux-core master to your destdir. Sorry!
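
One quick way to spot this kind of mismatch before building is to search the installed flux-core headers for the symbol the compiler complains about. This is only a sketch: `header_has_symbol` is a hypothetical helper and the include path is an assumption about the install layout, neither comes from flux-core.

```sh
#!/bin/sh
# Sketch: check whether installed headers mention a symbol before building.
# header_has_symbol is a hypothetical helper, not part of flux-core.
header_has_symbol() {
    # $1 = symbol name, $2 = include directory to search recursively
    grep -rqs -- "$1" "$2"
}

# Example usage; the default path below is an assumed install location.
incdir=${FLUX_CORE_INCDIR:-/opt/flux-core-0.2.0/include}
if header_has_symbol FLUX_MODSTATE_RUNNING "$incdir"; then
    echo "FLUX_MODSTATE_RUNNING found under $incdir"
else
    echo "FLUX_MODSTATE_RUNNING not found under $incdir; flux-core may be too old"
fi
```

Setting `FLUX_CORE_INCDIR` to the include directory of a locally installed flux-core master would point the check at that copy instead.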

dongahn commented 8 years ago

FYI -- I've made some progress here. I will probably need to take care of some other things over the next few days and then resume this work toward the end of this week or early next week. If the /opt flux-core 0.3.0 comes along, the first sched 0.1.0 rpm will be tested against that. The first sched /opt rpm should be considered an rpm built for the sake of testing the rpm packaging.

Here are my current (completely untested) flux-sched.spec file and module.flux-sched file for TOSS2:

Name: flux-sched
Version: 0.1.0
Release: 1%{?dist}
Summary: Job Scheduling Facility for Flux Resource Manager Framework
Group: System Environment/Base
License: GPLv2+ 
URL: https://github.com/flux-framework/flux-sched 
Source0: %{name}-%{version}.tar.gz
Source1: module.flux-sched
BuildRoot: %{_tmppath}/%{name}-%{version}-root-%(%{__id_u} -n)
#let's not build the debug package for now 
%define debug_package %{nil}
#only compress -- no stripping etc 
%define __spec_install_post /usr/lib/rpm/brp-compress || :

BuildRequires: flux-core >= 0.3.0
BuildRequires: zeromq4-devel >= 4.1.4
BuildRequires: czmq-devel >= 3.0.2
BuildRequires: json-c-devel
BuildRequires: lua-devel >= 5.1
BuildRequires: lua-posix
BuildRequires: hwloc-devel >= 1.4

Requires: flux-core >= 0.3.0

%description
flux-sched contains the job scheduling facility for the Flux Resource
Manager Framework. It consists of a Flux comms module that handles
all the functionality common to scheduling. The module has the ability
to load one or more scheduling sub-modules that provide specific
scheduling behavior.

%prep
%setup -n %{name}-%{version}

sed -i -e "s|@NAME@|%{name}|" -e "s|@VERSION@|%{version}|" \
    %{_sourcedir}/module.flux-sched

%build
# PKG_CONFIG_PATH and PATH should come from the flux-core module
export MODULEPATH=/opt/modules/modulefiles
. /etc/profile.d/[mM]odules.sh
module load python/2.7 python-pycparser python-cffi mvapich2-gnu-shmem

# We want to make this a relocatable package
# We want to be able to install multiple flux-sched RPMs so we use
# ${name}-${version} install directory
./configure --prefix=/opt/%{name}-%{version}
make %{?_smp_mflags}
export FLUX_TESTS_LOGFILE=t
make check
if [ $? -ne 0 ]; then
  cat t/*.out t/*.log 
  exit 1
fi

%install
# RPM_BUILD_ROOT comes from BuildRoot tag.
rm -rf ${RPM_BUILD_ROOT}
mkdir -p ${RPM_BUILD_ROOT}
make install DESTDIR=${RPM_BUILD_ROOT}
find ${RPM_BUILD_ROOT} -name '*.la' | while read f; do rm -f "$f"; done

install -D -m 555 %{_sourcedir}/module.flux-sched \
    ${RPM_BUILD_ROOT}/opt/modules/modulefiles/%{name}/%{version}

%clean
rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
%dir /opt/modules/modulefiles/%{name}
/opt/modules/modulefiles/%{name}/%{version}
/opt/%{name}-%{version}

%post

%changelog

* Tue Apr 26 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-1
- Build from initial flux-sched-0.1.0 tag

#%Module1.0
# vi:set filetype=tcl:
#

# Load prereqs
if { ![is-loaded python/2.7] } {
   module load python/2.7
}
if { ![is-loaded python-pycparser] } {
   module load python-pycparser
}
if { ![is-loaded python-cffi] } {
   module load python-cffi
}
if { ![is-loaded flux-core] } {
   module load flux-core
}

# global control file
if { [file exists $env(MODULESHOME)/etc/control] } {
   source $env(MODULESHOME)/etc/control
}

# local variables
set name    @NAME@
set version @VERSION@
set prefix  /opt/${name}-${version}

#
# sched currently does not have ${prefix}/bin
#
prepend-path    FLUX_LUA_PATH_PREPEND  "${prefix}/share/lua/5.1/?.lua"
prepend-path    FLUX_LUA_CPATH_PREPEND "${prefix}/lib64/lua/5.1/?.lua"
prepend-path    FLUX_EXEC_PATH_PREPEND "${prefix}/libexec/flux/cmd"
prepend-path    FLUX_RC_EXTRA          "${prefix}/etc/flux"

garlick commented 8 years ago

FYI flux-core-0.3.0 has been tagged and rpms built for TOSS2 and TOSS3. Apparently we made the TOSS2 production deadline only because of a security update that came in at the last minute. Anyway, Trent says the new rpms should roll out on the test systems soon (he thought it might already be on hype, though I don't see it).

dongahn commented 8 years ago

Great. I will have to work on some other things for now, so this doesn't block me. I will check back on hype in a day or two.

dongahn commented 8 years ago

FYI -- Trent has just pushed 0.3.0 to the hype login node.

dongahn commented 8 years ago

I've been consumed by something else and I will be in an all-day meeting today. I will try to get to this Thu or Fri.

dongahn commented 8 years ago

Ok. With minor modifications/fixes to the spec file, the rpm now builds on TOSS2 against the flux-core/0.3.0 rpm. I will take this to TOSS2 BuildBot as the next step.

Name: flux-sched
Version: 0.1.0
Release: 1%{?dist}
Summary: Job Scheduling Facility for Flux Resource Manager Framework
Group: System Environment/Base
License: GPLv2+ 
URL: https://github.com/flux-framework/flux-sched 
Source0: %{name}-%{version}.tar.gz
Source1: flux-sched.module
BuildRoot: %{_tmppath}/%{name}-%{version}-root-%(%{__id_u} -n)
#let's not build the debug package for now 
%define debug_package %{nil}
#only compress -- no stripping etc 
%define __spec_install_post /usr/lib/rpm/brp-compress || :

BuildRequires: flux-core >= 0.3.0
BuildRequires: zeromq4-devel >= 4.1.4
BuildRequires: czmq-devel >= 3.0.2
BuildRequires: json-c-devel
BuildRequires: lua-devel >= 5.1
BuildRequires: lua-posix
BuildRequires: hwloc-devel >= 1.4

Requires: flux-core >= 0.3.0

%description
flux-sched contains the job scheduling facility for the Flux Resource
Manager Framework. It consists of a Flux comms module that handles
all the functionality common to scheduling. The module has the ability
to load one or more scheduling sub-modules that provide specific
scheduling behavior.

%prep
%setup -n %{name}-%{version}

sed -i -e "s|@NAME@|%{name}|" -e "s|@VERSION@|%{version}|" \
    %{_sourcedir}/flux-sched.module

%build
# PKG_CONFIG_PATH and PATH should come from the flux-core module
export MODULEPATH=/opt/modules/modulefiles
. /etc/profile.d/[mM]odules.sh
module load python/2.7 python-pycparser python-cffi mvapich2-gnu-shmem
module load flux-core

# We want to make this a relocatable package
# We want to be able to install multiple flux-sched RPMs so we use
# ${name}-${version} install directory
./configure --prefix=/opt/%{name}-%{version}
make %{?_smp_mflags}
export FLUX_TESTS_LOGFILE=t
make check
if [ $? -ne 0 ]; then
  cat t/*.out t/*.log 
  exit 1
fi

%install
# RPM_BUILD_ROOT comes from BuildRoot tag.
rm -rf ${RPM_BUILD_ROOT}
mkdir -p ${RPM_BUILD_ROOT}
make install DESTDIR=${RPM_BUILD_ROOT}
find ${RPM_BUILD_ROOT} -name '*.la' | while read f; do rm -f "$f"; done

install -D -m 555 %{_sourcedir}/flux-sched.module \
    ${RPM_BUILD_ROOT}/opt/modules/modulefiles/%{name}/%{version}

%clean
rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
%dir /opt/modules/modulefiles/%{name}
/opt/modules/modulefiles/%{name}/%{version}
/opt/%{name}-%{version}

%post

%changelog

* Tue Apr 26 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-1
- Build from initial flux-sched-0.1.0 tag

RPM file: /nfs/tmp2/dahn/rpm/RPMS/x86_64/flux-sched-0.1.0-1.ch5.4.x86_64.rpm

garlick commented 8 years ago

@dongahn - you can likely drop the zeromq4 BuildRequires as (I believe) no libzmq interfaces are used directly in sched.

garlick commented 8 years ago

Also, I think module load flux-core brings in its dependencies so you shouldn't have to explicitly load them here.

Any BuildRequires needed to bring in the /opt module machinery? (If you are using @grondo's flux-core toss2 spec file as a guide then you probably are already doing the right thing.)

dongahn commented 8 years ago

Hmmmm, that explicit module load was needed to fix some early build error. I will check again.

dongahn commented 8 years ago

@garlick. I misread your comment. I think you are right (I was watching a musical where my son had a role :)

dongahn commented 8 years ago

FYI -- we talked about this a bit at the meeting. The current buildbot error:

DEBUG: ERROR: t0001-basic.t - missing test plan
DEBUG: ERROR: t0001-basic.t - exited with status

was because a key wasn't generated.

I also moved the make check from the build section into the check section. In addition, as rpmbuild evaluates one spec line at a time, the cat t/*.out t/*.log on the following lines wasn't working. Instead, I now use:

%check
export MODULEPATH=/opt/modules/modulefiles
. /etc/profile.d/[mM]odules.sh
module load flux-core
flux keygen
export FLUX_TESTS_LOGFILE=t
make check || (cat t/*.output t/*.log && exit 1)
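
A likely explanation for the original form not printing the logs is that rpmbuild runs each section's script with `sh -e`, so a failing `make check` aborts the script before a following `if [ $? -ne 0 ]` is reached. That is an assumption about the cause rather than something shown in the Buildbot logs. A minimal sketch, with `false` standing in for a failing `make check`:

```sh
#!/bin/sh
# Sketch: `false` stands in for a failing `make check`.

# Form 1: under -e the shell exits at `false`; the `if` is never reached.
rc1=0
out1=$(sh -e -c 'false; if [ $? -ne 0 ]; then echo "logs printed"; fi') || rc1=$?

# Form 2: a failure on the left of || does not trigger -e, so the
# right-hand side runs, prints the logs, and then fails the build.
rc2=0
out2=$(sh -e -c 'false || (echo "logs printed" && exit 1)') || rc2=$?

echo "form 1: rc=$rc1 output='$out1'"
echo "form 2: rc=$rc2 output='$out2'"
```

Both forms end with a non-zero status, but only the second one gets a chance to print anything first.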

With this, Buildbot printed out more meaningful debug data:

DEBUG: ERROR: t0001-basic
DEBUG: ==================
DEBUG: flux-broker: flux_sec_zauth_init: The directory '/builddir/.flux' \
               does not exist. Have you run `flux keygen`?
DEBUG: flux-broker: flux_sec_zauth_init: The directory '/builddir/.flux' \
              does not exist. Have you run `flux keygen`?
DEBUG: flux-start: 0 (pid 41206) exited with rc=1
DEBUG: flux-start: 1 (pid 41207) exited with rc=1
DEBUG: ERROR: t0001-basic.t - missing test plan
DEBUG: ERROR: t0001-basic.t - exited with status 1

With this fix, at least the rpms are now produced. I will look through the Buildbot logs a bit more carefully and test the rpms before moving on to TOSS 3 this afternoon.

hype356{dahn}50: ll /repo/llnl/RHEL6/5.4/RPMS/x86_64 | grep flux-sched
-r--r--r-- 1 531 531     204732 May 16 12:01 flux-sched-0.1.0-1.ch5.4.x86_64.rpm
hype356{dahn}51: ll /repo/llnl/RHEL6/5.4/SRPMS/ | grep flux-sched
-r--r--r-- 1 531 531     859407 May 16 12:01 flux-sched-0.1.0-1.el6.src.rpm

dongahn commented 8 years ago

@grondo: I am having trouble exporting FLUX_LUA_PATH_PREPEND and FLUX_LUA_CPATH_PREPEND from within the flux-sched module file, mainly because of the Lua glob. (I might have fallen into escaping hell.)

So, I am wondering if you have been able to find a way to export the Lua glob character (?) from a TOSS 2 module file.

Just to give you some context, Trent helped install the flux-sched rpm this morning on hype. But if I type in module load flux-sched on hype from my shell (tcsh):

hype356{dahn}113: module load flux-sched
/opt/flux-sched-0.1.0/lib64/lua/5.1/?.lua: No match.

# looks like this hangs after this! -- although it is not (I will explain this later.)

I looked at how module works a bit, and it looks like this command uses /usr/bin/modulecmd underneath to convert the commands in the script, like setenv and prepend-path, into shell-specific environment variable commands. Unfortunately, its expansion rules seem a bit inconsistent, in particular when dealing with special characters like ?. For example,

If I don't escape ? in flux-sched.module like:
setenv  FLUX_LUA_PATH_PREPEND  /share/lua/5.1/?.lua

/usr/bin/modulecmd doesn't interpret/expand it and returns the string as-is to be evaluated by eval

setenv FLUX_LUA_PATH_PREPEND /opt/flux-sched-0.1.0/share/lua/5.1/?.lua

But when I do escape ? like \?, /usr/bin/modulecmd does interpret it and gets rid of the escape:
setenv          FLUX_LUA_PATH_PREPEND  /share/lua/5.1/\?.lua

/usr/bin/modulecmd strips the backslash and returns the string without it:

setenv FLUX_LUA_PATH_PREPEND /opt/flux-sched-0.1.0/share/lua/5.1/?.lua

As long as I can get
setenv FLUX_LUA_PATH_PREPEND /opt/flux-sched-0.1.0/share/lua/5.1/\?.lua

to come out, I think I can make this work. But I have not yet found a magic formula because of this inconsistency. (I also tried escaping the escape, etc., but to no avail.)

So if you happen to have gone through this before and found a solution, please let me know!

BTW, I initially thought module load flux-sched was dead hung. But it turned out, module command is an alias to:

module:      aliased to set _prompt="$prompt";set prompt="";eval `/usr/bin/modulecmd tcsh !*`; set _exit=$status; set prompt="$_prompt";unset _prompt;; /usr/bin/test 0 = $_exit;

and there seems to be a bug in tcsh's eval. Once eval fails, the rest of the shell commands, including set prompt="$_prompt", aren't executed, and this gives the look and feel of an 'unkillable' hang!

If you happen to test this, you can come out of this hang-like illusion simply by typing in set prompt="$_prompt" and pressing enter.

I am not really happy w/ module today.

grondo commented 8 years ago

I will try to log in later and check on this. I'm afraid I usually forget to test tcsh and likely there are traps lurking in its idiosyncrasies. Have you tried with bash or ksh?

Also, if you are still using prepend-path for the Lua paths, you'll have to specify the delimiter as ;, since the default delimiter for prepend-path is a colon.

grondo commented 8 years ago

@dongahn -- ok, I logged on to hype and the Lua paths work for me in bash:

 grondo@hype356:~$ env | grep ^FLUX
 grondo@hype356:~$ module load flux-sched
 grondo@hype356:~$ env | grep ^FLUX
FLUX_LUA_PATH_PREPEND=/opt/flux-sched-0.1.0/share/lua/5.1/?.lua
FLUX_RC_EXTRA=/opt/flux-sched-0.1.0/etc/flux
FLUX_LUA_CPATH_PREPEND=/opt/flux-sched-0.1.0/lib64/lua/5.1/?.lua
FLUX_EXEC_PATH_PREPEND=/opt/flux-sched-0.1.0/libexec/flux/cmd
 grondo@hype356:~$ flux start
[1463451754.325170] broker.err[0]: rc1: flux-module: sched: not found in module search path
[1463451754.325619] broker.err[0]: Run level 1 Exited with non-zero status (rc=1)
[1463451754.336362] broker.err[0]: rc3: flux-module: cmb.rmmod[0] sched: No such file or directory
flux-start: 0 (pid 16025) exited with rc=1

The problem above is just a missing FLUX_MODULE_PATH

grondo@hype356:~$ export FLUX_MODULE_PATH=/opt/flux-sched-0.1.0/lib/flux/modules      
grondo@hype356:~$ flux start
grondo@hype356:~$ flux module list | grep sched
sched                 322570 CB98A7D   20  S  0
grondo@hype356:~$ flux submit /bin/echo hello
submit: Submitted jobid 1

grondo@hype356:~$ flux wreck ls
    ID NTASKS STATE                    START      RUNTIME    RANKS COMMAND
     1      1 complete   2016-05-16T19:25:16       0.011s        0 echo
grondo@hype356:~$ flux wreck attach -l 1
0: hello

dongahn commented 8 years ago

Yes I found I needed FLUX_MODULE_PATH. Were you able to repro tcsh issue?

grondo commented 8 years ago

I did reproduce your problem with tcsh. However, zsh also works -- might I suggest you switch to that shell? ;-)

Seriously, though, I'm not sure why modulecmd doesn't quote all the setenv arguments it generates for tcsh. That seems like a bug, but when working with tcsh, who knows? I wonder whether wrapping the Lua paths in quote characters would help.

dongahn commented 8 years ago

Also, my latest module script uses setenv for these two LUA env variables instead of prepend-path, as I should.

grondo commented 8 years ago

Won't setenv overwrite any existing LUA_PATH_PREPEND environment variables? Probably not an issue now, but what if we had two framework projects that each need to use LUA_PATH_PREPEND?

dongahn commented 8 years ago

Ah, you are right. Will stick with prepend w/ delimiter.
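
To make the setenv-versus-prepend difference concrete, here is a small sketch of prepend semantics with a custom delimiter. `prepend_with_delim` is a hypothetical helper written for illustration, not part of environment-modules, and the "other project" path is made up.

```sh
#!/bin/sh
# Sketch of prepend semantics with a custom delimiter.
# prepend_with_delim is a hypothetical helper, not part of environment-modules.
prepend_with_delim() {
    # $1 = current value (may be empty), $2 = new entry, $3 = delimiter
    if [ -n "$1" ]; then
        printf '%s%s%s\n' "$2" "$3" "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Another project already set the variable (hypothetical path).
existing='/opt/other-project/share/lua/5.1/?.lua'
new='/opt/flux-sched-0.1.0/share/lua/5.1/?.lua'

# A setenv-style assignment would discard $existing; prepending keeps both,
# joined by ';' as Lua search paths expect.
combined=$(prepend_with_delim "$existing" "$new" ';')
printf '%s\n' "$combined"
```

With the default colon delimiter the two entries would instead be joined by ':', which Lua would not treat as a path separator.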

dongahn commented 8 years ago

The quote characters didn't work for me, though I probably wasn't exhaustive.

W/ TOSS 2 going away soon, fixing this modulecmd bug probably isn't worth it... Perhaps I just make this work under other shells and move to TOSS 3?

garlick commented 8 years ago

Yeah, I was thinking the same thing - maybe open a bug and get the release out, and fix later if needed.

grondo commented 8 years ago

I checked the source for environment-modules and here's where they attempt to escape csh strings:

  for(;*in;in++) {
    if (*in == ' ' ||
    *in == '\t'||
    *in == '\\'||
    *in == '{' ||
    *in == '}' ||
    *in == '|' ||
    *in == '<' ||
    *in == '>' ||
    *in == '!' ||
    *in == ';' ||
    *in == '#' ||
    *in == '$' ||
    *in == '^' ||
    *in == '&' ||
    *in == '*' ||
    *in == '\''||
    *in == '"' ||
    *in == '(' ||
    *in == ')') {
      *out++ = '\\';
    }
    *out++ = *in;

Sadly, they seem to have forgotten literal ?, and given that they escape single-quotes, it leaves us no way to actually quote the entire LUA_PATH string or ? ourselves. (Why didn't they just give an option to quote the whole string??)

I share @dongahn's frustration with environment-modules.

I think our only option is to patch the environment-modules package. I've already verified that the following patch will allow us to use ? in setenv for csh style shells:

diff --git a/utility.c b/utility.c
index 4f1c2e7..c0ea520 100644
--- a/utility.c
+++ b/utility.c
@@ -2752,6 +2752,7 @@ void EscapeCshString(const char* in,
        *in == '^' ||
        *in == '&' ||
        *in == '*' ||
+       *in == '?' ||
        *in == '\''||
        *in == '"' ||
        *in == '(' ||

That is, instead of

$ modulecmd tcsh load test
setenv FLUX_LUA_PATH_PREPEND /opt/flux-sched-0.1.0/share/lua/5.1/?.lua ;setenv LOADEDMODULES test/0.1.0 ;

the patched version generates

$ ./modulecmd tcsh load test
setenv FLUX_LUA_PATH_PREPEND /opt/flux-sched-0.1.0/share/lua/5.1/\?.lua ;setenv LOADEDMODULES test/0.1.0 ;
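
For experimenting with the escaping rule outside of modulecmd, the patched character list can be mimicked in a few lines of shell. This is only a sketch: `escape_csh` is a hypothetical name, and the real logic lives in the C function shown in the patch.

```sh
#!/bin/sh
# Sketch: shell re-implementation of the patched escaping, for experimentation.
# escape_csh is a hypothetical name; the real code is C inside modulecmd.
escape_csh() {
    rest=$1
    out=
    tab=$(printf '\t')
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character of $rest
        rest=${rest#?}          # drop that character
        case $c in
            ' '|"$tab"|'\'|'{'|'}'|'|'|'<'|'>'|'!'|';'|'#'|'$'|'^'|'&'|'*'|'?'|"'"|'"'|'('|')')
                out="$out\\" ;;   # prefix special characters with a backslash
        esac
        out="$out$c"
    done
    printf '%s\n' "$out"
}

escape_csh '/opt/flux-sched-0.1.0/share/lua/5.1/?.lua'
```

Removing `'?'` from the case pattern gives the unpatched behaviour, where the path is emitted with a bare ? for tcsh to glob.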

dongahn commented 8 years ago

@grondo: Thank you for getting to the bottom of the issue! This agrees with what I observed yesterday.

Given that the current module will be replaced with Lmod in TOSS3 for tce packaging, I'm not sure we have anyone who is willing to accept your patch. I will talk to Trent though. Since they claim the new Lmod-based module can work with an original module file as-is, we should try this case under TOSS3 to see whether Lmod has this escape bug. Adding @lee218llnl and @adammoody to give a heads-up.

I was pretty frustrated w/ both module and tcsh idiosyncrasies yesterday (e.g., the eval failure led to that unkillable process look and feel) :-( I hope that the new Lmod-based module will do a better job...

The good news is we won't do module-based packaging for flux on TOSS3!

grondo commented 8 years ago

We already run a patched version of environment-modules for TOSS2, so I already have a new version ready to go if it makes sense.

grondo commented 8 years ago

And agreed, we should make sure lmod doesn't have this issue.

grondo commented 8 years ago

@dongahn, I've built a new environment-modules-3.2.10-1.2chaos package for TOSS2 with the above patch applied.

dongahn commented 8 years ago

Great. Thanks!

dongahn commented 8 years ago

FYI -- I have new rpm builds for TOSS2. I will test them once they and the environment-modules-3.2.10-1.2chaos package are rolled out.

flux-sched /opt RPM for RHEL6.

/repo/llnl/RHEL6/5.4/RPMS/x86_64/flux-sched-0.1.0-2.ch5.4.x86_64.rpm
/repo/llnl/RHEL6/5.4/SRPMS/flux-sched-0.1.0-2.el6.src.rpm

RPMs have been signed with the follwing GPG key:
--------
pub   1024D/D8A1F5EF 2007-09-28
      Key fingerprint = 2A4A B485 561B 797F 5EFA  E0D8 5BFF 971C D8A1 F5EF
uid                  CR/LF Builder <chaos-dev@lists.llnl.gov>
sub   2048g/3D3BAB67 2007-09-28
--------

Sincerely, The Builder.

dongahn commented 8 years ago

Moving on to koji, it seems tosspkg isn't really happy at the moment. Is it only me?

opal186{dahn}30: tosspkg clone examplepkg

hangs.

@garlick: did you post your flux-core.spec somewhere? It would be good if I could compare my adjustments for koji w/ yours.

lee218llnl commented 8 years ago

Dong, you may need to delete your cached credentials in ~/.gitconfig

garlick commented 8 years ago

@dongahn maybe you can tosspkg clone flux-core? If not, koji web has a list of packages and I think you can drill down to the spec file that way. If neither of those work I will get it for you but it may be later today before I can get to vpn

garlick commented 8 years ago

flux-framework/distribution#4 may have some useful info also, though I apparently didn't post spec there.

dongahn commented 8 years ago

@lee218llnl: Great! This fixed the issue and now I can see the flux-core rpm!

dongahn commented 8 years ago

It seems I hit a permission issue with koji packaging on opal. I've sent an email to @foraker about this:

opal186{dahn}35: koji add-pkg ch6-chaotic flux-sched --owner=dahn
ActionNotAllowed: policy violation (package_list)

I will be on travel next week. But since this should be simple enough, I will try to complete this next week. My new untested spec file and Makefile:

Name: flux-sched
Version: 0.1.0
Release: 3%{?dist}
Summary: Job Scheduling Facility for Flux Resource Manager Framework
Group: System Environment/Base
License: GPLv2+ 
URL: https://github.com/flux-framework/flux-sched 
Source0: %{name}-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-root-%(%{__id_u} -n)
#let's not build the debug package for now 
%define debug_package %{nil}
#only compress -- no stripping etc 
%define __spec_install_post /usr/lib/rpm/brp-compress || :

BuildRequires: flux-core >= 0.3.0
BuildRequires: czmq-devel >= 3.0.2
BuildRequires: json-c-devel
BuildRequires: lua-devel >= 5.1
BuildRequires: lua-posix
BuildRequires: hwloc-devel >= 1.4
BuildRequires: libuuid-devel

Requires: flux-core >= 0.3.0
Requires: libuuid

%description
flux-sched contains the job scheduling facility for the Flux Resource
Manager Framework. It consists of a Flux comms module that handles
all the functionality common to scheduling. The module has the ability
to load one or more scheduling sub-modules that provide specific
scheduling behavior.

%prep
%setup -n %{name}-%{version}

%build
./configure 
make %{?_smp_mflags}

%check
flux keygen
export FLUX_TESTS_LOGFILE=t
make check || (cat t/*.output t/*.log && exit 1)

%install
# RPM_BUILD_ROOT comes from BuildRoot tag.
rm -rf ${RPM_BUILD_ROOT}
mkdir -p ${RPM_BUILD_ROOT}
make install DESTDIR=${RPM_BUILD_ROOT}
find ${RPM_BUILD_ROOT} -name '*.la' | while read f; do rm -f "$f"; done

%clean
rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
%{_sysconfdir}/flux/rc1.d/sched-start
%{_sysconfdir}/flux/rc3.d/sched-stop
%{_includedir}/flux/sched
%{_libdir}/flux/modules/sched
%{_libdir}/libflux-rdl.so*
%{_libdir}/lua/5.1/flux/cpuset.so
%{_libexecdir}/flux/cmd/flux-rdltool
%{_libexecdir}/flux/cmd/flux-waitjob
%{_datadir}/lua/5.1/middleclass.lua
%{_datadir}/lua/5.1/RDL.lua
%{_datadir}/lua/5.1/RDL
%post

%changelog

* Sun May 22 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-3
- Adjustment for TOSS3 deployment 
      Remove the use of environmental modules
      Adjust for installing into default system directories.
* Thu May 19 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-2
- Minor adjustment to flux-sched.module
* Sun May 15 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-1
- Build from initial flux-sched-0.1.0 tag

TAG := 0.1.0
TARBALL:= flux-sched-$(TAG).tar.gz
URL := https://github.com/flux-framework/flux-sched/releases/download/$(TAG)/$(TARBALL)

sources:
	rm -f $(TARBALL)
	wget $(URL)

clean:
	rm -f $(TARBALL)

.PHONY: sources
.PHONY: clean

dongahn commented 8 years ago

OK, now I also have a TOSS3 sched rpm to test:

Package: flux-sched-0.1.0-3.ch6
Tag: ch6-chaotic
Status: complete
Built by: dahn
ID: 862
Started: Mon, 23 May 2016 19:55:57 PDT
Finished: Mon, 23 May 2016 19:57:29 PDT
Changelog:
* Mon May 23 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-3
- Adjustment for TOSS3 deployment
      Remove the use of environmental modules
      Adjust for installing into default system directories.

* Thu May 19 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-2
- Minor adjustment to flux-sched.module

* Sun May 15 2016 Dong H. Ahn <ahn1@llnl.gov> 0.1.0-1
- Build from initial flux-sched-0.1.0 tag

SRPMS:
  flux-sched-0.1.0-3.ch6.src.rpm

Closed tasks:
-------------

Task 11233 on builder2-x86.buildfarm
Task Type: tagBuild (noarch)

Task 11229 on builder1-x86.buildfarm
Task Type: build (ch6-chaotic, /buildfarm/flux-sched:7effc332bbea64af54a6cf8c33c25b497e951c36)

Task 11230 on builder2-x86.buildfarm
Task Type: buildSRPMFromSCM (/buildfarm/flux-sched:7effc332bbea64af54a6cf8c33c25b497e951c36)
logs:
  http://tossbuild.llnl.gov/koji/getfile?taskID=11230&name=build.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11230&name=checkout.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11230&name=mock_output.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11230&name=root.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11230&name=state.log

Task 11231 on builder2-x86.buildfarm
Task Type: buildArch (flux-sched-0.1.0-3.ch6.src.rpm, x86_64)
logs:
  http://tossbuild.llnl.gov/koji/getfile?taskID=11231&name=build.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11231&name=mock_output.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11231&name=root.log
  http://tossbuild.llnl.gov/koji/getfile?taskID=11231&name=state.log
rpms:
https://tossbuild.llnl.gov/kojifiles/packages/flux-sched/0.1.0/3.ch6/x86_64/flux-sched-0.1.0-3.ch6.x86_64.rpm

Task Info: http://tossbuild.llnl.gov/koji/taskinfo?taskID=11229
Build Info: http://tossbuild.llnl.gov/koji/buildinfo?buildID=862

dongahn commented 8 years ago

Note as a future reference:

  1. To add a new system-level package, Foraker told me the protocol is to submit a TOSS JIRA request (though he was kind enough to do this for me.)
  2. The final problem I had to address was:
RPM build errors:
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/etc/flux/rc1.d/sched-start
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/etc/flux/rc3.d/sched-stop
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/include/flux/sched
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/lib64/flux/modules/sched
error: File not found by glob: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/lib64/libflux-rdl.so*
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/lib64/lua/5.1/flux/cpuset.so
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/libexec/flux/cmd/flux-rdltool
error: File not found: /builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/libexec/flux/cmd/flux-waitjob

Apparently, ./configure in my spec file was installing sched under /usr/local, which is the default prefix. For example:

make[2]: Entering directory `/builddir/build/BUILD/flux-sched-0.1.0/etc'
make[2]: Nothing to be done for `install-exec-am'.
 /usr/bin/mkdir -p '/builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/local/etc/flux/rc1.d'
 /usr/bin/install -c sched-start '/builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/local/etc/flux/rc1.d'
 /usr/bin/mkdir -p '/builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/local/etc/flux/rc3.d'
 /usr/bin/install -c sched-stop '/builddir/build/BUILDROOT/flux-sched-0.1.0-3.ch6.x86_64/usr/local/etc/flux/rc3.d'

I realized I had to change ./configure to %configure so that koji can pass in the correct system install path!
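
On a machine with rpm installed, `rpm --eval '%configure'` prints what the macro expands to, which shows the prefix and directory options it passes. A complementary check is to look for anything staged under usr/local in the build root before the %files list is evaluated. The helper below is a sketch: `staged_under_usr_local` is a hypothetical name, and the temporary directory only simulates the build root from the log above.

```sh
#!/bin/sh
# Sketch: detect files staged under usr/local in a DESTDIR-style build root.
# staged_under_usr_local is a hypothetical helper name.
staged_under_usr_local() {
    # $1 = build root; succeeds if any regular file sits under $1/usr/local
    [ -d "$1/usr/local" ] || return 1
    [ -n "$(find "$1/usr/local" -type f | head -n 1)" ]
}

# Simulate what plain ./configure plus make install DESTDIR=... produced above.
root=$(mktemp -d)
mkdir -p "$root/usr/local/etc/flux/rc1.d"
: > "$root/usr/local/etc/flux/rc1.d/sched-start"

if staged_under_usr_local "$root"; then
    echo "files found under usr/local; %files entries under /etc and /usr will not match"
fi
```

In a real spec the same check could be pointed at ${RPM_BUILD_ROOT} at the end of %install.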

lipari commented 8 years ago

Nice work! And thanks for highlighting your battle scars.

dongahn commented 8 years ago

FYI -- I checked opal (TOSS3) just before it was powered down. flux-sched has been deployed and seems to be working now.

opal186{dahn}: rpm -qa | grep flux
flux-sched-0.1.0-3.ch6.x86_64
flux-core-0.3.0-1.ch6.x86_64

cab86{dahn}: rpm -ql flux-sched-0.1.0-3.ch6.x86_64
/etc/flux/rc1.d
/etc/flux/rc1.d/sched-start
/etc/flux/rc3.d
/etc/flux/rc3.d/sched-stop
/usr/include/flux/sched
/usr/include/flux/sched/rdl.h
/usr/lib64/flux/modules/sched
/usr/lib64/flux/modules/sched.so
/usr/lib64/flux/modules/sched/sched_backfill.so
/usr/lib64/flux/modules/sched/sched_fcfs.so
/usr/lib64/libflux-rdl.so
/usr/lib64/libflux-rdl.so.0
/usr/lib64/libflux-rdl.so.0.0.0
/usr/lib64/lua/5.1/flux/cpuset.so
/usr/libexec/flux/cmd/flux-rdltool
/usr/libexec/flux/cmd/flux-waitjob
/usr/share/lua/5.1/RDL
/usr/share/lua/5.1/RDL.lua
/usr/share/lua/5.1/RDL/Resource.lua
/usr/share/lua/5.1/RDL/ResourceData.lua
/usr/share/lua/5.1/RDL/lib
/usr/share/lua/5.1/RDL/lib/ListOf.lua
/usr/share/lua/5.1/RDL/memstore.lua
/usr/share/lua/5.1/RDL/serialize.lua
/usr/share/lua/5.1/RDL/types
/usr/share/lua/5.1/RDL/types/Node.lua
/usr/share/lua/5.1/RDL/types/Socket.lua
/usr/share/lua/5.1/RDL/uri.lua
/usr/share/lua/5.1/RDL/uuid.lua
/usr/share/lua/5.1/middleclass.lua

opal186{dahn}: flux start -s 2
opal186{dahn}: flux module list
Module               Size    Digest  Idle  S  Nodeset
kvs                  3165080 B31A075    0  S  0
content-sqlite       2991661 8508D48   12  S  0
resource-hwloc       2998996 EA42108   12  S  0
wrexec               2977729 D65B456   12  S  0
connector-local      3008762 179740B    0  R  0
sched                 383793 EA31F73   12  S  0
mecho                2965184 D65AA74   12  S  0
job                  2996487 6037479   12  S  0
barrier              2988886 D9E5F0E   12  S  0

opal186{dahn}: exit
opal186{dahn}: setenv FLUX_SCHED_RC_NOOP 1
opal186{dahn}: 
opal186{dahn}: 
opal186{dahn}: flux start -s 2
opal186{dahn}: flux module list
Module               Size    Digest  Idle  S  Nodeset
kvs                  3165080 B31A075    0  S  0
content-sqlite       2991661 8508D48    1  S  0
resource-hwloc       2998996 EA42108    1  S  0
wrexec               2977729 D65B456    1  S  0
connector-local      3008762 179740B    0  R  0
mecho                2965184 D65AA74    1  S  0
job                  2996487 6037479    1  S  0
barrier              2988886 D9E5F0E    1  S  0

opal186{dahn}: unsetenv FLUX_SCHED_RC_NOOP
opal186{dahn}: flux -v start -s3
FLUX_CONF_DIRECTORY=/g/g0/dahn/.flux
LUA_PATH=/usr/share/lua/5.1/?.lua;;;
FLUX_EXEC_PATH=/usr/libexec/flux/cmd
FLUX_SEC_DIRECTORY=/g/g0/dahn/.flux
PYTHONPATH=/usr/lib64/python2.7/site-packages
LUA_CPATH=/usr/lib64/lua/5.1/?.so;;;
FLUX_CONNECTOR_PATH=/usr/lib64/flux/connectors
FLUX_MODULE_PATH=/usr/lib64/flux/modules
sub-command search path: /usr/libexec/flux/cmd
flux: trying to exec /usr/libexec/flux/cmd/flux-start
opal186{dahn}21: flux submit -N3 -n3 hostname
submit: Submitted jobid 1
opal186{dahn}22: flux kvs dir lwj.1
lwj.1.state = complete
lwj.1.cmdline = [ "hostname" ]
lwj.1.nnodes = 3
lwj.1.input.
lwj.1.walltime = 0
lwj.1.cwd = /g/g0/dahn
lwj.1.ntasks = 3
lwj.1.rdl = <CUT>

dongahn commented 7 years ago

Hmmm on TOSS2, there seems to be an error in the new module script. Will revisit this.

hype356{dahn}27: module load flux-sched
cmdPath.c(159):ERROR:11: Usage is 'prepend-path path-variable directory'
flux-sched/0.1.0(32):ERROR:102: Tcl command execution failed: prepend-path  --delim=; FLUX_LUA_PATH_PREPEND  ${prefix}/share/lua/5.1/?.lua

grondo commented 7 years ago

@dongahn: You may have to quote the ; argument to --delim, i.e. try --delim ";"

dongahn commented 7 years ago

Thanks @grondo. Yes, the problem was along those lines. It seems escaping ; is the way to go. (Just as a future reference, quoting it gave me the same error.) I found a usage page at sourceforge... hopefully this is the right page :-(

prepend-path [ -d C | --delim C | --delim=C ] variable value

Append or prepend value to environment variable. The variable is a colon, or delimiter, separated list such as 
"PATH=directory:directory:directory". The default delimiter is a colon ':', but an arbitrary one can be given by the --delim option. For example a space can be used instead (which will need to be handled in the Tcl specially by enclosing it in " " or { }). A space, however, can not be specified by the --delim=C form.

I tested this by adding the local path where I have the module file to MODULEPATH:

hype356{dahn}42: setenv MODULEPATH /nfs/tmp2/dahn/lc_rpm/chaos5.4/flux-sched:/usr/share/Modules/modulefiles:/etc/modulefiles:/opt/modules/modulefiles

Then,

hype356{dahn}47: module load flux-sched.module
/opt/flux-sched-0.1.0/lib64/lua/5.1/?.lua: No match.

Got back to that old state -- the environment module bug disallowing escaping ?

I will create a new sched package with this fix. Hopefully, once your bug-fix version of the environment-modules rpm (environment-modules-3.2.10-1.2chaos) is installed, everything will work seamlessly.

The turnaround time for this bothers me a bit, but oh well.