nickjer closed 7 years ago
It seems you display "null" for Account if it is nil
I feel like this should be a separate issue to resolve, but I'll go ahead and add the fix to this branch since it's the first time it appears.
When you get a chance, can you post an example of a job (queued or running) on multiple nodes?
From the docs:
The job submitter’s and reservation creator’s primary group is automatically added to the job or reservation group_list attribute.
I think it's probably safe to assume that group_list is the equivalent of account_id
Multiple node jobs.
Queued
=> #<OodCore::Job::Info:0x0000000f922c00
@accounting_id=nil,
@allocated_nodes=[],
@cpu_time=nil,
@dispatch_time=nil,
@id="723823.head1.cm.cluster",
@job_name="get_refseq.sh",
@job_owner="jamesthornton",
@native=
{:job_id=>"723823.head1.cm.cluster",
:Job_Name=>"get_refseq.sh",
:Job_Owner=>"jamesthornton@login3.cm.cluster",
:job_state=>"Q",
:queue=>"oc_standard",
:server=>"head1.cm.cluster",
:Checkpoint=>"u",
:ctime=>"Thu Jun 29 11:30:48 2017",
:Error_Path=>"login3.cm.cluster:/rsgrps/bhurwitz/jetjr/get_refseq.sh.e723823",
:group_list=>"bhurwitz",
:Hold_Types=>"n",
:Join_Path=>"n",
:Keep_Files=>"n",
:Mail_Points=>"bea",
:Mail_Users=>"jamesthornton@email.arizona.edu",
:mtime=>"Thu Jun 29 11:30:48 2017",
:Output_Path=>"login3.cm.cluster:/rsgrps/bhurwitz/jetjr/get_refseq.sh.o723823",
:Priority=>"0",
:qtime=>"Thu Jun 29 11:30:48 2017",
:Rerunable=>"True",
:Resource_List=>
{:cput=>"24:00:00",
:mem=>"30gb",
:mpiprocs=>"8",
:ncpus=>"8",
:nodect=>"2",
:place=>"free",
:pvmem=>"26gb",
:select=>"2:ncpus=4:mem=15gb:pcmem=6gb:nodetype=standard:mpiprocs=4",
:walltime=>"24:00:00"},
:substate=>"10",
:Variable_List=>
"PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,PBS_O_HOME=/home/u1/jamesthornton,PBS_O_LOGNAME=jamesthornton,PBS_O_WORKDIR=/rsgrps/bhurwitz/jetjr,PBS_O_LANG=en_US.UTF-8,PBS_O_PATH=/rsgrps/bh_class/anvio/bin:/rsgrps/bhurwitz/hurwitzlab/bin:/rsgrps/bh_class/bin:/home/u1/jamesthornton/perl5/bin:/cm/local/apps/gcc/5.2.0/bin:/cm/shared/apps/pbspro/13.0.2.153173/sbin:/cm/shared/apps/pbspro/13.0.2.153173/bin:/cm/shared/uabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/home/u1/jamesthornton/perl5/bin:/rsgrps/bhurwitz/hurwitzlab/rakudobrew/bin/:/home/u1/jamesthornton/bin,PBS_O_MAIL=/var/spool/mail/jamesthornton,PBS_O_QUEUE=standard,PBS_O_HOST=login3.cm.cluster",
:etime=>"Thu Jun 29 11:30:48 2017",
:Submit_arguments=>"get_refseq.sh",
:project=>"_pbs_project_default"},
@procs=0,
@queue_name="oc_standard",
@status=#<OodCore::Job::Status:0x0000000f922b60 @state=:queued>,
@submission_time=2017-06-29 11:30:48 -0700,
@submit_host="login3.cm.cluster",
@wallclock_limit=86400,
@wallclock_time=nil>]
Running
=> #<OodCore::Job::Info:0x000000020bc280
@accounting_id=nil,
@allocated_nodes=
[#<OodCore::Job::NodeInfo:0x000000020bc118 @name="i7n2", @procs=28>,
#<OodCore::Job::NodeInfo:0x000000020bc0a0 @name="i7n6", @procs=28>,
#<OodCore::Job::NodeInfo:0x000000020bc028 @name="i7n12", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000ebaed8 @name="i7n13", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000ebad20 @name="i7n20", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000ebaa78 @name="i7n22", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000eba910 @name="i8n5", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000eba780 @name="i8n15", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000eba6e0 @name="i11n6", @procs=28>,
#<OodCore::Job::NodeInfo:0x00000000eba5c8 @name="i11n15", @procs=28>],
@cpu_time=4086429,
@dispatch_time=2017-06-29 07:26:02 -0700,
@id="723689.head1.cm.cluster",
@job_name="oiiinoph0311",
@job_owner="jspilker",
@native=
{:job_id=>"723689.head1.cm.cluster",
:Job_Name=>"oiiinoph0311",
:Job_Owner=>"jspilker@login2.cm.cluster",
:resources_used=>
{:cpupercent=>"27939", :cput=>"1135:07:09", :mem=>"376728480kb", :ncpus=>"280", :vmem=>"54816368kb", :walltime=>"04:06:41"},
:job_state=>"R",
:queue=>"oc_windfall",
:server=>"head1.cm.cluster",
:Checkpoint=>"u",
:ctime=>"Thu Jun 29 07:20:26 2017",
:Error_Path=>"login2.cm.cluster:/home/u4/jspilker/MIG2016/jspilker/Ripples_noph/oiiinoph0311.e723689",
:exec_host=>"i7n2/0*28+i7n6/0*28+i7n12/0*28+i7n13/0*28+i7n20/0*28+i7n22/0*28+i8n5/0*28+i8n15/0*28+i11n6/0*28+i11n15/0*28",
:exec_vnode=>
"(i7n2:ncpus=28:mem=176160768kb)+(i7n6:ncpus=28:mem=176160768kb)+(i7n12:ncpus=28:mem=176160768kb)+(i7n13:ncpus=28:mem=176160768kb)+(i7n20:ncpus=28:mem=176160768kb)+(i7n22:ncpus=28:mem=176160768kb)+(i8n5:ncpus=28:mem=176160768kb)+(i8n15:ncpus=28:mem=176160768kb)+(i11n6:ncpus=28:mem=176160768kb)+(i11n15:ncpus=28:mem=176160768kb)",
:group_list=>"dmarrone",
:Hold_Types=>"n",
:Join_Path=>"n",
:Keep_Files=>"n",
:Mail_Points=>"a",
:mtime=>"Thu Jun 29 07:26:27 2017",
:Output_Path=>"login2.cm.cluster:/home/u4/jspilker/MIG2016/jspilker/Ripples_noph/oiiinoph0311.o723689",
:Priority=>"0",
:qtime=>"Thu Jun 29 07:20:26 2017",
:Rerunable=>"True",
:Resource_List=>
{:cput=>"2688:00:00",
:mem=>"1680gb",
:mpiprocs=>"280",
:ncpus=>"280",
:nodect=>"10",
:place=>"free",
:select=>"10:ncpus=28:mem=168gb:mpiprocs=28:pcmem=6gb:nodetype=standard",
:walltime=>"50:00:00"},
:stime=>"Thu Jun 29 07:26:02 2017",
:session_id=>"6309",
:jobdir=>"/home/u4/jspilker",
:substate=>"42",
:Variable_List=>
"PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,PBS_O_HOME=/home/u4/jspilker,PBS_O_LOGNAME=jspilker,PBS_O_WORKDIR=/home/u4/jspilker/MIG2016/jspilker/Ripples_noph,PBS_O_LANG=en_US.UTF-8,PBS_O_PATH=/usr/bin:/home/u4/jspilker/.local/bin:/cm/shared/uaapps/openmpi/gcc/1.10.2/bin:/cm/shared/uaapps/gsl/2.1/bin:/cm/shared/apps/intel/compilers_and_libraries/2016.4.258/mkl/bin:/home/u4/jspilker/.local/bin:/cm/local/apps/gcc/5.2.0/bin:/cm/shared/apps/pbspro/13.0.2.153173/sbin:/cm/shared/apps/pbspro/13.0.2.153173/bin:/cm/shared/uabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/3.2.10/bin:/home/u4/jspilker/bin,PBS_O_MAIL=/var/spool/mail/jspilker,PBS_O_QUEUE=windfall,PBS_O_HOST=login2.cm.cluster",
:comment=>
"Job run at Thu Jun 29 at 07:26 on (i7n2:ncpus=28:mem=176160768kb)+(i7n6:ncpus=28:mem=176160768kb)+(i7n12:ncpus=28:mem=176160768kb)+(i7n13:ncpus=28:mem=176160768kb)+(i7n20:ncpus=28:mem=176160768kb)+(i7n22:ncpus=28:mem=176160768kb)+(i8n5:ncpus=28:mem=176...",
:etime=>"Thu Jun 29 07:20:26 2017",
:run_count=>"1",
:Submit_arguments=>"pbs0311oiiicont.sh",
:project=>"_pbs_project_default"},
@procs=280,
@queue_name="oc_windfall",
@status=#<OodCore::Job::Status:0x000000020bc1e0 @state=:running>,
@submission_time=2017-06-29 07:20:26 -0700,
@submit_host="login2.cm.cluster",
@wallclock_limit=180000,
@wallclock_time=14801>
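For reference, the `@allocated_nodes` list in the running job lines up with the `:exec_host` string in `@native`. A minimal sketch of how that string could be split into node names and proc counts, assuming the `host/index*ncpus` shape seen in the dump above (the helper name `parse_exec_host` is hypothetical and this is not necessarily how the adapter does it):

```ruby
# Hypothetical helper: split a PBS Pro exec_host string into node hashes.
# ASSUMPTION: each chunk looks like "host/index*ncpus", as in the dump above;
# when "*ncpus" is missing, one proc is assumed for that chunk.
def parse_exec_host(exec_host)
  exec_host.to_s.split("+").map do |chunk|
    match = chunk.match(%r{\A(?<name>[^/]+)/\d+(?:\*(?<procs>\d+))?\z})
    next if match.nil? # skip anything that does not fit the assumed shape

    { name: match[:name], procs: (match[:procs] || 1).to_i }
  end.compact
end

nodes = parse_exec_host("i7n2/0*28+i7n6/0*28+i7n12/0*28")
nodes.each { |node| puts "#{node[:name]}: #{node[:procs]} procs" }
```

A queued job has no `exec_host` yet, so passing `nil` yields an empty list, which matches `@allocated_nodes=[]` in the queued example.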
Group list and Account should be separate.
The job submitter’s and reservation creator’s primary group is automatically added to the job or reservation group_list attribute.
That means that the primary group is added to the group list attribute. It says nothing about "account". The group list affects what group the processes run as, and consequently, the group ownership of the files created by those processes.
This is very different from an account string, which may or may not match a group name.
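To make the distinction concrete, here is a rough sketch that reads the two values separately from a native hash instead of falling back from one to the other. The helper name is hypothetical, and the `:Account_Name` key is an assumption about where the account string would show up; neither dump above contains it, so `nil` is the expected account for those jobs.

```ruby
# Hypothetical helper: keep the account string and the group list apart.
# ASSUMPTION: the account string, when set, is reported under :Account_Name.
# group_list is treated as a comma-separated string, as in the dumps above.
def account_and_groups(native)
  {
    accounting_id: native[:Account_Name],             # nil when no account was given
    group_list: native[:group_list].to_s.split(",")   # [] when the attribute is missing
  }
end

queued = { job_id: "723823.head1.cm.cluster", group_list: "bhurwitz" }
p account_and_groups(queued)
```

With this shape, a missing account stays `nil` rather than silently picking up the submitter's group.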
Implemented by #128
The adapter is `pbspro`.
Warning, treat everything as possibly being `nil` in the `Info` object including `accounting_id` and `wallclock_limit`.
Interesting attributes to add to extended info:
- `group_list` (this seems important at least for Arizona's deployment of PBS Pro, similar to `account` but not quite as PBS Pro does have an `account` field)
- `comment` (has useful info on why job is held)
- `Resource_List => select` is similar to Torque's `node=...`
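A minimal sketch of nil-safe display for the two fields called out in the warning, so that a missing value renders as a blank or a placeholder rather than "null". The `JobRow` struct is a hypothetical stand-in for `OodCore::Job::Info`, and the helper names and the "N/A" placeholder are illustrative choices, not part of the library.

```ruby
# Hypothetical stand-in for OodCore::Job::Info with only the fields used here.
JobRow = Struct.new(:id, :accounting_id, :wallclock_limit)

# Format a number of seconds as HH:MM:SS, or a placeholder when it is nil.
def format_seconds(seconds)
  return "N/A" if seconds.nil?

  hours, rest = seconds.divmod(3600)
  minutes, secs = rest.divmod(60)
  format("%02d:%02d:%02d", hours, minutes, secs)
end

# Show an empty string instead of "null" when no account is set.
def display_account(accounting_id)
  accounting_id.nil? ? "" : accounting_id.to_s
end

row = JobRow.new("723823.head1.cm.cluster", nil, 86400)
puts "Account: '#{display_account(row.accounting_id)}'"
puts "Walltime limit: #{format_seconds(row.wallclock_limit)}"
```

The same guard would apply to any other `Info` attribute before it reaches the view, since the warning above says all of them may be `nil`.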
Queued job examples:
Running job examples:
Held job example:
Suspended job example:
None so far...