ewiger / gc3pie

Automatically exported from code.google.com/p/gc3pie

unify Job keys from different backends #78

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, the Job's dictionary contains accounting information gathered from the
backends, in the same format as produced by each backend.
Job prepends either 'arc_' or 'sge_' to the names of the attributes specific to
each backend.

We agreed to bring this into a uniform representation by standardizing the
greatest common denominator of the different backends, i.e., the information
that all of them provide.

*All* additional information will then be discarded.

Original issue reported on code.google.com by sergio.m...@gmail.com on 17 Nov 2010 at 1:32

GoogleCodeExporter commented 9 years ago
Issue 27 has been merged into this issue.

Original comment by riccardo.murri@gmail.com on 18 Nov 2010 at 1:06

GoogleCodeExporter commented 9 years ago
lrms_jobid <str>
lrms_jobname <str> (not identical)
resource_name <str>
timestamp
stderr_filename
stdout_filename

    gc3                    SGE                    ARC
    -------------------    -------------------    ----------------
    queue                  qname                  queue
    cores                  sge_slots              arc_cpu_count
    exitcode               exit_status/failed     exitcode/status
    used_walltime (sec.)   ru_wallclock           used_walltime
    used_cputime (sec.)    cpu                    used_cputime
    used_memory (kB)       maxvmem (B)            used_memory (kB)
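The table above can be read as two renaming maps, one per backend. Here is a minimal Python sketch, assuming the raw accounting data arrives as a flat dict. The names `SGE_TO_GC3`, `ARC_TO_GC3` and `normalize_keys` are hypothetical. Only `exit_status` (not `failed`) is mapped for SGE, and units are left untouched. Keys missing from a map are dropped, matching the decision to discard all additional information.

```python
# Hypothetical renaming maps built from the table above:
# backend-specific key -> common gc3 key.
SGE_TO_GC3 = {
    "qname": "queue",
    "sge_slots": "cores",
    "exit_status": "exitcode",   # ASSUMPTION: 'failed' is not merged in here
    "ru_wallclock": "used_walltime",
    "cpu": "used_cputime",
    "maxvmem": "used_memory",
}

# Keys from the third (ARC) column of the table.
ARC_TO_GC3 = {
    "queue": "queue",
    "arc_cpu_count": "cores",
    "exitcode": "exitcode",
    "used_walltime": "used_walltime",
    "used_cputime": "used_cputime",
    "used_memory": "used_memory",
}


def normalize_keys(raw, mapping):
    """Keep only the keys listed in `mapping`, renamed to their gc3 names.

    Every other backend-specific key is dropped. Values (and their units)
    are passed through unchanged.
    """
    return {mapping[key]: value for key, value in raw.items() if key in mapping}
```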

SGE
- maxvmem => float with an "M" suffix
- cpu => float (sec.)
- ru_wallclock => int (sec.)

ARC
- used_memory => int (KiB)
- used_cputime => int (sec.)
- used_walltime => int (sec.)
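Given the formats listed above, the SGE values need converting before they can sit next to ARC's integer KiB and seconds. Below is a minimal sketch. It assumes SGE size suffixes are binary (1024-based) multiples and also accepts "K" and "G", which are not mentioned above. It treats a bare number as bytes, matching the "maxvmem (B)" entry in the table. The function names are hypothetical.

```python
# ASSUMPTION: SGE size suffixes are 1024-based; only "M" is documented above.
_SGE_SUFFIX_TO_KIB = {"K": 1, "M": 1024, "G": 1024 ** 2}


def sge_memory_to_kib(value):
    """Convert an SGE memory value such as '12.5M' to an int number of KiB."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in _SGE_SUFFIX_TO_KIB:
        return int(round(float(value[:-1]) * _SGE_SUFFIX_TO_KIB[suffix]))
    # No suffix: ASSUMPTION that the value is a plain number of bytes.
    return int(round(float(value) / 1024))


def seconds_to_int(value):
    """Round an SGE float 'cpu' value (seconds) to whole seconds, as ARC reports."""
    return int(round(float(value)))
```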

Original comment by sergio.m...@gmail.com on 17 Dec 2010 at 6:05

GoogleCodeExporter commented 9 years ago

Original comment by sergio.m...@gmail.com on 12 Jan 2011 at 10:18

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 9 Feb 2011 at 9:47

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 9 Mar 2011 at 3:10

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 1 Jul 2011 at 2:31

GoogleCodeExporter commented 9 years ago
The ARC Grid-Manager collects the following information from executed jobs
(via GNU `time`):

    WallTime
    KernelTime
    UserTime
    CPUUsage
    MaxResidentMemory
    AverageResidentMemory
    AverageTotalMemory
    AverageUnsharedMemory
    AverageUnsharedStack
    AverageSharedMemory
    PageSize
    MajorPageFaults
    MinorPageFaults
    Swaps
    ForcedSwitches
    WaitSwitches
    Inputs
    Outputs
    SocketReceived
    SocketSent
    Signals

Since all of this is available via GNU `time`, the `shellcmd` backend can
use it too, and we could probably gather the same information from the
SGE/PBS/LSF backends as well.

I suggest we take this list as a starting point; some values will not
be available on all platforms (e.g., `time` consistently reports memory
as "0" on Linux kernels < 2.6.32).
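The fields listed above correspond fairly directly to GNU `time` format specifiers. A `shellcmd`-style backend could therefore wrap the user command with GNU `time -o FILE -f FORMAT` and write one KEY=VALUE line per field. The sketch below illustrates this. The field-to-specifier mapping is our own reading of the GNU `time` documentation, not the mapping ARC's Grid-Manager actually uses. `GNU_TIME_FIELDS`, `gnu_time_argv` and the default `/usr/bin/time` path are assumptions.

```python
# ASSUMPTION: our mapping of the field names above to GNU time specifiers;
# ARC's Grid-Manager may compute some of these differently.
GNU_TIME_FIELDS = {
    "WallTime": "%e",               # elapsed real time, seconds
    "KernelTime": "%S",             # system CPU time, seconds
    "UserTime": "%U",               # user CPU time, seconds
    "CPUUsage": "%P",               # percentage of CPU (printed with '%')
    "MaxResidentMemory": "%M",      # KiB
    "AverageResidentMemory": "%t",  # KiB
    "AverageTotalMemory": "%K",     # KiB (data + stack + text)
    "AverageUnsharedMemory": "%D",  # KiB
    "AverageUnsharedStack": "%p",   # KiB
    "AverageSharedMemory": "%X",    # KiB (shared text)
    "PageSize": "%Z",               # bytes
    "MajorPageFaults": "%F",
    "MinorPageFaults": "%R",
    "Swaps": "%W",
    "ForcedSwitches": "%c",         # involuntary context switches
    "WaitSwitches": "%w",           # voluntary context switches
    "Inputs": "%I",
    "Outputs": "%O",
    "SocketReceived": "%r",
    "SocketSent": "%s",
    "Signals": "%k",
}


def gnu_time_argv(cmd, output_file, time_path="/usr/bin/time"):
    """Wrap `cmd` so GNU time writes one KEY=VALUE line per field to `output_file`."""
    fmt = "\n".join(f"{name}={spec}" for name, spec in GNU_TIME_FIELDS.items())
    return [time_path, "-o", output_file, "-f", fmt] + list(cmd)
```

The returned list can be passed to `subprocess.run` as-is. As noted above, some values (notably memory on older kernels) may come back as "0".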

Original comment by riccardo.murri@gmail.com on 20 Jun 2012 at 6:51

GoogleCodeExporter commented 9 years ago
In arclib (ARC0), an arclib.Job is described by the following attributes:
['client_software', 'cluster', 'comment', 'completion_time', 'cpu_count', 
'erase_time', 'errors', 'execution_nodes', 'exitcode', 'gmlog', 'id', 
'job_name', 'mds_validfrom', 'mds_validto', 'owner', 'proxy_expire_time', 
'queue', 'queue_rank', 'requested_cpu_time', 'requested_wall_time', 
'rerunable', 'runtime_environments', 'sstderr', 'sstdin', 'sstdout', 'status', 
'submission_time', 'submission_ui', 'used_cpu_time', 'used_memory', 
'used_wall_time']

In ARC2 (I would skip ARC1), an arc.Job is described by the following
attributes:
['Cluster', 'ComputingManagerEndTime', 'ComputingManagerExitCode', 
'ComputingManagerSubmissionTime', 'CreationTime', 'EndTime', 'ExecutionNode', 
'ExitCode', 'InterfaceName', 'JobDescriptionDocument', 'JobID', 
'LocalInputFiles', 'LocalOwner', 'LocalSubmissionTime', 'Name', 
'OtherMessages', 'Owner', 'ProxyExpirationTime', 'Queue', 
'RequestedApplicationEnvironment', 'RequestedSlots', 'RequestedTotalCPUTime', 
'RequestedTotalWallTime', 'StartTime', 'State', 'StdErr', 'StdIn', 'StdOut', 
'SubmissionClientName', 'SubmissionHost', 'SubmissionTime',  'UsedCPUType', 
'UsedMainMemory', 'UsedOSFamily', 'UsedPlatform', 'UsedTotalCPUTime', 
'UsedTotalWallTime', 'UserDomain', 'Validity', 'VirtualMachine', 
'WaitingPosition', 'WorkingAreaEraseTime' ]

This is what we have at our disposal when updating an arc.Job object; what the
grid-manager collects for the usage records is something we cannot access (at
least to my knowledge).
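Since only the job object's attributes are available, the common keys have to be read under different names depending on the ARC version. The sketch below pairs a few of the attribute names listed above. It reads them with `getattr` and does no unit conversion, because the value types returned by arclib and the ARC2 bindings are not covered here. The pairing, `ARC_JOB_ATTRS` and `extract_arc_info` are hypothetical.

```python
# Hypothetical pairing of common gc3 keys with the attribute names listed
# above: (ARC0 arclib.Job name, ARC2 arc.Job name).
ARC_JOB_ATTRS = {
    "queue": ("queue", "Queue"),
    "exitcode": ("exitcode", "ExitCode"),
    "used_walltime": ("used_wall_time", "UsedTotalWallTime"),
    "used_cputime": ("used_cpu_time", "UsedTotalCPUTime"),
    "used_memory": ("used_memory", "UsedMainMemory"),
}


def extract_arc_info(job, arc_version):
    """Read the common attributes from an ARC0 or ARC2 job object.

    Values are returned unchanged; attributes the object lacks are skipped.
    """
    if arc_version not in (0, 2):
        raise ValueError("only ARC0 and ARC2 are handled in this sketch")
    index = 0 if arc_version == 0 else 1
    info = {}
    for key, names in ARC_JOB_ATTRS.items():
        name = names[index]
        if hasattr(job, name):
            info[key] = getattr(job, name)
    return info
```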

Sergio :)

Original comment by sergio.m...@gmail.com on 21 Jun 2012 at 10:45

GoogleCodeExporter commented 9 years ago
| what the grid-manager collects for the usage records is something we cannot
| access (at least to my knowledge)

Well, there are the contents of the "diag" file in the ".arc" directory.
(Although that might be rightfully considered an implementation detail
and changed in the future.)
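Reading that file could be as simple as splitting lines on "=". The sketch below assumes the diag file holds one KEY=VALUE pair per line, with numeric values optionally followed by a unit such as "s" or "kB". That format is an assumption about an ARC implementation detail that may change, as noted above. `parse_diag` is a hypothetical name.

```python
import re

# ASSUMPTION: a numeric value optionally followed by a unit, e.g. "12.5s" or "2048kB".
_NUMBER_WITH_UNIT = re.compile(r"^([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z%]*)$")


def parse_diag(text):
    """Parse KEY=VALUE lines into {key: (number, unit)}, or {key: str} for non-numbers."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        value = value.strip()
        match = _NUMBER_WITH_UNIT.match(value)
        if match:
            result[key.strip()] = (float(match.group(1)), match.group(2))
        else:
            result[key.strip()] = value
    return result
```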

Original comment by riccardo.murri@gmail.com on 21 Jun 2012 at 10:50

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 10 Jul 2012 at 9:50

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 17 Aug 2012 at 11:46

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2767.

After a job reaches the `TERMINATED` state, the following
attributes are also set.  The meaning and format of the attributes
is consistent across backends.

  `execution.duration`
    Time lapse from start to end of the job at the remote
    execution site, as a `gc3libs.quantity.Duration`:class: value.
    (This is also often referred to as the 'wall-clock time' or
    `walltime`:term: of the job.)

  `execution.max_used_memory`
    Maximum amount of RAM used during job execution, represented
    as a `gc3libs.quantity.Memory`:class: value.

  `execution.used_cpu_time`
    Total time (as a `gc3libs.quantity.Duration`:class: value) that the
    processors have spent actively executing the job's code.
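The three common attributes above can be pictured as a small record filled from any backend's raw values. The sketch below keeps itself free of gc3libs by using `datetime.timedelta` and a plain byte count as stand-ins for `gc3libs.quantity.Duration` and `gc3libs.quantity.Memory`. It shows the conversion from ARC-style integers (seconds, seconds, KiB). `ExecutionSummary` and `from_arc_values` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ExecutionSummary:
    """Stand-in for the common `execution.*` attributes.

    ASSUMPTION: timedelta and an int byte count replace
    gc3libs.quantity.Duration and gc3libs.quantity.Memory.
    """
    duration: timedelta       # wall-clock time at the remote site
    max_used_memory: int      # bytes
    used_cpu_time: timedelta  # CPU time summed over all processors


def from_arc_values(used_walltime_s, used_cputime_s, used_memory_kib):
    """Build the summary from ARC-style ints: seconds, seconds, KiB."""
    return ExecutionSummary(
        duration=timedelta(seconds=used_walltime_s),
        max_used_memory=used_memory_kib * 1024,
        used_cpu_time=timedelta(seconds=used_cputime_s),
    )
```

On a multi-core job `used_cpu_time` may exceed `duration`, since CPU time is summed over processors.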

Backends may set other attributes as well; the only convention is that
the name of the attribute starts with the (lowercased) backend name.
For instance, the PbsLrms backend sets attributes `pbs_queue`,
`pbs_end_time`, etc.
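The naming convention for backend-specific attributes is easy to enforce in a helper. The sketch below assumes the backend is identified by a short name such as "PBS" (not the class name `PbsLrms`, which would lowercase to `pbslrms`). `set_backend_attribute` and `backend_attributes` are hypothetical helpers, not gc3libs API.

```python
def set_backend_attribute(execution, backend_name, name, value):
    """Set a backend-specific attribute using the lowercased-backend-name prefix.

    For example, ("PBS", "queue") sets the attribute `pbs_queue`.
    Returns the attribute name that was set.
    """
    attr = f"{backend_name.lower()}_{name}"
    setattr(execution, attr, value)
    return attr


def backend_attributes(execution, backend_name):
    """Return all attributes of `execution` carrying the backend's prefix."""
    prefix = backend_name.lower() + "_"
    return {k: v for k, v in vars(execution).items() if k.startswith(prefix)}
```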

The triple (submission time, actual start time, end time) would be a
very useful addition to this set of common attributes, but not all
backends provide this information. (PBS, SGE and ARC1 do; ARC0 does
not have the "actual start time").  What would be a good way to handle
this?
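One possible option, offered only as an illustration: keep the triple as a common attribute and make the actual start time optional. Backends that do not report it (ARC0) leave it as `None`, and derived values such as time spent queued are then also `None`. `JobTimes` and `queued_time` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobTimes:
    submission_time: datetime
    end_time: datetime
    # None when the backend (e.g. ARC0) does not report an actual start time.
    start_time: Optional[datetime] = None

    def queued_time(self) -> Optional[timedelta]:
        """Time spent waiting before execution started, or None if unknown."""
        if self.start_time is None:
            return None
        return self.start_time - self.submission_time
```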

Original comment by riccardo.murri@gmail.com on 21 Sep 2012 at 10:22