Open nevali opened 7 years ago
Sketched interface to be implemented as part of libtwine
to support this functionality:
typedef /*opaque*/ struct twine_job_struct TWINEJOB;
typedef enum
{
TJS_WAITING,
TJS_ACTIVE,
TJS_ABORTED,
TJS_COMPLETED,
TJS_FAILED,
TJS_ERRORS
} TWINEJOBSTATUS;
typedef enum
{
TJP_PRESERVE,
TJP_FORCE
} TWINEJOBPARENTAGE;
/* This is a relatively low-level libtwine API: the only side-effects are limited to
* twine_job_create() creating or updating rows depending upon the parentage
* mode of the current parent job and whether a row for that UUID exists or not.
*/
TWINEJOB *twine_job_create(const uuid_t uuid, const char *restrict uri, CLUSTER *restrict /*optional*/ cluster);
int twine_job_close(TWINEJOB *job);
const char *twine_job_uristr(TWINEJOB *job);
int twine_job_set_uristr(TWINEJOB *restrict job, const char *restrict uri);
/* NB: possibly require URI and librdf_uri variants of the above */
int twine_job_set_parentage(TWINEJOB *job, TWINEJOBPARENTAGE mode);
int twine_job_update(TWINEJOB *restrict job, TWINEJOBSTATUS status, const char *restrict /*optional*/ annotation);
int twine_job_set_progress(TWINEJOB *job, int /*optional*/ current, int /*optional*/ total);
/* NB: twine_job_set_progress() uses -1 as a sentinel to indicate NULL integer values;
* these will cause the job status to be left unchanged: twine_job_set_progress(job, -1, -1);
* is therefore a no-op
*/
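
To make the -1 sentinel convention concrete, here is a minimal in-memory sketch. `struct job_progress` and `job_progress_set()` are hypothetical stand-ins invented for illustration and are not part of libtwine; a real implementation would presumably update a database row instead. Rejecting values below -1 is also an assumption, not something the sketch above specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for the progress columns of a job row. */
struct job_progress
{
	int current;
	int total;
};

/* Follows the proposed twine_job_set_progress() convention: an argument of
 * -1 stands for NULL, so the corresponding stored value is left unchanged.
 * Returns 0 on success, or -1 if an argument is below the sentinel
 * (that error case is an assumption of this sketch).
 */
int
job_progress_set(struct job_progress *p, int current, int total)
{
	assert(p != NULL);
	if(current < -1 || total < -1)
	{
		return -1;
	}
	if(current != -1)
	{
		p->current = current;
	}
	if(total != -1)
	{
		p->total = total;
	}
	return 0;
}
```

With this shape, `job_progress_set(&p, 3, -1)` advances the current count without touching the total, and `job_progress_set(&p, -1, -1)` changes nothing.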
Arguably the core state-tracking mechanism of this should be moved to bbcarchdev/libcluster itself, and Twine simply employs it.
Optionally support a libsql connection URI which will be used to track jobs as they are processed by twine-writerd or twine-cli.

A job consists of:

- a URI (the urn:uuid: representation of the job UUID if nothing else is suitable; otherwise it'll be the canonical source or target URI, depending upon the processing pipeline; workflow components may update it accordingly during processing)
- a status: WAITING, ACTIVE, ABORTED (by the user), COMPLETE, FAILED, or ERRORS (partial failure)
- x of y progress indicators (particularly for bulk ingests from filesystem sources)

UUIDs should, where possible, be taken from the source if it incorporates one into its identification, or generated on-the-fly if this is not possible.
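
The URI fallback described above (use a canonical URI when one is available, otherwise the urn:uuid: form of the job UUID) could be sketched roughly as follows. `job_choose_uri()` and `JOB_URN_SIZE` are hypothetical names used only for illustration; the UUID is treated as 16 raw bytes, which is how libuuid's `uuid_t` is laid out, but no libuuid calls are used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "urn:uuid:" (9) + 32 hex digits + 4 hyphens + terminating NUL */
#define JOB_URN_SIZE 46

/* Hypothetical helper: writes the job URI into buf.
 * If a non-empty canonical URI is supplied it is used as-is; otherwise the
 * urn:uuid: representation of the 16-byte UUID is generated.
 * Returns 0 on success, -1 if buf is too small.
 */
int
job_choose_uri(const unsigned char uuid[16], const char *canonical, char *buf, size_t buflen)
{
	static const char hex[] = "0123456789abcdef";
	size_t i, len;
	char *p;

	assert(uuid != NULL && buf != NULL);
	if(canonical && canonical[0])
	{
		len = strlen(canonical);
		if(len >= buflen)
		{
			return -1;
		}
		memcpy(buf, canonical, len + 1);
		return 0;
	}
	if(buflen < JOB_URN_SIZE)
	{
		return -1;
	}
	memcpy(buf, "urn:uuid:", 9);
	p = buf + 9;
	for(i = 0; i < 16; i++)
	{
		/* Hyphens separate the 8-4-4-4-12 digit groups */
		if(i == 4 || i == 6 || i == 8 || i == 10)
		{
			*p++ = '-';
		}
		*p++ = hex[uuid[i] >> 4];
		*p++ = hex[uuid[i] & 0x0f];
	}
	*p = '\0';
	return 0;
}
```

A workflow component that later learns the canonical source or target URI would simply call the real setter (twine_job_set_uristr() in the sketched interface) to replace the fallback.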
A job stack should be maintained internally to libtwine in order to track parent/child relationships, rather than requiring them to be made explicit.

As an example, an ingest of N-Quads from a file, processed with spindle-correlate, might yield the following:

- WAITING, with a newly-generated UUID and a file:/// URI
- ACTIVE, with progress set to 0 of number-of-graphs
- WAITING, using the Spindle-generated UUID and URI
- COMPLETE
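
One way the internal job stack might work is sketched below: whichever job is on top of the stack becomes the implicit parent of any job created while it is active. All names here (`struct job_stack`, `job_stack_push()`, and so on) and the fixed maximum depth are assumptions made for illustration; jobs are identified by plain strings rather than real TWINEJOB handles.

```c
#include <assert.h>
#include <stddef.h>

#define JOB_STACK_MAX 16 /* arbitrary depth limit for this sketch */

/* Hypothetical stack of currently-active job identifiers. */
struct job_stack
{
	const char *ids[JOB_STACK_MAX];
	size_t depth;
};

/* The implicit parent of a job created now: the job on top of the stack,
 * or NULL if no job is active (i.e., the new job is a top-level job).
 */
const char *
job_stack_parent(const struct job_stack *s)
{
	assert(s != NULL);
	return s->depth ? s->ids[s->depth - 1] : NULL;
}

/* Called when a job becomes the current job. Returns -1 if the stack is full. */
int
job_stack_push(struct job_stack *s, const char *id)
{
	assert(s != NULL && id != NULL);
	if(s->depth >= JOB_STACK_MAX)
	{
		return -1;
	}
	s->ids[s->depth++] = id;
	return 0;
}

/* Called when the current job is closed. Returns -1 if the stack is empty. */
int
job_stack_pop(struct job_stack *s)
{
	assert(s != NULL);
	if(!s->depth)
	{
		return -1;
	}
	s->depth--;
	return 0;
}
```

In the ingest example, the file-level job would be pushed first, so each job created for a graph during correlation would pick it up as its parent without the caller passing it explicitly.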
As spindle-generate later processes its queue of items, it performs the following:

- WAITING, using the Spindle-generated UUID and URI; if it already exists, its parentage is preserved (thus, if the job originated from an ingest as described above, the proxy-generation step maintains the parent-child relationship, allowing for ready visualisation)

With this arrangement, a small number of relatively simple SQL queries can provide progress tracking and volumetrics across a processing cluster.
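
The create-or-update behaviour for a job's parent, as described for spindle-generate above, might reduce to something like the following. The TWINEJOBPARENTAGE enum is copied from the sketched interface; `struct job_row` and `job_row_upsert_parent()` are hypothetical in-memory stand-ins for whatever the database layer would actually do.

```c
#include <assert.h>
#include <stddef.h>

/* Parentage modes, as in the sketched interface. */
typedef enum
{
	TJP_PRESERVE,
	TJP_FORCE
} TWINEJOBPARENTAGE;

/* Hypothetical stand-in for a row in the jobs table. */
struct job_row
{
	int exists;         /* non-zero once the row has been created */
	const char *parent; /* parent job identifier, or NULL */
};

/* A newly-created row always takes the supplied parent. An existing row
 * keeps its parent under TJP_PRESERVE and is re-parented under TJP_FORCE.
 */
void
job_row_upsert_parent(struct job_row *row, const char *parent, TWINEJOBPARENTAGE mode)
{
	assert(row != NULL);
	if(!row->exists)
	{
		row->exists = 1;
		row->parent = parent;
		return;
	}
	if(mode == TJP_FORCE)
	{
		row->parent = parent;
	}
}
```

Under this rule, a job first created by an ingest and later picked up from a queue in preserve mode keeps pointing at the ingest job.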
Open question: how would Twine know when to preserve versus replace the parent of a job?
Perhaps it could be as simple as user action (i.e., twine-cli) taking precedence over an on-going process: a queue-driven twine-writerd will only set the parent of a job if it's newly-created, whereas twine-cli will always override it. Both would create an overarching job for their processing runs, whether that's from a file or a queue.

Tracked as RESDATA-1279