Largely done in the `jobrerun` branch.
Before merging to `dev`, looking at what options might be available for rerunning a Job and tweaking parameters. If new validations are run, or the Job is re-indexed, those parameters will "stick" for the rerun (as all operate from `job_details`). But Jobs have input filters, RITS settings, etc., that could all be updated for a rerun. And in fact they can be, if `job_details` is manually updated. But this needs a GUI.
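As a rough illustration, a minimal sketch of what manually updating `job_details` before a rerun might look like, assuming Combine's Django `Job` model stores parameters as a JSON string in a `job_details` field; the import path, the `input_filters` key, and its shape are assumptions for illustration, not Combine's documented schema.

```python
import json

from core.models import Job  # assumed import path for Combine's Job model

# Hypothetical sketch: tweak a Job's stored parameters so a subsequent
# rerun picks them up. Key names below are illustrative assumptions.
job = Job.objects.get(id=42)
details = json.loads(job.job_details)

# e.g. only pass records that passed validation on the next rerun
details["input_filters"] = {"input_validity_valve": "valid"}

job.job_details = json.dumps(details)
job.save()
```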
The `job_optional_processing.html` template contains just about precisely the settings that would be included, and might be a great option for exposing these settings again, because in many ways it does not make sense to alter these parameters unless the Job is being rerun.
The only outlying scenario might be a very large Job that is part of a "pipeline". A user may not want to rerun the Job at that moment, but may want to tweak settings in preparation for a pipeline rerun. In this scenario, it would be advantageous to have the parameters editable, with the understanding that they would not take effect until a rerun.
The disadvantage is that configurations could be altered without the Job reflecting them: if the pipeline is never rerun, the stored parameters are incorrect and misleading.
Close to merging with `dev`, but holding off until a pathway is established for static Jobs. One approach might be, when uploads are used (which is common), to save `payload_dir` somewhere other than `/tmp`?
It works as-is, but if the server reboots, static harvests looking for `payload_dir` in `/tmp` will find it gone.
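A minimal sketch of that approach, assuming uploaded payloads currently land in `/tmp` and a persistent root directory is available; `PERSISTENT_ROOT`, `persist_payload`, and the directory layout are all hypothetical names for illustration.

```python
import os
import shutil

# Hypothetical sketch: copy an uploaded static-harvest payload out of /tmp
# so it survives a server reboot. PERSISTENT_ROOT is an assumed location.
PERSISTENT_ROOT = "/opt/combine/static_payloads"

def persist_payload(tmp_payload_dir, job_id):
    """Copy a payload directory to persistent storage and return the new
    path, suitable for storing as payload_dir in job_details."""
    os.makedirs(PERSISTENT_ROOT, exist_ok=True)
    dest = os.path.join(PERSISTENT_ROOT, str(job_id))
    shutil.copytree(tmp_payload_dir, dest)
    return dest
```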
Static Jobs addressed. Preparing to merge to `dev`.
Note: a form GUI was not implemented for updating Job parameters (`job_details`), but a JSON editor is available under "Job Parameters" that can be used in a pinch. If there is a need for a form, that ability can be added later; in the meantime, this allows for advanced tweaking of Job params (which will be used in a rerun) if need be.
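For context, the kind of blob edited there might look something like the following; every key shown is an illustrative assumption, not Combine's actual `job_details` schema.

```json
{
  "validation_scenarios": [1, 3],
  "input_filters": {"input_validity_valve": "valid"},
  "rits": 2,
  "index_mapper": "GenericMapper"
}
```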
Closing.
This would introduce an interesting, and perhaps very central, new function to Combine: the ability to re-run Jobs "in-place".
Currently, the data model encourages running new Jobs to re-harvest from endpoint `foo`. But this has the disadvantage of requiring users to re-configure input filters, field mapping, transformations, validations, etc. Re-running in-place would support a dataflow mentality, where Jobs are more like nodes set up in a pipeline. The components for this are largely in place.
Some thoughts:
One wrinkle is merge lineage, which might be difficult. Imagine the following situation:
When would `j5` get fired? It's more complex than, "`jx` was an input Job for `jy`, so fire `jx` before `jy`," but it seems like something that could be figured out for a single Job and all of its downstream "lineage" (see the sketch at the end of this comment). This would provide a means of setting up a "pipeline" like the following:
Four Jobs in total, with a handful of configurations. Then, when it's determined there should be an update from the OAI endpoint, somewhere in that Job there would be a "re-run" or "re-run Job stream" action, something to that effect.
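As a sketch of how firing order could be determined, the following is a minimal, hypothetical take on ordering a Job's downstream lineage with a topological sort (Kahn's algorithm); the edge-list representation and function name are assumptions, not anything currently in Combine.

```python
from collections import defaultdict, deque

def rerun_order(edges, start_job):
    """Given (input_job, downstream_job) edges, return a fire order for
    rerunning start_job and everything downstream of it, such that every
    Job fires only after all of its affected input Jobs."""
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)

    # Collect start_job plus its full downstream "lineage"
    affected = {start_job}
    queue = deque([start_job])
    while queue:
        for nxt in downstream[queue.popleft()]:
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)

    # Kahn's algorithm, restricted to the affected subgraph
    indegree = {job: 0 for job in affected}
    for src, dst in edges:
        if src in affected and dst in affected:
            indegree[dst] += 1

    ready = deque(job for job, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in downstream[job]:
            if nxt in affected:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
    return order

# e.g. two harvests (j1, j2) merged into j3, which feeds j4 and j5:
edges = [("j1", "j3"), ("j2", "j3"), ("j3", "j4"), ("j3", "j5")]
print(rerun_order(edges, "j1"))  # ['j1', 'j3', 'j4', 'j5'] (j4/j5 order may vary)
```

In that example, rerunning `j1` fires `j3` only after `j1` completes, and `j5` fires once `j3` has, which is one answer to "when would `j5` get fired?" for a single Job and its downstream lineage.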