hexylena opened this issue 5 years ago
Found this comment:
We "just" need to find out why this happens. It seems that destinations / job-destination mappings are cached...
Also odd is the `None` id in the log output: `Mapped job to destination id: None` (on the 1st submission) and `Persisting job destination (destination id: None)` (on the 2nd submission). With static job mapping, the log output contains the names of the destinations.
OK, the `None` is unrelated: https://github.com/galaxyproject/galaxy/pull/9742
One tiny step further:
If I understand correctly, the JobHandler class loads the Job from the DB (?), which does not persist the resubmit attribute of the JobDestination; instead, the resubmit info is restored from the job_conf file. All this info is then used to construct a JobWrapper, which has now lost the resubmit info...
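A minimal sketch of the suspected information loss, using hypothetical simplified classes (`Destination`, `JOB_CONF`, `persist`, `reload` are illustrations, not Galaxy's real `JobDestination` or DB schema). It assumes the DB row keeps only the destination id and params, and that the resubmit rules are rebuilt from the static job_conf entry, which for a dynamically computed destination carries none:

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a job destination; NOT Galaxy's real class.
@dataclass
class Destination:
    id: str
    params: dict = field(default_factory=dict)
    resubmit: list = field(default_factory=list)

# Hypothetical static job_conf: the "mem_1x" entry defines no resubmit rules.
JOB_CONF = {"mem_1x": Destination("mem_1x", {"mem": "8G"})}

def dynamic_rule():
    """Hypothetical dynamic rule that attaches resubmit rules at runtime."""
    return Destination(
        "mem_1x",
        {"mem": "8G"},
        resubmit=[{"condition": "memory_limit_reached", "destination": "mem_1_5x"}],
    )

def persist(dest):
    """Assumed DB row: only id and params are stored, resubmit is dropped."""
    return {"destination_id": dest.id, "destination_params": dict(dest.params)}

def reload(row):
    """Rebuild a destination from the DB row, taking resubmit from job_conf."""
    static = JOB_CONF.get(row["destination_id"])
    resubmit = list(static.resubmit) if static else []
    return Destination(row["destination_id"], dict(row["destination_params"]), resubmit)

original = dynamic_rule()
reloaded = reload(persist(original))
print(len(original.resubmit), len(reloaded.resubmit))  # 1 0
```

Under these assumptions the id and params survive the round trip, but the runtime resubmit rule does not, so anything built from the reloaded object has nothing to resubmit to.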
With my current knowledge I see two options:
Intuitively, option 1 seems the easier solution (but I guess in reality it's quite a lot of work).
For option 2: currently the dynamic destination calculation is triggered via this route:
The last function sets the destination for the resubmission and triggers `galaxy.jobs.mapper.cache_job_destination`, which replaces the cached JobDestination with the result of the dynamic destination's Python function (i.e. it sets a new runner, params, env and resubmit).
The idea would be to split `_handle_resubmit_definitions` into two parts, so that the failure handling only marks the job for resubmission and the new destination is calculated when the job re-enters the queue.
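The proposed split can be sketched as two phases. This is a hypothetical model, not Galaxy code: `Job`, `on_failure`, `enqueue`, `choose_destination` and the tier names are assumptions. The failure handler only bumps an attempt counter and marks the job; the destination is recomputed from that counter every time the job enters the queue:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical memory tiers matching a 1x / 1.5x / 2x setup.
TIERS = ["mem_1x", "mem_1_5x", "mem_2x"]

@dataclass
class Job:
    id: int
    attempt: int = 0
    state: str = "new"
    destination_id: Optional[str] = None

def choose_destination(job: Job) -> str:
    """Hypothetical dynamic rule: pick a tier from the attempt counter."""
    return TIERS[min(job.attempt, len(TIERS) - 1)]

def on_failure(job: Job, reason: str) -> None:
    """Phase 1: only mark the job for resubmission; no destination is computed here."""
    if reason == "memory_limit_reached" and job.attempt + 1 < len(TIERS):
        job.attempt += 1
        job.state = "resubmit"
    else:
        job.state = "error"

def enqueue(job: Job) -> None:
    """Phase 2: compute the destination each time the job (re-)enters the queue."""
    job.destination_id = choose_destination(job)
    job.state = "queued"

job = Job(id=1)
visited = []
while job.state != "error":
    enqueue(job)
    visited.append(job.destination_id)
    on_failure(job, "memory_limit_reached")  # every attempt runs out of memory
print(visited)  # ['mem_1x', 'mem_1_5x', 'mem_2x']
```

The design choice here is that the only state that must survive a DB round trip is a plain integer, so nothing depends on cached resubmit rules being restored correctly.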
Interestingly, the destination is already set in the JobHandler when the job enters the queue for the first time.
And maybe another wee step: https://github.com/galaxyproject/galaxy/pull/9747
For my own reference, some debug notes: https://gist.github.com/bernt-matthias/2c75e2266f626ba8bb5c331e178bce30
Hi @jmchilton, since we last debugged I've been having an issue with resubmission; maybe you can shed some light on it. Since I can't really pass parameters around, I've settled on having three destinations: 1x, 1.5x and 2x memory.
I have the following dynamic destination:
And the following job conf:
I expect it to submit to the first, then the second, then the third destination.
In practice, it is only resubmitted once:
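The expected escalation, and the observed single resubmission, can be modeled with a hypothetical three-tier chain. This sketch assumes (unconfirmed) that the cause is the one suspected earlier in this thread, i.e. resubmit rules not surviving the reload; `CHAIN` and `simulate` are illustrations only:

```python
# Hypothetical chain: each destination's resubmit rule points at the next tier.
CHAIN = {"mem_1x": "mem_1_5x", "mem_1_5x": "mem_2x", "mem_2x": None}

def simulate(start, keep_resubmit_info=True):
    """Destinations visited if every attempt hits the memory limit."""
    visited = [start]
    # The first hop comes from the rules computed for the initial submission,
    # which are still in memory at that point.
    next_dest = CHAIN[start]
    while next_dest is not None:
        visited.append(next_dest)
        # After a reload, the resubmit rules of the current destination may be missing.
        next_dest = CHAIN[next_dest] if keep_resubmit_info else None
    return visited

print(simulate("mem_1x"))         # ['mem_1x', 'mem_1_5x', 'mem_2x']
print(simulate("mem_1x", False))  # ['mem_1x', 'mem_1_5x']
```

If the rules are kept, the job walks all three tiers; if they are dropped after the first resubmission, the walk stops after one hop, which would match the behavior described here.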
Galaxy logs