Open Ompragash opened 5 years ago
This is very similar to a problem I've raised with Red Hat support on behalf of my client, although I'm not sure what SJT is. We find that when re-running against failed hosts there may be fewer hosts than the number of job slices.
I suggested that AWX be modified to either:
SJT might mean Sliced Job Template.
The reason this happens is that ansible-playbook fails when told to run for zero hosts; ansible-runner doesn't detect that situation (and is unwilling to change that behavior), so it passes the failure upwards.
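The failure mode described above can be sketched in a few lines; the helper name below is hypothetical and just illustrates the controller-side guard that does not exist today:

```python
def should_launch_slice(slice_hosts):
    """Hypothetical guard: skip launching a slice that would get zero hosts.

    Today neither ansible-runner nor the controller performs this check, so
    an empty slice reaches ansible-playbook and fails that slice's job.
    """
    return len(slice_hosts) > 0

# Slicing 2 hosts across 3 slices leaves one slice empty:
hosts = ["host1", "host2"]
slices = [hosts[i::3] for i in range(3)]
print([should_launch_slice(s) for s in slices])  # [True, True, False]
```

The third slice gets no hosts at all, which is exactly the case that currently surfaces as a failed job rather than a skipped one.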
Possible approaches for a fix include
Another possible idea is to allow the number to be selected with "prompt on launch." Of course that would only help with known quantities.
It would also require unnecessary manual action on behalf of the user.
EDIT: and that is to say, if a user just clicks to re-run on failed hosts, they may not even be aware of how many there are.
Sure, I'm thinking of the case where you might normally want it split across 3 nodes but then need to override it to 1. It seems silly to have separate workflows to control that single variable, or to change and re-save the template each time. The same applies in the middle of a workflow where you know you only want it on fewer hosts than normal. Not the full solution for sure, but it would be handy.
Given that https://github.com/ansible/ansible/pull/76438 was rejected, and Controller as it stands does not really have a way of knowing how many hosts may match a filter -- AFAIK the filter is passed to ansible and doesn't apply until the job actually runs -- the controller makes its decision about how many slices to create BEFORE the filter is applied. Since we've not been able to land fixes in ansible and runner, the only thing I think we could possibly do is some kind of preliminary "apply the filter to the inventory and see how many hosts match" step.
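A rough sketch of that preliminary step, under the assumption that simple glob-style limits can be approximated with `fnmatch` (the function name is illustrative, not Controller code, and real Ansible host patterns are richer than globs):

```python
import fnmatch

def effective_slice_count(inventory_hosts, limit_pattern, requested_slices):
    """Clamp the slice count to the number of hosts the limit matches."""
    matched = [h for h in inventory_hosts if fnmatch.fnmatch(h, limit_pattern)]
    # Never spawn more slices than there are matching hosts.
    return max(1, min(requested_slices, len(matched)))

inventory = [f"testhost{i}" for i in range(1, 101)] + ["matchinghost"]
print(effective_slice_count(inventory, "matchinghost", 3))  # 1
print(effective_slice_count(inventory, "testhost*", 3))     # 3
```

The point is only the clamping logic; where the matching itself runs (controller-side approximation vs. asking ansible, as discussed below in the thread) is the open question.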
This would almost be like an inventory update running before the sliced job with a limit spawns its slices.
I'm thinking: 1) a sliced job is launched with a limit applied, so we create it with dependencies_processed=false (I'm not sure on the details of WHEN it becomes a workflow job here, but if it is a workflow job from the get-go, having dependencies_processed=false will be new for workflow jobs). 2) We launch some kind of inventory-update-like process that finds out how many hosts the limit will cut the inventory down to, save this as the number of slices to spawn on the workflow, and at this point decide what the workflow nodes will be. 3) Set dependencies_processed=true. 4) Proceed as we do today; the workflow is now ready to run.
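The numbered steps above could look roughly like this; all names are illustrative and not actual AWX internals:

```python
def prepare_sliced_workflow(requested_slices, count_matching_hosts):
    """Sketch of the proposed flow: create the workflow with
    dependencies_processed=False, count the hosts the limit matches,
    then build only as many slice nodes as there are matching hosts."""
    workflow = {"dependencies_processed": False, "nodes": []}
    matched = count_matching_hosts()             # step 2: inventory-update-like step
    n = max(1, min(requested_slices, matched))   # step 2: decide real slice count
    workflow["nodes"] = [f"slice_{i + 1}_of_{n}" for i in range(n)]
    workflow["dependencies_processed"] = True    # step 3
    return workflow                              # step 4: ready to run as today

# A limit matching only 1 host collapses a 3-slice job to a single slice:
print(prepare_sliced_workflow(3, lambda: 1)["nodes"])  # ['slice_1_of_1']
```

Passing the host-counting step in as a callable mirrors the fact that it would run as a separate dependency process before the workflow nodes are finalized.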
I'm sure we could do something more elegant, but a hacky approximation of that inventory-update-like step is what I can do on the CLI today.
Given an inventory file named `hosts`:

```ini
[mygroup]
testhost[:100]

[foogroup]
matchinghost
```

This inventory has 103 hosts. But if I run

```shell
ansible -i hosts all --list-hosts --limit matchinghost
```

I get the output:

```
hosts (1):
    matchinghost
```

which tells me that of my inventory with 103 hosts, only 1 matches the limit.
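Automating that CLI check could look like the sketch below. It assumes the `hosts (N):` header format of `--list-hosts` output is stable, and `count_limit_matches` requires ansible on the PATH; the function names are made up for illustration:

```python
import re
import subprocess

def parse_host_count(list_hosts_output):
    """Extract N from the 'hosts (N):' header that --list-hosts prints."""
    m = re.search(r"hosts \((\d+)\):", list_hosts_output)
    return int(m.group(1)) if m else 0

def count_limit_matches(inventory_path, limit):
    """Run `ansible --list-hosts` (needs ansible installed) and count matches."""
    out = subprocess.run(
        ["ansible", "-i", inventory_path, "all", "--list-hosts", "--limit", limit],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_host_count(out)

print(parse_host_count("  hosts (1):\n    matchinghost\n"))  # 1
```

Letting ansible itself evaluate the limit avoids re-implementing its host-pattern semantics on the controller side, at the cost of an extra subprocess per launch.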
Are there any updates on this topic?
There's no way to pass the number of slices through a Workflow Template, and all Job Templates must be pre-configured. This is an issue when you execute against a large number of hosts by default but want to run against a smaller batch that is smaller than, or close to, the number of slices. To work around it, you need to reconfigure every Job Template that is part of the workflow template.
ISSUE TYPE
COMPONENT NAME
SUMMARY
The SJT creates as many jobs as the slicing count even when the limit matches fewer hosts.
ENVIRONMENT
STEPS TO REPRODUCE
1. Create an Inventory with multiple hosts.
2. Create a SJT with multiple slices and select the above-created Inventory.
3. Limit the SJT to one of the hosts from the provided Inventory.
4. Launch the SJT.
EXPECTED RESULTS
Only one job is created for the limited hosts even if the Job Slicing value is >1.
ACTUAL RESULTS
Multiple jobs are created for the limited hosts as per the Job Slicing count.
ADDITIONAL INFORMATION
Even though multiple jobs are created, only one succeeds; the rest all fail with:

```
ERROR! Specified hosts and/or --limit does not match any hosts
```