Closed carlospeon closed 2 years ago
The fix seems reasonable. Do you have steps for reproducing the issue?
Yes, but only through RH Satellite: running Ansible roles over inventories with more than 100 hosts. The main task fails with this error while planning the subtasks for each host. Oddly, it usually fails on subtask 101, but sometimes it reaches 201, 301 and so on (maybe not related to dynflow).
Regards, Carlos.
Interesting. What are the parameters of the job? Do you set a concurrency limit or time span when applying the roles?
Hope this helps:
Action:
Actions::RemoteExecution::RunHostsJob
Input:
{"job_invocation"=>
{"id"=>1554,
"name"=>"Ansible Playbook",
"description"=>"Run Inditex Ansible roles"},
"concurrency_control"=>{"level"=>{"tickets"=>5, "free"=>5, "meta"=>{}}},
"job_category"=>"Ansible Playbook",
"job_invocation_id"=>1554,
"current_request_id"=>nil,
"current_timezone"=>"Europe/Madrid",
"current_user_id"=>6,
"current_organization_id"=>3,
"current_location_id"=>nil}
Output:
{"host_count"=>320,
"planned_count"=>100,
"cancelled_count"=>0,
"total_count"=>100,
"failed_count"=>0,
"pending_count"=>100,
"success_count"=>0}
Exception:
NoMethodError: undefined method `wait' for nil:NilClass
I'm still failing to reproduce this. Where do you see the error? What is the status of the parent task?
Hello:
The parent task status is "failed" with the error NoMethodError: undefined method `wait' for nil:NilClass.
The child tasks that were planned finish "ok" (or failed, as appropriate, depending on the Ansible run). The child tasks that could not be planned (before the main task ended in the failed state) are in "N/A" status.
Once I applied this fix, all child tasks are planned correctly and no error appears in the dynflow console. But approximately the same number of child tasks finish in "ok" (or failed) status. The other child tasks remain in "planned" status forever, and the parent task stays in "running" status (forever too).
So I guess I'm just scratching the surface of the issue.
Yes, I expected something like that. The root issue stems from the parent task finishing before its children. As far as I can tell, this fix only addresses the consequences of it.
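To illustrate the difference between treating the symptom and the cause, here is a minimal Ruby sketch. It is not Dynflow's code: `Registry`, `FakeSemaphore`, `wait_unguarded` and `wait_guarded` are hypothetical names, and the assumption that the parent's synchronization object disappears once the parent finishes is only a guess at the failure mode. It shows how a nil guard removes the NoMethodError without making the late children run.

```ruby
# Hypothetical stand-in for a per-parent synchronization object.
# This is NOT Dynflow's API; it only records who asked to wait.
class FakeSemaphore
  attr_reader :waiters

  def initialize
    @waiters = []
  end

  def wait(child_id)
    @waiters << child_id
    :waiting
  end
end

# Hypothetical registry keyed by parent task id. Once the parent finishes,
# its entry is removed, so later lookups return nil.
class Registry
  def initialize
    @semaphores = {}
  end

  def register(parent_id)
    @semaphores[parent_id] = FakeSemaphore.new
  end

  def finish(parent_id)
    @semaphores.delete(parent_id)
  end

  # Unguarded: raises NoMethodError on nil when the parent is already gone.
  def wait_unguarded(parent_id, child_id)
    @semaphores[parent_id].wait(child_id)
  end

  # Guarded: no exception, but the child is still never scheduled,
  # which matches children being stuck in "planned".
  def wait_guarded(parent_id, child_id)
    semaphore = @semaphores[parent_id]
    return :parent_gone if semaphore.nil?

    semaphore.wait(child_id)
  end
end
```

In this sketch the guard only changes how the failure surfaces: the exception goes away, but a child that arrives after the parent has finished still has nothing to wait on.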
Hello,
Solved the root cause with a dedicated worker for the remote_execution queue.
Regards, Carlos.