Dynflow / dynflow

DYNamic workFLOW orchestration engine
http://dynflow.github.io
MIT License

Fix undefined method wait for nil:NilClass #412

Closed carlospeon closed 2 years ago

carlospeon commented 2 years ago

NoMethodError: undefined method `wait' for nil:NilClass


/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/throttle_limiter.rb:71:in `block (2 levels) in handle_plans'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/throttle_limiter.rb:69:in `tap'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/throttle_limiter.rb:69:in `block in handle_plans'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/throttle_limiter.rb:68:in `map'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/throttle_limiter.rb:68:in `handle_plans'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.8/lib/dynflow/actor.rb:13:in `on_message'
...

adamruzicka commented 2 years ago

The fix seems reasonable. Do you have steps for reproducing the issue?

carlospeon commented 2 years ago

Yes, but only with RH Satellite: run Ansible roles against an inventory with more than 100 hosts. The main task fails with this error while planning the subtasks for each host. Oddly, it usually fails on subtask 101, but sometimes it gets as far as 201, 301, and so on (this may not be related to dynflow).

Regards, Carlos.

adamruzicka commented 2 years ago

Interesting. What are the parameters of the job? Do you set a concurrency limit or time span when applying the roles?

carlospeon commented 2 years ago

Hope this helps:

Action:

Actions::RemoteExecution::RunHostsJob

Input:

{"job_invocation"=>
  {"id"=>1554,
   "name"=>"Ansible Playbook",
   "description"=>"Run Inditex Ansible roles"},
 "concurrency_control"=>{"level"=>{"tickets"=>5, "free"=>5, "meta"=>{}}},
 "job_category"=>"Ansible Playbook",
 "job_invocation_id"=>1554,
 "current_request_id"=>nil,
 "current_timezone"=>"Europe/Madrid",
 "current_user_id"=>6,
 "current_organization_id"=>3,
 "current_location_id"=>nil}

Output:

{"host_count"=>320,
 "planned_count"=>100,
 "cancelled_count"=>0,
 "total_count"=>100,
 "failed_count"=>0,
 "pending_count"=>100,
 "success_count"=>0}

Exception:

NoMethodError: undefined method `wait' for nil:NilClass

adamruzicka commented 2 years ago

I'm still failing to reproduce this. Where do you see the error? What is the status of the parent task?

carlospeon commented 2 years ago

Hello:

The parent task status is "failed" with the error NoMethodError: undefined method `wait' for nil:NilClass.

The child tasks that were planned finish "ok" (or "failed", depending on the Ansible run). The child tasks that could not be planned before the main task ended in the failed state are in "N/A" status.

With this fix applied, all child tasks are planned correctly and no error appears in the dynflow console. However, approximately the same number of child tasks as before finish in "ok" (or "failed") status. The remaining child tasks stay in "planned" status forever, and the parent task stays in "running" status forever too.

So I guess I'm only scratching the surface of the issue.

adamruzicka commented 2 years ago

Yes, I expected something like that. The root issue stems from the parent task finishing before its children. As far as I can tell, this fix only addresses the consequences of it.
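
For illustration only (this is not the patch from this PR and not dynflow's actual code): a minimal Ruby sketch of a nil guard around a per-parent semaphore lookup. It assumes the `wait` call in the trace is made on an object looked up by parent id, and that the lookup returns nil once the parent has finished. `TicketSemaphore`, `runnable_children`, and the ids are hypothetical names invented for the sketch.

```ruby
# Hypothetical stand-in for a per-parent semaphore; NOT dynflow's class.
class TicketSemaphore
  def initialize(tickets)
    @free = tickets
    @waiting = []
  end

  # Returns true if a ticket was taken, false if the item had to queue.
  def wait(item)
    if @free > 0
      @free -= 1
      true
    else
      @waiting << item
      false
    end
  end
end

# Returns the child ids that may start right away.
# Assumption: the semaphore for a parent is missing once that parent is done.
def runnable_children(semaphores, parent_id, child_ids)
  semaphore = semaphores[parent_id]
  # Guard: without this check, calling `wait` on nil raises NoMethodError.
  return [] if semaphore.nil?

  child_ids.select { |child_id| semaphore.wait(child_id) }
end

semaphores = { 'parent-1' => TicketSemaphore.new(2) }
started = runnable_children(semaphores, 'parent-1', %w[a b c])
orphaned = runnable_children(semaphores, 'parent-gone', %w[d e])
```

In this sketch the guard removes the exception, but the children of the missing parent are never started. That is at least consistent with the child tasks staying in "planned" status as described above, and with a guard of this kind treating the symptom rather than the cause.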

carlospeon commented 2 years ago

Hello,

Solved the root cause by adding a dedicated worker for the remote_execution queue.

Regards, Carlos.