choria-legacy / mcollective-choria

Distribution of plugins for MCollective as found in Puppet 6
Apache License 2.0

`mco tasks run` with `-C` filter timed out waiting for response from nodes #606

Closed jay7x closed 3 years ago

jay7x commented 4 years ago

When I run a Bolt task using the `-C` (class) filter for discovery, mco times out:

$ mco tasks run spd::query_exporter --url http://127.0.0.1:9080 --metrics kafka_controller_kafkacontroller_activecontrollercount --verbose -C Role::Chrono::Kafka
Retrieving task metadata for task spd::query_exporter from the Puppet Server
Discovering hosts using the choria method .... 5
Attempting to download and run task spd::query_exporter on 5 nodes

warn 2019/11/20 09:53:09: client.rb:281:in `rescue in start_receiver' Could not receive all responses. Did not receive responses from spdachronokafkaq1.node.spda, spdachronokafkaq3.node.spda, spdachronokafkaq4.node.spda, spdachronokafkaq2.node.spda, spdachronokafkaq5.node.spda

No response from:

    spdachronokafkaq1.node.spda    spdachronokafkaq2.node.spda    spdachronokafkaq3.node.spda
    spdachronokafkaq4.node.spda    spdachronokafkaq5.node.spda

Could not download the task spd::query_exporter onto all nodes
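The output above shows discovery succeeding (5 nodes) while none of those nodes reply. Here is a minimal sketch of how such a "No response from" list can be produced. It assumes the client simply compares the set of discovered nodes with the nodes that replied before the timeout. `missing_responders` is a hypothetical name, not part of MCollective's client.rb.

```ruby
# Hypothetical sketch, not MCollective's client.rb: once the reply timeout
# expires, the "No response from" list is the discovered nodes minus the
# nodes that actually replied.
def missing_responders(discovered, responded)
  (discovered.uniq - responded).sort
end

discovered = (1..5).map { |i| "spdachronokafkaq#{i}.node.spda" }

# In the -C run above, discovery found 5 nodes but no replies arrived in time,
# so every discovered node ends up in the "No response from" list.
puts missing_responders(discovered, []).join(", ")
```

Discovery and the request itself are separate steps, so a correct discovery count says nothing about whether the nodes accepted the request that followed.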

However, the same command against the same nodes works fine with the `-I` (identity) filter:

$ mco tasks run spd::query_exporter --url http://127.0.0.1:9080 --metrics kafka_controller_kafkacontroller_activecontrollercount --verbose -I /spdachronokafkaq[12345].node.spda/
Retrieving task metadata for task spd::query_exporter from the Puppet Server
Discovering hosts using the choria method .... 5
Attempting to download and run task spd::query_exporter on 5 nodes

Downloading and verifying 1 file(s) from the Puppet Server to all nodes: ✓  5 / 5

Running task spd::query_exporter and waiting up to 60 seconds for it to complete

spdachronokafkaq4.node.spda
   {"metrics":["kafka_controller_kafkacontroller_activecontrollercount 0.0"]}

spdachronokafkaq3.node.spda
   {"metrics":["kafka_controller_kafkacontroller_activecontrollercount 0.0"]}

spdachronokafkaq5.node.spda
   {"metrics":["kafka_controller_kafkacontroller_activecontrollercount 0.0"]}

spdachronokafkaq2.node.spda
   {"metrics":["kafka_controller_kafkacontroller_activecontrollercount 0.0"]}

spdachronokafkaq1.node.spda
   {"metrics":["kafka_controller_kafkacontroller_activecontrollercount 1.0"]}

Summary for task b25a815b8ca75f66bda2b285ca185e74

                       Task Name: spd::query_exporter
                          Caller: choria=spdigital.mcollective
                       Completed: 5
                         Running: 0

                      Successful: 5
                          Failed: 0

                Average Run Time: 1.59s
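The two runs differ only in the filter, so it helps to make concrete what each filter is compared against. This is a hypothetical sketch, NOT MCollective's implementation. It assumes a node treats a filter value as an exact string, or as a regular expression when it is wrapped in slashes. It also assumes a class filter is checked against the node's list of Puppet classes and an identity filter against the node's identity. The lowercase class name in the example is an assumption as well. This thread does not establish how the real node-side matcher handles capitalisation.

```ruby
# Hypothetical sketch, NOT MCollective's implementation.
# Assumption: a filter value is an exact string, or a regex when written /.../.
def filter_match?(filter, candidate)
  if filter.length > 1 && filter.start_with?("/") && filter.end_with?("/")
    Regexp.new(filter[1..-2]).match?(candidate)
  else
    filter == candidate
  end
end

# -C: compare against every class recorded for the node (assumed list).
def class_filter_match?(filter, node_classes)
  node_classes.any? { |klass| filter_match?(filter, klass) }
end

# -I: compare against the node's identity.
def identity_filter_match?(filter, identity)
  filter_match?(filter, identity)
end

# Assumed class list for one node; the lowercase spelling is illustrative.
classes  = ["settings", "role::chrono::kafka"]
identity = "spdachronokafkaq1.node.spda"

puts identity_filter_match?("/spdachronokafkaq[12345].node.spda/", identity) # true
puts class_filter_match?("role::chrono::kafka", classes)                     # true
puts class_filter_match?("Role::Chrono::Kafka", classes)                     # false under exact matching
```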

Task details:

$ mco tasks --detail spd::query_exporter
Retrieving task metadata for task spd::query_exporter from the Puppet Server

spd::query_exporter - Query values from prometheus exporter

Task Parameters:
  metrics                        Metrics to fetch (Optional[Variant[String[1], Array[String]]])
  url                            Exporter URL to query (String[1])

Task Files:
  query_exporter.rb              3526 bytes

Use 'mco tasks run spd::query_exporter' to run this task
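The task's own code (query_exporter.rb) is not shown in this thread. The sketch below is a hypothetical reconstruction of the filtering it appears to perform, inferred only from its parameters and from the JSON printed in the successful run. It assumes the exporter serves the Prometheus text format. It also assumes the task keeps the sample lines whose metric name is listed in `metrics`, and returns every sample when `metrics` is omitted. All function names here are illustrative.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical sketch: NOT the real query_exporter.rb from this thread.

# `metrics` is typed Optional[Variant[String[1], Array[String]]];
# normalise it to an Array of metric names.
def normalize_metrics(metrics)
  case metrics
  when nil then []
  when String then [metrics]
  when Array then metrics
  else raise ArgumentError, "unsupported metrics value: #{metrics.inspect}"
  end
end

# Keep sample lines whose metric name is requested; skip comments and blanks.
# Assumption: an empty request list means "return every sample".
def select_metrics(exposition, metrics)
  wanted = normalize_metrics(metrics)
  exposition.each_line.map(&:strip).select do |line|
    next false if line.empty? || line.start_with?("#")
    name = line[/\A[a-zA-Z_:][a-zA-Z0-9_:]*/]
    wanted.empty? || wanted.include?(name)
  end
end

# Fetch the exporter page over HTTP (not exercised in the example below).
def query_exporter(url, metrics)
  body = Net::HTTP.get(URI(url))
  { "metrics" => select_metrics(body, metrics) }
end

# Made-up sample exporter output, for illustration only.
sample = <<~TEXT
  # TYPE kafka_controller_kafkacontroller_activecontrollercount untyped
  kafka_controller_kafkacontroller_activecontrollercount 1.0
  kafka_server_replicamanager_underreplicatedpartitions 0.0
TEXT

result = { "metrics" => select_metrics(sample, "kafka_controller_kafkacontroller_activecontrollercount") }
puts JSON.generate(result)
# prints {"metrics":["kafka_controller_kafkacontroller_activecontrollercount 1.0"]}
```

Accepting both a single String and an Array matches the `Variant` in the metadata. The command line in this report passes one metric name, and the playbook later in the thread passes a list.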

P.S. This may be related: https://github.com/choria-io/mcollective-choria/issues/457

jay7x commented 4 years ago

FYI, it also works fine when run from a playbook.

I call it this way:

  $res = choria::run_playbook('choria::tasks::run',
    nodes  => $nodes,
    task   => 'spd::query_exporter',
    silent => $silent,
    inputs => {
      url     => 'http://127.0.0.1:9080',
      metrics => [
        'kafka_server_replicamanager_underreplicatedpartitions',
        'kafka_controller_kafkacontroller_activecontrollercount',
      ],
    }
  )

Related output:

Info: Scope(<module>/choria/plans/tasks/download_files.pp, 12): Downloading files for task 'spd::query_exporter' onto 5 nodes
Notice: Scope(<module>/choria/plans/tasks/download_files.pp, 14): About to run task: mcollective task
Notice: Scope(<module>/choria/plans/tasks/download_files.pp, 14): Starting request for bolt_tasks#download against 5 nodes
Notice: Scope(<module>/choria/plans/tasks/download_files.pp, 14): Successful request 8ec906097c195e4dbceb9c845e9d0d1e for bolt_tasks#download in 2.11s against 5 node(s)
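The playbook passes `metrics` as an array, while the command line passes a single string. Both should satisfy the parameter types listed by `mco tasks --detail`. The sketch below checks a task input Hash against those two types. It is hypothetical: `valid_query_exporter_input?` is an illustrative name, and the JSON is simply the playbook's `inputs` written out by hand. A real task would typically receive its parameters as JSON on stdin.

```ruby
require "json"

# Hypothetical sketch: validate task input against the metadata types
#   url     String[1]                                   -> non-empty String
#   metrics Optional[Variant[String[1], Array[String]]] -> nil, non-empty String,
#                                                          or Array of Strings
def non_empty_string?(value)
  value.is_a?(String) && !value.empty?
end

def valid_query_exporter_input?(input)
  url_ok = non_empty_string?(input["url"])
  m = input["metrics"]
  metrics_ok = m.nil? ||
               non_empty_string?(m) ||
               (m.is_a?(Array) && m.all? { |x| x.is_a?(String) })
  url_ok && metrics_ok
end

# The playbook's inputs, written out as the JSON a task would read.
input = JSON.parse(<<~TEXT)
  {
    "url": "http://127.0.0.1:9080",
    "metrics": [
      "kafka_server_replicamanager_underreplicatedpartitions",
      "kafka_controller_kafkacontroller_activecontrollercount"
    ]
  }
TEXT

puts valid_query_exporter_input?(input) # prints true
```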

ripienaar commented 4 years ago

I can't reproduce this; I think something was failing transiently. Does this happen every time?

% mco tasks run mcollective_agent_bolt_tasks::ping --message=foo -C apache

The above works for me.

ripienaar commented 3 years ago

Closing this, as I can't reproduce it.