elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Nomad autodiscover fails to pick up some tasks #27035

zemm closed this issue 2 years ago

zemm commented 3 years ago

Nomad autodiscover, introduced in #16853, seems to completely miss some new Nomad tasks, and therefore never ships any logs for them. The cause seems to be that it uses the allocation's AllocModifyIndex to check for changes, when it should actually use ModifyIndex.
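
For context, my understanding of the two index fields, sketched against the api.Allocation type from github.com/hashicorp/nomad/api (the comments are my interpretation, not official documentation):

package nomad

import "github.com/hashicorp/nomad/api"

// AllocModifyIndex is bumped only when the server/scheduler modifies the
// allocation; client-side status changes such as pending -> running bump
// ModifyIndex but leave AllocModifyIndex untouched. Comparing a watcher's
// waitIndex against AllocModifyIndex can therefore skip a client-side
// "running" transition forever.
func updatedSince(alloc *api.Allocation, waitIndex uint64) bool {
	return alloc.ModifyIndex >= waitIndex // ModifyIndex, not AllocModifyIndex
}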

Test setup

1) Let's start a local Nomad dev instance. I'm using sudo here because my tasks run with the exec driver; this also means sudo is required to inspect the alloc directories later.

$ ./nomad version
Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244)
$ sudo ./nomad agent -dev -alloc-dir /tmp/nomad_alloc -bind 0.0.0.0 -log-level=INFO

2) Let's run Filebeat with a minimal Nomad autodiscover config that outputs to /tmp:

filebeat.autodiscover.providers:
  - type: nomad
    address: http://127.0.0.1:4646
    scope: node
    hints.enabled: true
    templates:
    - config:
      - type: log
        paths:
          - "/tmp/nomad_alloc/${data.nomad.allocation.id}/alloc/logs/${data.meta.nomad.task.name}.stderr.[0-9]*"
          - "/tmp/nomad_alloc/${data.nomad.allocation.id}/alloc/logs/${data.meta.nomad.task.name}.stdout.[0-9]*"

output.file:
  path: /tmp
  filename: filebeat.json

logging.level: debug
logging.to_files: true
logging.files:
  path: /tmp
  name: filebeat.log

Then start Filebeat from a clean state:

$ rm /tmp/filebeat*; ./filebeat

3) Let's save Nomad allocation events for later comparison

$ curl -kSs "localhost:4646/v1/event/stream?topic=Allocation" > /tmp/alloc-events.json

4) Let's run some test jobs

I created two test jobs whose tasks are just exec scripts that produce a test log by printing out their name and a counter from 1 to 10 at one-second intervals, after which they idle silently. There should be 10 lines of log per task at the end.
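
The HCL job files are not reproduced here, but each task behaves roughly like the following stand-in (a sketch; the real tasks are exec shell scripts):

package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	// Print the task name and a counter from 1 to 10, one line per second.
	name := "count-to-ten"
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	for i := 1; i <= 10; i++ {
		fmt.Printf("%s %d\n", name, i)
		time.Sleep(time.Second)
	}
	// Then idle silently without producing more output.
	for {
		time.Sleep(time.Hour)
	}
}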

Let's run the two test jobs. The first one consists of two groups of one task each; the second has one group with three tasks. The exact layout isn't that important, but this setup reproduced the problem for me. Optionally, wait some time between the two runs.

$ nomad job run count-to-ten-1.hcl
==> Monitoring evaluation "6cb442a9"
    Evaluation triggered by job "count-to-ten-1"
    Evaluation within deployment: "4cb7bdef"
    Allocation "1a7c549f" created: node "40f5ad2f", group "group-1"
    Allocation "a02e35fa" created: node "40f5ad2f", group "group-2"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6cb442a9" finished with status "complete"
$ sleep 7
$ nomad job run count-to-ten-2.hcl
==> Monitoring evaluation "c2704d2b"
    Evaluation triggered by job "count-to-ten-2"
    Evaluation within deployment: "04da1cf1"
    Allocation "3bb7e48c" created: node "40f5ad2f", group "group"
    Allocation "8ebe0866" created: node "40f5ad2f", group "group"
    Allocation "d01ac162" created: node "40f5ad2f", group "group"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "c2704d2b" finished with status "complete"

Inspecting the results

We should have five allocations:

$ sudo ls -1 /tmp/nomad_alloc/
1a7c549f-9895-ea29-9ad0-630e95ddaf4f
3bb7e48c-14f3-c055-5c36-ce5966369111
8ebe0866-9a44-b1f8-26a8-2e5d1cf32af5
a02e35fa-c0bd-bf0e-107f-0f6a045664e2
d01ac162-279d-3b1f-3ff6-1258c3dec022

The allocations should have produced 50 lines of log (5 times 10 lines):

$ sudo sh -c 'cat /tmp/nomad_alloc/*/alloc/logs/*stdout.0'|wc -l
50

All 50 lines should have been shipped by Filebeat, but in my case I only got 30. The exact number varies between runs, but most of the time some tasks are never picked up.

$ cat /tmp/filebeat.json |jq -r '.message'|wc -l
30

Analyzing the problem

The watcher prints out allocation info when it processes an allocation that is running:

$ cat /tmp/filebeat.log |grep allocation.id
2021-07-23T22:52:08.881Z        DEBUG   [nomad] nomad/watcher.go:162    allocation.id=3bb7e48c-14f3-c055-5c36-ce5966369111 waitIndex=13 CreateIndex=17 ModifyIndex=25 AllocModifyIndex=17 updated=false
2021-07-23T22:52:08.901Z        DEBUG   [nomad] nomad/watcher.go:162    allocation.id=8ebe0866-9a44-b1f8-26a8-2e5d1cf32af5 waitIndex=13 CreateIndex=17 ModifyIndex=25 AllocModifyIndex=17 updated=false
2021-07-23T22:52:08.902Z        DEBUG   [nomad] nomad/watcher.go:162    allocation.id=d01ac162-279d-3b1f-3ff6-1258c3dec022 waitIndex=13 CreateIndex=17 ModifyIndex=25 AllocModifyIndex=17 updated=false

Only 3 of the 5 allocations show up here, which matches the 30 log lines shipped. Now let's inspect the Nomad allocation event stream that was saved in the beginning:

$ cat /tmp/alloc-events.json | jq -rf ./filter-alloc-events-2.jq 2>/dev/null
1a7c549f-9895-ea29-9ad0-630e95ddaf4f CreateIndex=11 ModifyIndex=11 AllocModifyIndex=11 DesiredStatus=run ClientStatus=pending Type=PlanResult
a02e35fa-c0bd-bf0e-107f-0f6a045664e2 CreateIndex=11 ModifyIndex=11 AllocModifyIndex=11 DesiredStatus=run ClientStatus=pending Type=PlanResult
a02e35fa-c0bd-bf0e-107f-0f6a045664e2 CreateIndex=11 ModifyIndex=13 AllocModifyIndex=11 DesiredStatus=run ClientStatus=pending Type=AllocationUpdated
1a7c549f-9895-ea29-9ad0-630e95ddaf4f CreateIndex=11 ModifyIndex=13 AllocModifyIndex=11 DesiredStatus=run ClientStatus=pending Type=AllocationUpdated
1a7c549f-9895-ea29-9ad0-630e95ddaf4f CreateIndex=11 ModifyIndex=15 AllocModifyIndex=11 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
a02e35fa-c0bd-bf0e-107f-0f6a045664e2 CreateIndex=11 ModifyIndex=15 AllocModifyIndex=11 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
d01ac162-279d-3b1f-3ff6-1258c3dec022 CreateIndex=17 ModifyIndex=17 AllocModifyIndex=17 DesiredStatus=run ClientStatus=pending Type=PlanResult
3bb7e48c-14f3-c055-5c36-ce5966369111 CreateIndex=17 ModifyIndex=17 AllocModifyIndex=17 DesiredStatus=run ClientStatus=pending Type=PlanResult
8ebe0866-9a44-b1f8-26a8-2e5d1cf32af5 CreateIndex=17 ModifyIndex=17 AllocModifyIndex=17 DesiredStatus=run ClientStatus=pending Type=PlanResult
3bb7e48c-14f3-c055-5c36-ce5966369111 CreateIndex=17 ModifyIndex=19 AllocModifyIndex=17 DesiredStatus=run ClientStatus=pending Type=AllocationUpdated
8ebe0866-9a44-b1f8-26a8-2e5d1cf32af5 CreateIndex=17 ModifyIndex=19 AllocModifyIndex=17 DesiredStatus=run ClientStatus=pending Type=AllocationUpdated
d01ac162-279d-3b1f-3ff6-1258c3dec022 CreateIndex=17 ModifyIndex=19 AllocModifyIndex=17 DesiredStatus=run ClientStatus=pending Type=AllocationUpdated
1a7c549f-9895-ea29-9ad0-630e95ddaf4f CreateIndex=11 ModifyIndex=20 AllocModifyIndex=11 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
a02e35fa-c0bd-bf0e-107f-0f6a045664e2 CreateIndex=11 ModifyIndex=20 AllocModifyIndex=11 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
8ebe0866-9a44-b1f8-26a8-2e5d1cf32af5 CreateIndex=17 ModifyIndex=24 AllocModifyIndex=17 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
3bb7e48c-14f3-c055-5c36-ce5966369111 CreateIndex=17 ModifyIndex=24 AllocModifyIndex=17 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
d01ac162-279d-3b1f-3ff6-1258c3dec022 CreateIndex=17 ModifyIndex=24 AllocModifyIndex=17 DesiredStatus=run ClientStatus=running Type=AllocationUpdated

As we can see, Filebeat completely missed allocations a02e35fa-c0bd-bf0e-107f-0f6a045664e2 and 1a7c549f-9895-ea29-9ad0-630e95ddaf4f. Furthermore, AllocModifyIndex remains the same across consecutive events for a given allocation, while ModifyIndex grows with every event. ModifyIndex also only ever advances, whereas AllocModifyIndex, taken in chronological event order, sometimes goes backwards; yet it is the one compared against an ever-advancing counter.
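
To make this concrete, here is a minimal replay of the skip check using the index values visible above: waitIndex=13 from the sync logs, AllocModifyIndex=11 and ModifyIndex=15 (once running) for the first job's allocations, and AllocModifyIndex=17 and ModifyIndex=25 for the second job's. The current comparison skips exactly the two first-job allocations; comparing against ModifyIndex skips none:

package main

import "fmt"

type alloc struct {
	id                        string
	modifyIdx, allocModifyIdx uint64
}

func main() {
	const waitIndex uint64 = 13 // from the filebeat debug log

	allocs := []alloc{
		{"1a7c549f (job 1)", 15, 11},
		{"a02e35fa (job 1)", 15, 11},
		{"3bb7e48c (job 2)", 25, 17},
		{"8ebe0866 (job 2)", 25, 17},
		{"d01ac162 (job 2)", 25, 17},
	}

	for _, a := range allocs {
		buggy := waitIndex > a.allocModifyIdx // current check: skips both job 1 allocations
		fixed := waitIndex > a.modifyIdx      // proposed check: skips nothing here
		fmt.Printf("%s skipped: buggy=%v fixed=%v\n", a.id, buggy, fixed)
	}
}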

Now, looking at the nomad/watcher.go code:

        // the allocation has not changed since last seen, ignore
        if w.waitIndex > alloc.AllocModifyIndex {

This should actually compare against ModifyIndex instead. I'd also consider adding a debug message for the skip, since that's what helped me while debugging this:

diff --git a/x-pack/libbeat/common/nomad/watcher.go b/x-pack/libbeat/common/nomad/watcher.go
index eba77e764a..200c48a47d 100644
--- a/x-pack/libbeat/common/nomad/watcher.go
+++ b/x-pack/libbeat/common/nomad/watcher.go
@@ -131,7 +131,10 @@ func (w *watcher) sync() error {

        for _, alloc := range allocations {
                // the allocation has not changed since last seen, ignore
-               if w.waitIndex > alloc.AllocModifyIndex {
+               if w.waitIndex > alloc.ModifyIndex {
+                       w.logger.Debugf("Skip allocation:%s ClientStatus:%s because w.waitIndex=%v > alloc.ModifyIndex=%v",
+                               alloc.ID, alloc.ClientStatus,
+                               fmt.Sprint(w.waitIndex), fmt.Sprint(alloc.ModifyIndex))
                        continue
                }

I've built and tested this earlier, but don't have it on hand now.

zemm commented 3 years ago

And while I still have the environment, let's also see the w.waitIndex:

$ grep watcher\.go /tmp/filebeat.log |tail -n5
2021-07-24T00:03:54.294Z        DEBUG   [nomad] nomad/watcher.go:127    Allocations index is unchanged remoteWaitIndex=25 == localWaitIndex=25
2021-07-24T00:04:08.864Z        DEBUG   [nomad] nomad/watcher.go:106    Syncing allocations and metadata
2021-07-24T00:04:08.864Z        DEBUG   [nomad] nomad/watcher.go:107    Starting with WaitIndex=25
2021-07-24T00:04:24.232Z        DEBUG   [nomad] nomad/watcher.go:120    Found 5 allocations
2021-07-24T00:04:24.232Z        DEBUG   [nomad] nomad/watcher.go:127    Allocations index is unchanged remoteWaitIndex=25 == localWaitIndex=25

And from the Nomad event stream we can see that a waitIndex of 25 has already passed the two running-state events that were missed (ModifyIndex=20), so these will never be picked up:

1a7c549f-9895-ea29-9ad0-630e95ddaf4f CreateIndex=11 ModifyIndex=20 AllocModifyIndex=11 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
a02e35fa-c0bd-bf0e-107f-0f6a045664e2 CreateIndex=11 ModifyIndex=20 AllocModifyIndex=11 DesiredStatus=run ClientStatus=running Type=AllocationUpdated
elasticmachine commented 3 years ago

Pinging @elastic/integrations (Team:Integrations)

ChrsMark commented 3 years ago

cc: @jsoriano

zemm commented 3 years ago

At the risk of this already being way more verbose than it needs to be for a single-line change, here is the filter-alloc-events-2.jq used above to reduce the Nomad event API messages:

.Events | map(
    (.Payload.Allocation.ID | tostring)
    + " CreateIndex=" + (.Payload.Allocation.CreateIndex | tostring)
    + " ModifyIndex=" + (.Payload.Allocation.ModifyIndex | tostring)
    + " AllocModifyIndex=" + (.Payload.Allocation.AllocModifyIndex | tostring)
    + " DesiredStatus=" + .Payload.Allocation.DesiredStatus
    + " ClientStatus=" + .Payload.Allocation.ClientStatus
    + " Type=" + .Type
) | .[]
jsoriano commented 3 years ago

@zemm thanks for reporting this! Do you think that only replacing AllocModifyIndex with ModifyIndex would fix this issue? I am not planning to work on nomad autodiscover soon, but if you can open a pull request with the proposed change and you have the chance to try it on a real deployment, I will be happy to review the changes.

zemm commented 3 years ago

@jsoriano I'm sure the current version isn't usable; it seems I might be the first one to actually run it :sweat_smile:. Based on brief testing, I think this change is enough, but I can (and must) run it on a real, albeit small (3+3 node), deployment within a couple of weeks. I can monitor it for a while and report how it goes.

For now, I'd rather not sign the Contributor Agreement just for this one line. Maybe later, if there are more complex changes or improvements.

And thank you for this feature!

zemm commented 3 years ago

I've now run Filebeat 7.14.1 with this one-line patch

-               if w.waitIndex > alloc.AllocModifyIndex {
+               if w.waitIndex > alloc.ModifyIndex {

for a week (I was busy with other things) on 6 Nomad client hosts with 50+ allocations. There have been a few rotations and restarts, and logging seems to have kept up.

I have not verified that every log line has been collected, but currently I have no reason to believe otherwise. I'll see if I can find time at some point for more thorough verification, but it looks to be working for now.

zemm commented 3 years ago

I checked all 53 currently running allocations, and all of them have been picked up by this patched version, meaning that I found plenty of logs for each of the 53 allocation ids in the central log storage they are shipped to. 29 of these allocations were created after the Filebeats were started.

I won't be checking the log lines themselves, since that part is just the regular log-type input watching regular files.

I'll check back if something comes up, but it looks good to me.

adeleglise commented 3 years ago

Hey, sorry for piggybacking on this issue, but I can't seem to get the logs from the nomad autodiscover provider to be processed.

I've tried this config:

filebeat.autodiscover.providers:
  - type: nomad
    node: redacted
    address: "http://redacted:4646"
    secret_id: "redacted"
    scope: node
    hints.enabled: true
    templates:
    - config:
      - type: log
        paths:
          - "/var/nomad/alloc/${data.nomad.alloc_id}/alloc/logs/${data.nomad.task.name}.stderr.[0-9]*"

Filebeat is indeed seeing events (even with the bug @zemm mentioned), but no logs are output.

 autodiscover/autodiscover.go:172        Got a start event.        {"autodiscover.event": {"config":[],"host":"redacted","id":"cf62901c-16e4-2528-a4a7-1a35fa92bb35-app","meta":{"nomad":{"allocation":{"id":"cf62901c-16e4-2528-a4a7-1a35fa92bb35","name":"app[2]","status":"running"},"datacenter":["dc5"],"job":{"name":"app","type":"service"},"namespace":"default","region":"region","task":{"name":"app"}}},"nomad":{"allocation":{"id":"cf62901c-16e4-2528-a4a7-1a35fa92bb35","name":"app[2]","status":"running"},"datacenter":["dc5"],"job":{"name":"app","type":"service"},"namespace":"default","region":"region"},"provider":"f6f26bdf-985f-41f1-9620-368c5fdf6bcc","start":true}}

I'm running filebeat 7.15.0 directly on the VM host (Ubuntu 20.04), not inside Nomad. Does someone have a clue, or is anyone else using this?

jsoriano commented 3 years ago

@zemm thanks for testing the change! I am sorry I couldn't dedicate time to this, but on a quick look I wonder if this condition would also need to be updated: https://github.com/elastic/beats/blob/85621341020e21ba543f2305e9a1e35f66aacb63/x-pack/libbeat/common/nomad/watcher.go#L159

Would you have the chance to open a PR for further discussion?

Longer-term it'd be nice to reimplement this watcher using the events API instead of periodically polling, but this is another story :slightly_smiling_face:
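
Something along these lines, reading the same /v1/event/stream?topic=Allocation endpoint used with curl above (a rough sketch; the field names follow the payloads shown earlier in this issue, everything else is assumption):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// events matches the shape of the allocation events captured above.
type events struct {
	Events []struct {
		Type    string
		Payload struct {
			Allocation struct {
				ID           string
				ClientStatus string
				ModifyIndex  uint64
			}
		}
	}
}

func main() {
	resp, err := http.Get("http://127.0.0.1:4646/v1/event/stream?topic=Allocation")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	dec := json.NewDecoder(resp.Body)
	for {
		var evs events
		if err := dec.Decode(&evs); err != nil {
			return // stream closed
		}
		for _, e := range evs.Events { // heartbeat frames carry no events
			a := e.Payload.Allocation
			fmt.Printf("%s %s ModifyIndex=%d (%s)\n", a.ID, a.ClientStatus, a.ModifyIndex, e.Type)
		}
	}
}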

zemm commented 3 years ago

How did I miss that :scream: Thanks!

I'll try to find time to replicate missed events with in-place allocation updates, which at a quick glance could happen with the current code, and then try it with a fix.

For the PR, unfortunately I still don't want to bind myself and store my personal details in some legal agreement system over an obvious two-word fix. I do fully understand the need for the agreement, and therefore won't contribute a formal commit and PR at this time. Sorry for making things tedious :(

jsoriano commented 3 years ago

@zemm understood, thanks for your time so far! I have created a PR with the code you propose, and a test case that seems to reproduce the issue. Please take a look: https://github.com/elastic/beats/pull/28700

Let me know if you can confirm that the change in the other condition works for you.

zemm commented 2 years ago

@jsoriano thanks, and sorry, but I did not have, and will not have this month, the time or energy to test the OnUpdate logic here. I did some quick testing, but in the end I wasn't sure what the proper scenario and its implications would have been.

And sorry for commenting this late on a resolved ticket, but one thing I did notice earlier was that in some cases w.handler.OnDelete() gets called multiple times. I've not had any problems with this in practice. Is that method idempotent, and therefore, while not optimal, not an immediate problem?

jsoriano commented 2 years ago

Hey @zemm, no problem, I think the changes you proposed are good. If you have the chance to do more tests, please keep us updated! I think we will release these changes in 7.16.0 and 8.0.0.

> And sorry for commenting this late on a resolved ticket, but one thing I did notice earlier was that in some cases w.handler.OnDelete() gets called multiple times. I've not had any problems with this in practice. Is that method idempotent, and therefore, while not optimal, not an immediate problem?

Umm, this should only happen when the allocation is being terminated. I guess it can transition through different termination statuses, and for each one the provider may call this handler, but it should be fine; multiple deletions should be idempotent, as you say. We can revisit this in any case if it turns out to be problematic.
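
For illustration, idempotent here just means that a repeated delete for the same allocation becomes a no-op, roughly like this hypothetical sketch (not the actual Beats handler):

package main

import (
	"fmt"
	"sync"
)

type handler struct {
	mu      sync.Mutex
	stopped map[string]bool
}

// OnDelete stops the inputs for an allocation; calling it again for the
// same allocation ID is a harmless no-op.
func (h *handler) OnDelete(allocID string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.stopped[allocID] {
		return
	}
	h.stopped[allocID] = true
	fmt.Println("stopping inputs for", allocID) // stand-in for real teardown
}

func main() {
	h := &handler{stopped: map[string]bool{}}
	h.OnDelete("cf62901c") // stops
	h.OnDelete("cf62901c") // no-op
}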

Thanks for your help on this!