Closed: iamredbull closed this issue 1 year ago
I noticed that when I start a Nomad job with several groups, only one of the groups' endpoints receives the labels. Nomad job:
job "example-job" {
  datacenters = ["dc1"]
  namespace = "dedicated"
  constraint {
    attribute = "${attr.unique.consul.name}"
    operator = "="
    value = "cn6-host48"
  }
  meta = {
    "example.com/app_name" = "service-echo"
  }
  group "http-echo-group" {
    network {
      mode = "cni/cilium"
      dns {
        servers = ["172.17.0.1"]
      }
    }
    restart {
      attempts = 3
      interval = "15m"
      delay = "20s"
      mode = "fail"
    }
    service {
      name = "http-echo"
      port = "80"
      tags = ["http-echo"]
      address_mode = "alloc"
    }
    task "http-echo" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo"
        args = [
          "--text=hello world",
          "--listen=:80"
        ]
        auth_soft_fail = true
      }
      resources {
        cpu = 500
        memory = 256
      }
    }
  }
  group "network-multitool-group" {
    network {
      dns {
        servers = ["172.17.0.1"]
      }
      mode = "cni/cilium"
    }
    restart {
      attempts = 3
      interval = "15m"
      delay = "20s"
      mode = "fail"
    }
    service {
      name = "network-multitool"
      port = "80"
      tags = ["network-multitool"]
      address_mode = "alloc"
    }
    task "network-multitool" {
      driver = "docker"
      config {
        image = "wbitt/network-multitool"
        auth_soft_fail = true
      }
      resources {
        cpu = 500
        memory = 256
      }
    }
  }
}
Cilium endpoint list:
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])               IPv6   IPv4            STATUS
           ENFORCEMENT        ENFORCEMENT
141        Enabled            Enabled           4          reserved:health                                  172.16.171.94   ready
1418       Disabled           Disabled          1          reserved:host                                                    ready
1535       Enabled            Enabled           5          reserved:init                                    172.16.6.61     ready
2641       Enabled            Enabled           28939      netreap:nomad.job_id=example-job                 172.16.44.243   ready
                                                           netreap:nomad.namespace=dedicated
                                                           nomad:example.com/app_name=service-echo
                                                           reserved:init
Netreap-job logs:
2023-08-01T09:58:39.847Z DEBUG netreap/main.go:124 Starting node reaper
2023-08-01T09:58:39.847Z DEBUG reapers/nodes.go:107 Beginning reconciliation
2023-08-01T09:58:39.847Z DEBUG reapers/nodes.go:108 Getting nomad node list
2023-08-01T09:58:39.865Z DEBUG reapers/nodes.go:119 Finished constructing list of all nodes {"nodes": {"cn6-host48":{},"cpx31-host58":{}}}
2023-08-01T09:58:39.866Z DEBUG reapers/nodes.go:121 Fetching cilium nodes from consul
2023-08-01T09:58:39.902Z DEBUG netreap/main.go:135 Starting endpoint reaper
2023-08-01T09:58:39.902Z DEBUG reapers/endpoints.go:155 Starting reconciliation
2023-08-01T09:58:39.911Z DEBUG reapers/endpoints.go:169 Finished fetching service list, constructing set of IP addresses from servicesservice_list[{consul} {netreap} {nomad-clients} {nomad-servers}]
2023-08-01T09:58:39.918Z INFO reapers/nodes.go:56 Waiting for leader election
2023-08-01T09:58:39.945Z DEBUG reapers/endpoints.go:203 Finished generating current IP list. Fetching endpoints from cilium {"ip_list": {}}
2023-08-01T09:58:39.949Z DEBUG reapers/endpoints.go:211 Checking all endpoints
2023-08-01T09:58:39.949Z DEBUG reapers/endpoints.go:219 Endpoint is not an init service, skipping {"labels": ["reserved:health"]}
2023-08-01T09:58:39.949Z DEBUG reapers/endpoints.go:219 Endpoint is not an init service, skipping {"labels": ["reserved:host"]}
2023-08-01T09:58:39.949Z DEBUG reapers/endpoints.go:265 Finished reconciliation {"num_errors": 0}
2023-08-01T09:58:39.982Z DEBUG netreap/main.go:146 starting policy poller
2023-08-01T09:58:39.983Z INFO policy_poller policy/policy.go:41 starting Consul watch for key: netreap.io/policy
2023-08-01T09:58:39.988Z DEBUG reapers/endpoints.go:93 Got 2 job events. Handling...
2023-08-01T09:58:39.988Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:58:39.988Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:58:39.994Z INFO policy_poller policy/policy.go:98 loaded new policy
2023-08-01T09:58:40.261Z DEBUG reapers/endpoints.go:93 Got 3 job events. Handling...
2023-08-01T09:58:40.261Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:58:40.261Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:58:40.261Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:30.969Z DEBUG elector/mod.go:108 Unable to acquire lock. Retrying up to 6 times
2023-08-01T09:59:33.305Z DEBUG reapers/endpoints.go:93 Got 1 job events. Handling...
2023-08-01T09:59:33.307Z DEBUG reapers/endpoints.go:416 Job was empty {"event_type": "JobDeregistered"}
2023-08-01T09:59:33.384Z DEBUG reapers/endpoints.go:93 Got 1 job events. Handling...
2023-08-01T09:59:33.384Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of EvaluationUpdated
2023-08-01T09:59:40.982Z DEBUG elector/mod.go:115 Lock retry 1 did not succeed
2023-08-01T09:59:49.182Z DEBUG reapers/endpoints.go:93 Got 1 job events. Handling...
2023-08-01T09:59:49.182Z DEBUG reapers/endpoints.go:93 Got 2 job events. Handling...
2023-08-01T09:59:49.183Z DEBUG reapers/endpoints.go:416 Job was empty {"event_type": "JobRegistered"}
2023-08-01T09:59:49.209Z DEBUG reapers/endpoints.go:327 Fetching services from consul for job {"job_id": "example-job", "retry_num": 1}
2023-08-01T09:59:49.210Z DEBUG reapers/endpoints.go:327 Fetching services from consul for job {"job_id": "example-job", "retry_num": 1}
2023-08-01T09:59:49.218Z DEBUG reapers/endpoints.go:334 Did not find a ready service in consul {"job_id": "example-job", "retry_num": 1}
2023-08-01T09:59:49.218Z DEBUG reapers/endpoints.go:334 Did not find a ready service in consul {"job_id": "example-job", "retry_num": 1}
2023-08-01T09:59:49.483Z DEBUG reapers/endpoints.go:93 Got 5 job events. Handling...
2023-08-01T09:59:49.483Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T09:59:49.483Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T09:59:49.483Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T09:59:49.483Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T09:59:49.483Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T09:59:49.536Z DEBUG reapers/endpoints.go:93 Got 1 job events. Handling...
2023-08-01T09:59:49.536Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of EvaluationUpdated
2023-08-01T09:59:50.295Z DEBUG reapers/endpoints.go:93 Got 2 job events. Handling...
2023-08-01T09:59:50.295Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:50.295Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:50.600Z DEBUG reapers/endpoints.go:93 Got 2 job events. Handling...
2023-08-01T09:59:50.600Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:50.600Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:50.993Z DEBUG elector/mod.go:115 Lock retry 2 did not succeed
2023-08-01T09:59:51.218Z DEBUG reapers/endpoints.go:327 Fetching services from consul for job {"job_id": "example-job", "retry_num": 2}
2023-08-01T09:59:51.218Z DEBUG reapers/endpoints.go:327 Fetching services from consul for job {"job_id": "example-job", "retry_num": 2}
2023-08-01T09:59:51.228Z DEBUG reapers/endpoints.go:344 Found services for new jobjob_idexample-job
2023-08-01T09:59:51.228Z DEBUG reapers/endpoints.go:356 Finding related cilium endpoint for job {"job_id": "example-job"}
2023-08-01T09:59:51.228Z DEBUG reapers/endpoints.go:344 Found services for new jobjob_idexample-job
2023-08-01T09:59:51.228Z DEBUG reapers/endpoints.go:356 Finding related cilium endpoint for job {"job_id": "example-job"}
2023-08-01T09:59:51.840Z DEBUG reapers/endpoints.go:93 Got 3 job events. Handling...
2023-08-01T09:59:51.840Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:51.840Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T09:59:51.840Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:01.033Z DEBUG elector/mod.go:115 Lock retry 3 did not succeed
2023-08-01T10:00:01.751Z DEBUG reapers/endpoints.go:93 Got 3 job events. Handling...
2023-08-01T10:00:01.751Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:01.751Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:01.751Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:01.998Z DEBUG reapers/endpoints.go:93 Got 3 job events. Handling...
2023-08-01T10:00:01.998Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:01.998Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:01.998Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:00:02.986Z DEBUG reapers/endpoints.go:93 Got 1 job events. Handling...
2023-08-01T10:00:02.986Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdateDesiredStatus
2023-08-01T10:00:03.242Z DEBUG reapers/endpoints.go:93 Got 3 job events. Handling...
2023-08-01T10:00:03.242Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T10:00:03.242Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T10:00:03.242Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of PlanResult
2023-08-01T10:00:03.334Z DEBUG reapers/endpoints.go:93 Got 1 job events. Handling...
2023-08-01T10:00:03.334Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of EvaluationUpdated
2023-08-01T10:00:11.044Z DEBUG elector/mod.go:115 Lock retry 4 did not succeed
2023-08-01T10:00:21.077Z DEBUG elector/mod.go:115 Lock retry 5 did not succeed
2023-08-01T10:00:31.097Z DEBUG elector/mod.go:115 Lock retry 6 did not succeed
2023-08-01T10:00:31.097Z DEBUG elector/mod.go:117 Never acquired lock after retry
The second group remains in the init state.
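To make this state easy to spot, the check can be scripted. The following is only a minimal Python sketch: it assumes the rows of the endpoint list above have already been loaded into dicts, and the field names (`id`, `identity`, `labels`, `ipv4`) are illustrative and not Cilium's actual output schema.

```python
INIT_LABEL = "reserved:init"


def stuck_in_init(endpoints):
    """Return the IDs of endpoints whose only label is reserved:init."""
    return [ep["id"] for ep in endpoints if set(ep["labels"]) == {INIT_LABEL}]


# Rows copied from the endpoint list above (hypothetical dict layout).
endpoints = [
    {"id": 141, "identity": 4, "labels": ["reserved:health"], "ipv4": "172.16.171.94"},
    {"id": 1418, "identity": 1, "labels": ["reserved:host"], "ipv4": None},
    {"id": 1535, "identity": 5, "labels": ["reserved:init"], "ipv4": "172.16.6.61"},
    {
        "id": 2641,
        "identity": 28939,
        "labels": [
            "netreap:nomad.job_id=example-job",
            "netreap:nomad.namespace=dedicated",
            "nomad:example.com/app_name=service-echo",
            "reserved:init",
        ],
        "ipv4": "172.16.44.243",
    },
]

print(stuck_in_init(endpoints))  # prints [1535]
```

Here endpoint 1535 is the one that never received the job labels.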
But if I restart Netreap in the cluster, both groups immediately get the labels. Cilium endpoint list:
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])               IPv6   IPv4            STATUS
           ENFORCEMENT        ENFORCEMENT
141        Enabled            Enabled           4          reserved:health                                  172.16.171.94   ready
1418       Disabled           Disabled          1          reserved:host                                                    ready
1535       Enabled            Enabled           28939      netreap:nomad.job_id=example-job                 172.16.6.61     ready
                                                           netreap:nomad.namespace=dedicated
                                                           nomad:example.com/app_name=service-echo
                                                           reserved:init
2641       Enabled            Enabled           28939      netreap:nomad.job_id=example-job                 172.16.44.243   ready
                                                           netreap:nomad.namespace=dedicated
                                                           nomad:example.com/app_name=service-echo
                                                           reserved:init
Netreap-job logs:
2023-08-01T10:05:21.560Z DEBUG netreap/main.go:124 Starting node reaper
2023-08-01T10:05:21.561Z DEBUG reapers/nodes.go:107 Beginning reconciliation
2023-08-01T10:05:21.561Z DEBUG reapers/nodes.go:108 Getting nomad node list
2023-08-01T10:05:21.578Z DEBUG reapers/nodes.go:119 Finished constructing list of all nodes {"nodes": {"cn6-host48":{},"cpx31-host58":{}}}
2023-08-01T10:05:21.578Z DEBUG reapers/nodes.go:121 Fetching cilium nodes from consul
2023-08-01T10:05:21.617Z DEBUG netreap/main.go:135 Starting endpoint reaper
2023-08-01T10:05:21.618Z DEBUG reapers/endpoints.go:155 Starting reconciliation
2023-08-01T10:05:21.626Z DEBUG reapers/endpoints.go:169 Finished fetching service list, constructing set of IP addresses from servicesservice_list[{network-multitool} {nomad-clients} {nomad-servers} {consul} {http-echo} {netreap}]
2023-08-01T10:05:21.628Z INFO reapers/nodes.go:56 Waiting for leader election
2023-08-01T10:05:21.674Z DEBUG reapers/endpoints.go:203 Finished generating current IP list. Fetching endpoints from cilium {"ip_list": {"172.16.212.128":{"ID":"df8a0bec-b718-d91e-9f8d-0e5ef3b7e077","Namespace":""},"172.16.242.70":{"ID":"52777fb2-ac22-749a-f709-57a5ecddb881","Namespace":""}}}
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:211 Checking all endpoints
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:219 Endpoint is not an init service, skipping {"labels": ["netreap:nomad.job_id=example-job","netreap:nomad.namespace=dedicated","nomad:example.com/app_name=service-echo"]}
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:219 Endpoint is not an init service, skipping {"labels": ["reserved:host"]}
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:219 Endpoint is not an init service, skipping {"labels": ["reserved:health"]}
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:222 Checking if endpoint still exists {"endpoint_id": 1500}
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:227 Got ip {"ip": {"ipv4":"172.16.212.128"}}
2023-08-01T10:05:21.680Z DEBUG reapers/endpoints.go:250 Found an endpoint missing labels. Updating with current job labels {"endpoint_id": 1500}
2023-08-01T10:05:21.705Z DEBUG reapers/endpoints.go:265 Finished reconciliation {"num_errors": 0}
2023-08-01T10:05:21.740Z DEBUG netreap/main.go:146 starting policy poller
2023-08-01T10:05:21.740Z INFO policy_poller policy/policy.go:41 starting Consul watch for key: netreap.io/policy
2023-08-01T10:05:21.746Z DEBUG reapers/endpoints.go:93 Got 2 job events. Handling...
2023-08-01T10:05:21.746Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:05:21.747Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:05:21.752Z INFO policy_poller policy/policy.go:98 loaded new policy
2023-08-01T10:05:21.753Z DEBUG reapers/endpoints.go:93 Got 2 job events. Handling...
2023-08-01T10:05:21.753Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:05:21.753Z DEBUG reapers/endpoints.go:104 Ignoring Job event with type of AllocationUpdated
2023-08-01T10:06:09.807Z DEBUG elector/mod.go:108 Unable to acquire lock. Retrying up to 6 times
2023-08-01T10:06:19.817Z DEBUG elector/mod.go:115 Lock retry 1 did not succeed
2023-08-01T10:06:29.831Z DEBUG elector/mod.go:115 Lock retry 2 did not succeed
2023-08-01T10:06:39.847Z DEBUG elector/mod.go:115 Lock retry 3 did not succeed
2023-08-01T10:06:49.879Z DEBUG elector/mod.go:115 Lock retry 4 did not succeed
2023-08-01T10:06:59.898Z DEBUG elector/mod.go:115 Lock retry 5 did not succeed
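Judging only by the log lines above (an IP list is built from the Consul services, then each endpoint is checked and an init endpoint with a known IP is updated), the startup reconciliation seems to behave roughly as follows. This is a Python sketch of my reading of the logs, not Netreap's actual Go code; every name in it is hypothetical.

```python
INIT_LABEL = "reserved:init"


def plan_relabel(endpoints, ip_to_alloc, labels_for_alloc):
    """Map endpoint ID -> labels to apply, for init endpoints with a known IP."""
    plan = {}
    for ep in endpoints:
        if INIT_LABEL not in ep["labels"]:
            continue  # mirrors "Endpoint is not an init service, skipping"
        alloc_id = ip_to_alloc.get(ep["ipv4"])
        if alloc_id is None:
            continue  # no allocation known for this IP; left untouched in this sketch
        plan[ep["id"]] = labels_for_alloc(alloc_id)
    return plan


# Values taken from the logs above: endpoint 1500 and one entry of "ip_list".
job_labels = [
    "netreap:nomad.job_id=example-job",
    "netreap:nomad.namespace=dedicated",
    "nomad:example.com/app_name=service-echo",
]
ip_to_alloc = {"172.16.212.128": "df8a0bec-b718-d91e-9f8d-0e5ef3b7e077"}
endpoints = [
    {"id": 1418, "labels": ["reserved:host"], "ipv4": None},
    {"id": 1500, "labels": ["reserved:init"], "ipv4": "172.16.212.128"},
]

plan = plan_relabel(endpoints, ip_to_alloc, lambda alloc_id: job_labels)
print(plan)
```

If this reading is right, the restart works because the reconciliation walks every endpoint, while the event handler for a newly registered job appears to label only one of them.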
Is this a bug, or am I doing something wrong? In my case, Nomad jobs most often consist of several groups. Please take a look @deverton @protochron
Versions: Netreap 0.1.2 (also 0.1.0), Cilium 1.13.4, Nomad v1.5.6, Consul v1.14.7
Before host restart:
After host restart:
Netreap does not reapply the labels after a host restart.
Netreap debug logs:
In some cases, some jobs are re-labeled, but not all:
For jobs to get their labels again (and sometimes their IP), you need to stop and start the job again:
Before job restart:
After job restart:
Netreap logs:
Cilium and Netreap were deployed following this guide: https://cosmonic.com/blog/engineering/netreap-a-practical-guide-to-running-cilium-in-nomad. I think this behavior of Netreap is not entirely correct. What is the reason for it, and how can I fix it? @deverton @protochron
Versions: Cilium v1.13.4, Netreap v0.1.0