coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

Minimize Engine/AgentReconciler etcd requests #804

Open bcwaldon opened 10 years ago

bcwaldon commented 10 years ago

I have a cluster running 2 units. The engine makes the following requests to gather the relevant info for a single reconciliation:

Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.002611 01384 reconciler.go:38] Polling Registry for actionable work
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.003835 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.004529 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.005956 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/e5a03e40306d918f3c0f6eb056a8e62acc1822d8?consistent=true&recursive=true&sorted=false
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.006835 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/e5a03e40306d918f3c0f6eb056a8e62acc1822d8?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.007362 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/d9de011694dda111458a4fc9912ad629d86ad8af?consistent=true&recursive=true&sorted=false
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.008796 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/d9de011694dda111458a4fc9912ad629d86ad8af?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.009337 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.010080 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.011552 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/state?consistent=true&recursive=true&sorted=false
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.012973 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/state?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.013626 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/states?consistent=true&recursive=true&sorted=false
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.015100 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/states?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.015385 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/machines?consistent=true&recursive=true&sorted=true
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.016772 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/machines?consistent=true&recursive=true&sorted=true: 200 OK
Aug 22 17:27:29 core-01 fleet[1384]: I0822 17:27:29.017024 01384 engine.go:71] Engine completed reconciliation in 14.414077ms

The agent ends up making a similarly inefficient set of calls:

Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.000151 01384 reconcile.go:68] AgentReconciler tick
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.001583 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.009265 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true: 200 OK
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.010507 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/e5a03e40306d918f3c0f6eb056a8e62acc1822d8?consistent=true&recursive=true&sorted=false
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.011013 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/e5a03e40306d918f3c0f6eb056a8e62acc1822d8?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.012216 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/d9de011694dda111458a4fc9912ad629d86ad8af?consistent=true&recursive=true&sorted=false
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.015528 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/unit/d9de011694dda111458a4fc9912ad629d86ad8af?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.018001 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.019370 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/job?consistent=true&recursive=true&sorted=true: 200 OK
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.019657 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/state?consistent=true&recursive=true&sorted=false
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.022429 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/state?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.022523 01384 client.go:353] etcd: sending HTTP request GET http://localhost:4001/v2/keys/_coreos.com/fleet/states?consistent=true&recursive=true&sorted=false
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.024085 01384 client.go:360] etcd: recv response from GET http://localhost:4001/v2/keys/_coreos.com/fleet/states?consistent=true&recursive=true&sorted=false: 200 OK
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.024225 01384 reconcile.go:265] Desired state "launched" matches current state of Job(foo.service), nothing to do
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.024233 01384 reconcile.go:265] Desired state "launched" matches current state of Job(ping.service), nothing to do
Aug 22 17:41:06 core-01 fleet[1384]: I0822 17:41:06.024240 01384 reconcile.go:44] AgentReconciler completed reconciliation in 22.798384ms

There are two problems that need fixing:

  1. fleet fetches each unit file individually. It should fetch all unit files with a single GET /unit?recursive=true.
  2. fleet calls GET /job?recursive=true twice per reconciliation. This should be done only once.

denderello commented 9 years ago

We discovered this too while investigating coreos/fleet/issues/1163. Fixing 1 and 2 should take pressure off etcd, and 2 would also reduce the number of JSON unmarshal calls.

denderello commented 9 years ago

@bcwaldon are you planning to work on optimisations like this for the next releases?

bcwaldon commented 9 years ago

@denderello Not any time soon. Feel free to work on this if you have the time.

wuqixuan commented 9 years ago

+1