Vinceveve opened this issue 9 years ago
I have the same problem, and when I put fleet in debug mode I get: https://gist.github.com/tclavier/3aa1cb84a229aedb53ab
And when I put a value into the keyspace with etcdctl, that works fine.
I was getting a lot of the same messages; increasing the etcd_request_timeout made them go away for me.
@tclavier is it a timeout issue? Are you able to reproduce the error if you raise the etcd_request_timeout in both fleetctl and fleetd?
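To make the timeout hypothesis concrete: a minimal sketch (plain Python against a throwaway local HTTP server, not fleet or etcd code) of how a client-side request timeout shorter than the server's response time surfaces as a cancelled/timed-out request, which is exactly the failure mode a larger etcd_request_timeout avoids:

```python
# Sketch only: simulate a slow backend and a client whose request timeout
# is shorter than the backend's response time.
import http.server
import threading
import time
import urllib.request

class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.5)          # the "etcd" side takes 500 ms to answer
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # keep output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Like a 0.1 s request timeout against a 0.5 s server: the request is
# abandoned client-side, analogous to fleet's "cancelled" messages.
try:
    urllib.request.urlopen(url, timeout=0.1)
    result = "ok"
except Exception:
    result = "timed out"

# With a generous timeout, the same request succeeds.
body = urllib.request.urlopen(url, timeout=2.0).read()
server.shutdown()

print(result)  # "timed out"
print(body)    # b'ok'
```

The analogy is loose (fleet's etcd client is Go, not urllib), but the mechanism is the same: nothing is wrong with the server, the client just gives up first.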
For the last test I used:
The errors are the same:
fleet server unable to communicate with etcd
Looks like I've got the same issue on my side. I keep getting INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
messages, and at some point fleet randomly restarts units:
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:38:48 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:39:45 host fleetd[23463]: WARN job.go:272: No Unit found in Registry for Job(consul-registrator.service)
06:39:45 host fleetd[23463]: ERROR job.go:109: Failed to parse Unit from etcd: unable to parse Unit in Registry at key /_coreos.com/fleet/job/consul-registrator.service/object
06:39:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:39:46 host fleetd[23463]: INFO manager.go:138: Triggered systemd unit consul-registrator.service stop: job=11845869
06:39:46 host fleetd[23463]: INFO manager.go:259: Removing systemd unit consul-registrator.service
06:39:47 host fleetd[23463]: INFO manager.go:182: Instructing systemd to reload units
06:39:52 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=UnloadUnit job=consul-registrator.service reason="unit loaded but not scheduled here"
06:39:52 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
06:39:52 host fleetd[23463]: WARN reconcile.go:59: AgentReconciler completed reconciliation in 8.692721762s
06:39:52 host fleetd[23463]: INFO manager.go:246: Writing systemd unit consul-registrator.service (618b)
06:39:52 host fleetd[23463]: INFO manager.go:182: Instructing systemd to reload units
06:39:55 host fleetd[23463]: INFO manager.go:127: Triggered systemd unit consul-registrator.service start: job=11846043
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=LoadUnit job=consul-registrator.service reason="unit scheduled here but not loaded"
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
06:39:55 host fleetd[23463]: INFO reconcile.go:330: AgentReconciler completed task: type=StartUnit job=consul-registrator.service reason="unit currently loaded but desired state is launched"
06:40:03 host fleetd[23463]: WARN reconcile.go:59: AgentReconciler completed reconciliation in 6.414672939s
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:30 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
06:40:45 host fleetd[23463]: INFO client.go:292: Failed getting response from http://localhost:2379/: cancelled
Running:
Timeouts for etcd2:
And for fleet:
Happens on just one out of three machines in the cluster.
Same here.
Sep 03 14:49:06 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:49:06 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:50:51 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Sep 03 14:51:21 coreos-web-7 fleetd[601]: INFO client.go:292: Failed getting response from http://localhost:4001/: cancelled
Also excessive CPU and RAM usage: CPU goes from 10% to 40%, and RAM sits around 15% of 2 GB.
I got rid of the original poster's first error message by fixing the dependencies in my unit files: I had made them depend on etcd even though etcd2 is what is actually running.
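For reference, the kind of dependency fix described above looks like this (a hypothetical unit file; the service and binary names are illustrative):

```ini
# myapp.service (hypothetical example)
[Unit]
Description=My app
# Before: After=etcd.service / Requires=etcd.service
# Fix: depend on the etcd2 service that is actually running on CoreOS
After=etcd2.service
Requires=etcd2.service

[Service]
ExecStart=/usr/bin/myapp
```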
I get the second error message, for example,
2015/09/05 13:07:45 WARN fleetctl.go:797: Error retrieving Unit(memcached.service) from Registry: Get http://domain-sock/fleet/v1/units/memcached.service?alt=json: forwarding request denied
only if I try to start a new unit using fleetctl through a tunnel, with the unit file residing on the host system. Executing it within the VM that's running CoreOS is fine.
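For what it's worth, fleetctl --tunnel relies on SSH agent forwarding, so a key missing from the local ssh-agent is a common cause of tunnel failures on the host side. A sketch of the usual setup on a Vagrant host (the key path is the Vagrant default and the endpoint is illustrative; both may differ in your setup):

```shell
# Hypothetical setup for fleetctl over an SSH tunnel (Vagrant defaults assumed)
eval "$(ssh-agent)"                         # start an agent for this shell
ssh-add ~/.vagrant.d/insecure_private_key   # load the key the VM accepts
export FLEETCTL_TUNNEL=127.0.0.1:2222       # SSH endpoint of the VM
fleetctl list-units                         # now proxied through the tunnel
```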
I have the same issue as @bronger on Vagrant... Trying to load services via --tunnel
hangs indefinitely with these warnings:
Error retrieving Unit(es-data@1.service) from Registry: Get http://domain-sock/fleet/v1/units/es-data%401.service?alt=json: forwarding request denied
This is how I start up Vagrant : https://github.com/l15k4/coreos-es/blob/master/bin/start-vagrant.sh#L47
And then I'm just loading and starting services :
https://github.com/l15k4/coreos-es/blob/master/bin/start-vagrant.sh#L91
Using fleetctl --debug=true --strict-host-key-checking=false.
@bronger any update as to why you got that error? I'm getting a similar one too.
$ fleetctl start app-dashboard@{1..4}
2016/01/21 10:29:12 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@1.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%401.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@2.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%402.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@1.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%401.service?alt=json: forwarding request denied
2016/01/21 10:29:13 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@3.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%403.service?alt=json: forwarding request denied
Unit app-dashboard@4.service inactive
Unit app-dashboard@2.service inactive
Unit app-dashboard@1.service inactive
Unit app-dashboard@3.service inactive
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@2.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%402.service?alt=json: forwarding request denied
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@3.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%403.service?alt=json: forwarding request denied
2016/01/21 10:29:14 WARN fleetctl.go:799: Error retrieving Unit(app-dashboard@4.service) from Registry: Get http://domain-sock/fleet/v1/units/app-dashboard%404.service?alt=json: forwarding request denied
$ fleetctl -version
fleetctl version 0.11.5
I abandoned fleet for my workflow, so I cannot help on this.
@bweston92 We handled this by changing the timeout settings. See issue 1289 and specifically this comment: https://github.com/coreos/fleet/issues/1289#issuecomment-146424680
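For anyone landing here, the settings involved are fleet's request timeout and etcd's heartbeat/election timing; a hypothetical cloud-config fragment showing where they live (the numbers are placeholders, not recommendations — check the linked comment for the values that actually worked):

```yaml
# Hypothetical cloud-config fragment; values are illustrative only
coreos:
  etcd2:
    heartbeat-interval: 300      # ms; default is 100
    election-timeout: 3000       # ms; default is 1000
  fleet:
    etcd_request_timeout: 5.0    # seconds; default is 1.0
```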
A 7-machine cluster: 3 main etcd members and 4 etcd proxies, all listening on 4001 and 2379.
On all machines I'm getting this:
The fleet agent is working: reconciling, scheduling, ...
But on many requests fleetctl returns:
OR
etcd is fine on all machines and answers fine when I query it while requests are being denied.
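When it happens, it can help to confirm etcd health out-of-band from fleet; a couple of standard etcd v2 checks (ports as in the cluster described above):

```shell
# Standard etcd v2 health checks
etcdctl --peers http://127.0.0.1:2379 cluster-health
curl -s http://127.0.0.1:2379/health
curl -s http://127.0.0.1:2379/v2/stats/self
```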
Any ideas?