allegro / marathon-consul

Integrates Marathon apps with Consul service discovery.
Apache License 2.0

Catalog get operation randomness problematic in heterogeneous ACL environments #276

Closed chemicL closed 6 years ago

chemicL commented 6 years ago

Currently, the read operation for the list of services tries a random cached agent in the cluster (the agent list is obtained by querying the local agent for nodes).

This is a problem in non-homogeneous environments, where some agents are read-only, have no ACL configuration attached, and are handled by the anonymous token.

When marathon-consul is configured with a token, it may send that token to a read-only agent, which fails the request because it does not know where to forward the token-bound query.

Even when a local agent is defined (consul-local-agent-host), all read requests still go to a random node first, which should also be avoided.
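To illustrate the behaviour I would expect instead, here is a minimal sketch in Go that tries the configured local agent before any other cached agent. It is not marathon-consul's actual code: `orderAgents`, `readServices`, the `get` callback and the agent names are all hypothetical, and the real catalog query is replaced by a stub.

```go
package main

import (
	"errors"
	"fmt"
)

// orderAgents returns the agents to try for a read, with the local agent first.
// Hypothetical helper, not part of marathon-consul.
func orderAgents(local string, cached []string) []string {
	ordered := make([]string, 0, len(cached)+1)
	if local != "" {
		ordered = append(ordered, local)
	}
	for _, agent := range cached {
		if agent != local { // do not list the local agent twice
			ordered = append(ordered, agent)
		}
	}
	return ordered
}

// readServices asks each agent in order and returns the first successful answer.
// The get callback stands in for the real catalog query (an assumption).
func readServices(agents []string, get func(agent string) ([]string, error)) ([]string, error) {
	var lastErr error
	for _, agent := range agents {
		services, err := get(agent)
		if err == nil {
			return services, nil
		}
		lastErr = fmt.Errorf("agent %s: %w", agent, err)
	}
	if lastErr == nil {
		lastErr = errors.New("no agents available")
	}
	return nil, lastErr
}

func main() {
	// Stub: the read-only agent rejects token-bound queries, the others answer.
	get := func(agent string) ([]string, error) {
		if agent == "read-only-agent" {
			return nil, errors.New("Unexpected response code: 403")
		}
		return []string{"web", "db"}, nil
	}

	agents := orderAgents("local-agent", []string{"read-only-agent", "local-agent", "acl-agent"})
	services, err := readServices(agents, get)
	fmt.Println(agents, services, err)
}
```

With this ordering, the read-only agent is contacted only if the local agent fails, so the token is no longer sent to a random node on the first attempt.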

Steps to reproduce:

  1. Set up a consul server cluster with ACL policies enabled (default policy allow),
  2. Generate a token for read operations on services,
  3. Set up one consul client without ACL-DC configured,
  4. Set up one consul client with ACL-DC pointing to the server cluster from step 1,
  5. Configure marathon-consul on another node with a local consul client agent (it doesn't matter whether its client has ACL-DC configured) and consul-token set to the value from step 2.

Running marathon-consul in this setup should eventually produce log entries such as:

... "error":"Unexpected response code: 403 (rpc error: rpc error: ACL not found)","level":"error","msg":"An error occurred getting services from Consul, retrying with another agent" ...

janisz commented 6 years ago

Fixed by #277