Why ACLDocumentManager need to sync the ACL to ElasticSearch on every search request?

Gallardot commented 5 years ago

Version:

OpenShift Master:v3.11.0+cca69c5-73
Kubernetes Master:v1.11.0+d4cacc0
Images tag: v3.11.0

In our scenario:

The index scenario: We only use fluentd to collect logs. Fluentd will write log into the ElasticSearch with certificate. I think this is not going through openshift-elasticsearch-plugin . Is that right？
The query scenario: The user accesses ElasticSearch through Kibana.I think this is going through openshift-elasticsearch-plugin.This plugin synchronizes the ACL from OpenShift to ElasticSearch on every request so that searchguard can use it.Is that right?

If I’m right,my question is why we need sync acl on every request?By observing ElasticSearch's log, we can see that the synchronization operation takes a lot of time.

some log:

[2019-04-11T06:49:03,045][DEBUG][i.f.e.p.OpenshiftRequestContextFactory] Handling Request... /_msearch
[2019-04-11T06:49:03,045][DEBUG][i.f.e.p.OpenshiftRequestContextFactory] Evaluating request for user '' with a non-empty token
[2019-04-11T06:49:03,107][DEBUG][i.f.e.p.OpenshiftAPIService] Response: code '200' {"kind":"User","apiVersion":"user.openshift.io/v1","metadata":{"name":"tanhl","selfLink":"/apis/user.openshift.io/v1/users/tanhl","uid":"9f6e716f-5202-11e9-ae33-005056b07597","resourceVersion":"20046255","creationTimestamp":"2019-03-29T09:11:22Z"},"identities":["htpasswd_auth:tanhl"],"groups":["system:authenticated","system:authenticated:oauth"]}

[2019-04-11T06:49:03,112][DEBUG][i.f.e.p.OpenshiftAPIService] Request: Request{method=POST, url=https://172.31.0.1/apis/authorization.openshift.io/v1/subjectaccessreviews, tag=null}
[2019-04-11T06:49:03,113][DEBUG][i.f.e.p.OpenshiftAPIService] Request body: {"kind":"SubjectAccessReview","apiVersion":"authorization.openshift.io/v1","verb":"view","scopes":[],"resourceAPIGroup":"","resource":"pods/log","namespace":"default"}
[2019-04-11T06:49:03,164][DEBUG][i.f.e.p.OpenshiftAPIService] Response: Response{protocol=http/1.1, code=201, message=Created, url=https://172.31.0.1/apis/authorization.openshift.io/v1/subjectaccessreviews}
[2019-04-11T06:49:03,164][DEBUG][i.f.e.p.OpenshiftAPIService] Response body: {"kind":"SubjectAccessReviewResponse","apiVersion":"authorization.openshift.io/v1","namespace":"default","allowed":true,"reason":"RBAC: allowed by ClusterRoleBinding \"cluster-admin-1\" of ClusterRole \"cluster-admin\" to User \"tanhl\""}

[2019-04-11T06:49:03,164][DEBUG][i.f.e.u.RequestUtils     ] User '<UNKNOWN>' isOperationsUser: true
[2019-04-11T06:49:03,223][DEBUG][c.f.s.a.BackendRegistry  ] User 'tanhl' is in cache? false (cache size: 2)
[2019-04-11T06:49:03,224][DEBUG][c.f.s.a.BackendRegistry  ] tanhl not cached, return from io.fabric8.elasticsearch.plugin.auth.OpenShiftTokenAuthentication backend directly
[2019-04-11T06:49:03,229][DEBUG][i.f.e.p.OpenshiftAPIService] Request: Request{method=POST, url=https://172.31.0.1/apis/authorization.openshift.io/v1/subjectaccessreviews, tag=null}
[2019-04-11T06:49:03,229][DEBUG][i.f.e.p.OpenshiftAPIService] Request body: {"kind":"SubjectAccessReview","apiVersion":"authorization.openshift.io/v1","verb":"view","scopes":[],"resourceAPIGroup":"metrics.openshift.io","resource":"prometheus","namespace":"openshift-logging"}
[2019-04-11T06:49:03,280][DEBUG][i.f.e.p.OpenshiftAPIService] Response: Response{protocol=http/1.1, code=201, message=Created, url=https://172.31.0.1/apis/authorization.openshift.io/v1/subjectaccessreviews}
[2019-04-11T06:49:03,280][DEBUG][i.f.e.p.OpenshiftAPIService] Response body: {"kind":"SubjectAccessReviewResponse","apiVersion":"authorization.openshift.io/v1","namespace":"openshift-logging","allowed":true,"reason":"RBAC: allowed by ClusterRoleBinding \"cluster-admin-1\" of ClusterRole \"cluster-admin\" to User \"tanhl\""}

[2019-04-11T06:49:03,280][DEBUG][c.f.s.a.BackendRegistry  ] User 'User [name=tanhl, roles=[gen_user_b489efc3dbc1875886a37de5698d05b96fc08de7, gen_kibana_b489efc3dbc1875886a37de5698d05b96fc08de7, gen_project_operations, prometheus]]' is authenticated
[2019-04-11T06:49:03,280][DEBUG][c.f.s.a.BackendRegistry  ] sg_tenant 'null'
[2019-04-11T06:49:03,281][DEBUG][i.f.e.u.RequestUtils     ] Modifying header 'X-Proxy-Remote-User' to be 'tanhl'
[2019-04-11T06:49:03,281][DEBUG][i.f.e.u.RequestUtils     ] Handling Request... /_msearch

Authentication is ok. took 236ms

[2019-04-11T06:49:03,281][DEBUG][i.f.e.u.RequestUtils     ] Evaluating request for user 'tanhl' with a non-empty token
[2019-04-11T06:49:03,282][DEBUG][i.f.e.p.k.KibanaSeed     ] Begin setDashboards:  projectPrefix 'project' for user 'tanhl' projects '[Project [name=linzm, uid=5e7301b1-05ad-11e9-8d34-005056b07597], Project [name=istio-system, uid=1bfa1e1b-3b3f-11e9-8d34-005056b07597], Project [name=openshift-monitoring, uid=2a1db17d-02a0-11e9-8d34-005056b07597], Project [name=kube-public, uid=f77f69ca-029c-11e9-94c2-005056b07597], Project [name=openshift-logging, uid=77db1911-02d3-11e9-8d34-005056b07597], Project [name=openshift-metrics-server, uid=0fd0c00c-02bd-11e9-8d34-005056b07597], Project [name=openshift-sdn, uid=d4d7c723-029f-11e9-8d34-005056b07597], Project [name=openshift-node, uid=f8623920-029c-11e9-94c2-005056b07597], Project [name=openshift-template-service-broker, uid=d1730cc7-02d4-11e9-8d34-005056b07597], Project [name=kube-service-catalog, uid=8d7b4454-02d4-11e9-8d34-005056b07597], Project [name=operator-lifecycle-manager, uid=5fef6ea0-39b1-11e9-8d34-005056b07597], Project [name=logan, uid=e1f038a8-4858-11e9-8d34-005056b07597], Project [name=openshift-node-problem-detector, uid=6b384539-3a6b-11e9-8d34-005056b07597], Project[name=openshift-console, uid=8f44bc17-3a6d-11e9-8d34-005056b07597], Project [name=istio-operator, uid=51c62e9d-3b36-11e9-8d34-005056b07597], Project [name=management-infra, uid=e1c3f5a8-029f-11e9-8d34-005056b07597], Project [name=openshift-web-console, uid=b53a0995-02bc-11e9-8d34-005056b07597]]' kibanaIndex '.kibana.b489efc3dbc1875886a37de5698d05b96fc08de7'
[2019-04-11T06:49:03,282][DEBUG][i.f.e.p.k.KibanaSeed     ] Kibana index '.kibana.b489efc3dbc1875886a37de5698d05b96fc08de7' exists? true
[2019-04-11T06:49:03,288][DEBUG][i.f.e.p.a.ACLDocumentManager] Syncing the ACL to ElasticSearch
[2019-04-11T06:49:03,288][DEBUG][i.f.e.p.a.ACLDocumentManager] Loading SearchGuard ACL...waiting up to 30s
[2019-04-11T06:49:03,289][DEBUG][i.f.e.p.a.ConfigurationLoader] Received config for roles (of [roles, rolesmapping]) with current latch value=1
[2019-04-11T06:49:03,289][DEBUG][i.f.e.p.a.ConfigurationLoader] Received config for rolesmapping (of [roles, rolesmapping]) with current latch value=0
[2019-04-11T06:49:03,290][DEBUG][i.f.e.p.a.ACLDocumentManager] Read in rolesmapping: {
  "gen_kibana_b489efc3dbc1875886a37de5698d05b96fc08de7" : {
    "expires" : "1554965402004",
    "users" : [
      "tanhl"
    ]
  },
  "gen_project_operations" : {
    "expires" : "1554965402004",
    "users" : [
      "tanhl"
    ]
  },
  "sg_role_rsyslog" : {

..........................................................omit  logs
..........................................................omit  logs
..........................................................omit  logs

[2019-04-11T06:49:03,342][DEBUG][c.f.s.a.BackendRegistry  ] Try to extract auth creds from http io.fabric8.elasticsearch.plugin.auth.OpenShiftTokenAuthentication
[2019-04-11T06:49:03,343][DEBUG][i.f.e.p.OpenshiftRequestContextFactory] Handling Request... /
[2019-04-11T06:49:03,343][DEBUG][i.f.e.p.OpenshiftRequestContextFactory] Evaluating request for user '' with a empty token
[2019-04-11T06:49:03,343][DEBUG][i.f.e.p.OpenshiftRequestContextFactory] Returning EMPTY request context; either was provided client cert or empty token.
[2019-04-11T06:49:03,343][DEBUG][c.f.s.a.BackendRegistry  ] Try to extract auth creds from http clientcert
[2019-04-11T06:49:03,344][DEBUG][c.f.s.a.BackendRegistry  ] User 'CN=system.logging.kibana,OU=OpenShift,O=Logging' is in cache? false (cache size: 3)
[2019-04-11T06:49:03,343][DEBUG][i.f.e.p.a.ACLDocumentManager] Read in roles: {
  "gen_kibana_b489efc3dbc1875886a37de5698d05b96fc08de7" : {

..........................................................omit  logs
..........................................................omit  logs
..........................................................omit  logs

  }
}
[2019-04-11T06:49:03,344][DEBUG][c.f.s.a.BackendRegistry  ] CN=system.logging.kibana,OU=OpenShift,O=Logging not cached, return from noop backend directly
[2019-04-11T06:49:03,344][DEBUG][c.f.s.a.BackendRegistry  ] User 'User [name=CN=system.logging.kibana,OU=OpenShift,O=Logging, roles=[]]' is authenticated
[2019-04-11T06:49:03,344][DEBUG][c.f.s.a.BackendRegistry  ] sg_tenant 'null'
[2019-04-11T06:49:03,344][DEBUG][i.f.e.p.a.ACLDocumentManager] Syncing from context to ACL...
[2019-04-11T06:49:03,344][DEBUG][i.f.e.p.a.ACLDocumentManager] Updating rolesmapping to: {
  "gen_kibana_b489efc3dbc1875886a37de56

..........................................................omit  logs
..........................................................omit  logs
..........................................................omit  logs

[2019-04-11T06:49:03,781][DEBUG][c.f.s.c.ConfigurationLoader] Received config for config (of [config, roles, rolesmapping, actiongroups, internalusers]) with current latch value=4
[2019-04-11T06:49:03,781][DEBUG][c.f.s.c.ConfigurationLoader] Received config for roles (of [config, roles, rolesmapping, actiongroups, internalusers]) with current latch value=3
[2019-04-11T06:49:03,781][DEBUG][c.f.s.c.ConfigurationLoader] Received config for rolesmapping (of [config, roles, rolesmapping, actiongroups, internalusers]) with current latch value=2
[2019-04-11T06:49:03,782][DEBUG][c.f.s.c.ConfigurationLoader] Received config for actiongroups (of [config, roles, rolesmapping, actiongroups, internalusers]) with current latch value=1
[2019-04-11T06:49:03,782][DEBUG][c.f.s.c.ConfigurationLoader] Received config for internalusers (of [config, roles, rolesmapping, actiongroups, internalusers]) with current latch value=0
[2019-04-11T06:49:03,782][DEBUG][c.f.s.c.IndexBaseConfigurationRepository] Notify com.floragunn.searchguard.http.XFFResolver@c1050f2 listener about change configuration with type config
[2019-04-11T06:49:03,782][DEBUG][c.f.s.c.IndexBaseConfigurationRepository] Notify com.floragunn.searchguard.auth.BackendRegistry@1e3df614 listener about change configuration with type config
[2019-04-11T06:49:03,783][DEBUG][c.f.s.a.BackendRegistry  ] Clear user cache for CN=system.logging.kibana,OU=OpenShift,O=Logging due to EXPLICIT
[2019-04-11T06:49:03,783][DEBUG][c.f.s.a.BackendRegistry  ] Clear user cache for tanhl due to EXPLICIT
[2019-04-11T06:49:03,783][DEBUG][c.f.s.a.BackendRegistry  ] Clear user cache for CN=system.logging.fluentd,OU=OpenShift,O=Logging due to EXPLICIT
[2019-04-11T06:49:03,783][DEBUG][c.f.s.a.BackendRegistry  ] Clear user cache for CN=system.logging.fluentd,OU=OpenShift,O=Logging due to EXPLICIT
[2019-04-11T06:49:03,783][DEBUG][i.f.e.p.a.ACLDocumentManager] Successfully reloaded config with '1' nodes

Sync ACL. took 502 ms.

But we takes only 200 milliseconds to access Elasticsearch directly.Therefore, in our scenario we want to reduce the operation of sync ACL.We need some help.

jcantrill commented 5 years ago

@Gallardot already answered your question: https://github.com/fabric8io/openshift-elasticsearch-plugin/issues/172#issuecomment-481805127 This is the only reason

jcantrill commented 5 years ago

Any request that includes a bearer token performs, as you noted, three requests(possibly more) against the API server. Fluent utilizes certificates; the request does not send a token so these requests are not made. The ONLY reason ACLs are synced everytime is because we removed caching because of inconsistent behavior reported by users when we did have caching. It's likely it worked as expected, but manifested it self differently to users and it was user perception issues.

There is no philosophical or technical reason to not add a cache; its simply the time required for someone to do the work.

fabric8io / openshift-elasticsearch-plugin

Why ACLDocumentManager need to sync the ACL to ElasticSearch on every search request? #174