fabric8io / openshift-elasticsearch-plugin

Apache License 2.0
27 stars 21 forks source link

use searchguard's BackendRegistry userCache #173

Closed Gallardot closed 2 years ago

Gallardot commented 5 years ago

searchguard's BackendRegistry cache User Info by AuthCredentials Cache are defined as follows:

Cache<AuthCredentials, User> userCache;
Cache<AuthCredentials, User> authenticatedUserCacheTransport;

and the AuthCredentials's equals method as follows

@Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        AuthCredentials other = (AuthCredentials) obj;
        if (internalPasswordHash == null || other.internalPasswordHash == null || !MessageDigest.isEqual(internalPasswordHash, other.internalPasswordHash))
            return false;
        if (username == null) {
            if (other.username != null)
                return false;
        } else if (!username.equals(other.username))
            return false;
        return true;
    }

The internalPasswordHash is converted from password.If we want the cache to take effect, the password cannot be empty.so method extractCredentials in the OpenShiftTokenAuthentication should return a AuthCredentials with a password.But we do not really have “the password”, i think use “openshift” as the password is a choice.

I build a patch with attachfile. openshift-elasticsearch-plugin.txt

Gallardot commented 5 years ago

If we don't do this, the user will alaways miss cached.

[2019-04-11T06:49:03,223][DEBUG][c.f.s.a.BackendRegistry  ] User 'tanhl' is in cache? false (cache size: 2)
[2019-04-11T06:49:03,224][DEBUG][c.f.s.a.BackendRegistry  ] tanhl not cached, return from io.fabric8.elasticsearch.plugin.auth.OpenShiftTokenAuthentication backend directly
jcantrill commented 5 years ago

It is unlikely we can depend on SG caching as it relies on usernames to represent users and permissions. We do some trickery to perform logic with a request, defer to SG, and then additional logic after the request. User's are represented as tokens in Openshift which must be 'converted' to a username before alllowing SG to evaluate a user. We can only guarantee token maps to a given username if we do evaluation before SG and this is where we could insert a cache.

Gallardot commented 5 years ago

@jcantrill

The ONLY reason ACLs are synced everytime is because we removed caching because of inconsistent behavior reported by users when we did have caching.

Thank you for solving my problem.

But we still want to fix the synchronous ACL problem. Reduce time consumption.

I have a idea.Because synchronizing acl to elasticsearch takes a lot of time.On the elasticsearch's master node, start a timed task to synchronize all users' acls to elasticsearch. In this task, we need to deal with three things:

  1. add new user’s ACL,ROLE map,project ect.
  2. update old user’s ACL,role map,project ect.
  3. delete no exist user.

So that we do not need to sync ACL on every request.

Any Suggestions? Looking forward to your reply.Thank you.

jcantrill commented 5 years ago

But we still want to fix the synchronous ACL problem. Reduce time consumption.

I have a idea.Because synchronizing acl to elasticsearch takes a lot of time.On the elasticsearch's master node, start a timed task to synchronize all users' acls to elasticsearch. In this task, we need to deal with three things:

  1. add new user’s ACL,ROLE map,project ect.
  2. update old user’s ACL,role map,project ect.
  3. delete no exist user.

So that we do not need to sync ACL on every request.

Any Suggestions? Looking forward to your reply.Thank you.

You would need to look at the SG logic to determine from where it evaluates its permissions; I don't recall off hand. I believe it depends upon an in-memory representation of what is in store that is updated when it receives a callback notification. I believe the only way to update it's in-memory reference is by storing and notifying.

SG relies upon the user that is in context to evaluate permissions. If that user is not in its permissions then it will fail, which the dynamic nature by which we seed causes challenges. The original cache implementation did not work on time; it checked cache, upon miss, grabbed the details and then synced the ACL. I think this is still a viable solution but it would mean initial requests are still slow. You may still have that issue with the timed sync. Maybe the mitigation is to either extend the expiration period or only flush the cache when an update is needed. Only update when the user info changes and you are past the expiration period.

Gallardot commented 5 years ago

@jcantrill Thank you for your reply!

I looked at the SG logic. es跨机房

A picture is worth a thousand words.

As is shown in the picture.After plugin sync ACL to ES, but before SG query ACL from ES. the timed task on ES Master (the plugin) may delete expire ACL.There is a certain probability that this will happen. Is that right?

jcantrill commented 5 years ago

As is shown in the picture.After plugin sync ACL to ES, but before SG query ACL from ES. the timed task on ES Master (the plugin) may delete expire ACL.There is a certain probability that this will happen. Is that right?

Note this [1] is where I mentioned we hook into the flow to process pre-SG and post-SG; I'm not sure your picture accurately reflects this logic but it's important to understand. In regards to expiration, there is a possibility the ACL will expire but it depends on the timestamp for a specific role and mapping block. I would not expect the permissions for a given user during a given request to expire while the request is in flight unless we do not additionally bump the expiration time to the next cycle; I don't recall exactly what is done.

[1] https://github.com/fabric8io/openshift-elasticsearch-plugin/blob/master/src/main/java/io/fabric8/elasticsearch/plugin/acl/DynamicACLFilter.java#L82

richm commented 5 years ago

@Gallardot - there is a PR to reimplement caching from @jcantrill - https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/179 - would you be able to review/test that PR to see if it solves your problem?

Gallardot commented 5 years ago

@richm OK, I will do it at this weekend.