collective / pas.plugins.ldap

Zope (and Plone) PAS Plugin providing users and groups from LDAP directory
http://pypi.python.org/pypi/pas.plugins.ldap
Other
13 stars 20 forks source link

Should pas.plugins.ldap really load all LDAP user "keys" for every request? #4

Closed datakurre closed 8 years ago

datakurre commented 10 years ago

Hi, am I missing something here?

All the plugin methods seem to invoke self.users, which is only cached per request, and ends up initializing something called Ugm, e.g.:

https://github.com/collective/pas.plugins.ldap/blob/master/src/pas/plugins/ldap/plugin.py#L185

Initializing Ugm-object eventually calls context._load_keys in

https://github.com/bluedynamics/node.ext.ldap/blob/master/src/node/ext/ldap/ugm/_api.py#L442

and that triggers non-paged search for all the available users at

https://github.com/bluedynamics/node.ext.ldap/blob/master/src/node/ext/ldap/_node.py#L599

And once I paged that search, it took about 15+ seconds for our 90 000 principals. Per request.

What am I missing? Should context._load_keys really load all the available users, or have we just managed to misconfigure the plugin somehow?

Thanks!

rnixx commented 10 years ago

@jensens - are all the results cached on request? you mentioned that caching is sane?

if results are cached at request only, we should add possibility to cache in ram, or memcached, or elsewhere less volatile than on request, this also would make it possible to use LDAP search paging

jensens commented 10 years ago

Well, at PAS-level its only cached on request, invoking ugm each time. But on node.ext.ldap level the ldap results of the ldap queries are cached i.e. using memcached if configured.

I know here is much room for optimizations. The major problem is not caching (thats easy) but an active cache-invalidation (passive i.e timeouts are again simple). I'd like to have invalidation pluggable - so if you dont use it it not in your way.

Another issue/enhancement in this context are large ugm-trees. At the moment we have no feature to load them partial. I'am not sure if we want to have 90k users at once in RAM!

Any ideas/ implementations are welcome!

datakurre commented 10 years ago

@jensens I tried also with memcached and saw it being filled with LDAP-entries, but it didn't prevent _load_keys-being called once for each request. Should memcached prevent it (and I'm still doing something wrong)? For me it seemed like only queries for single users are being memcached.

The old LDAPUserFolder does not query all the users, which makes it usable also with larger ugm-trees (of course, user searches can still be quite heavy with it).

I think, it would be ok to store any amount of queries/users in memcached as long as it's possible to share the cache between sites.

I'm not yet familiar enough with node.ext.ldap to know, how deeply it requires to know all users before doing anything and what should be fixed to make it work more lazily and with larger trees.

jensens commented 10 years ago

Yes, also the keys should be cached. But i'am not sure if this is the solution. I'd say if a single user is requested its enough to load this one user and dont first ask for all keys. I also have to take some time to get back into the implementation details. Also caching in ugm tree should be done for the single user and not for the whole tree. @rnix and I talked about this in the past, but never implemented it.

rnix commented 10 years ago

@jensens @rnixx dos equis :)

jensens commented 10 years ago

@rnix and @rnixx sorry for confusion :D

chaoflow commented 10 years ago

@jensens, @rnixx, @datakurre any new insights on this?

rnixx commented 9 years ago

hannes just did some work on caching the UGM tree in RAM instead of building it for each request, which speeds up the whole story a lot. Nevertheless it does not solve the underlying problem of loading all keys for a search base in node.ext.ldap. This refactoring still misses a project/budget

thet commented 9 years ago

users/groups caching implemented here: https://github.com/collective/pas.plugins.ldap/pull/13 still, the first call is expensive

tdesvenain commented 9 years ago

Lol) NN Le 31 janv. 2014 15:16, "Asko Soukka" notifications@github.com a écrit :

Hi, am I missing something here?

All the plugin methods seem to invoke self.users, which is only cached per request, and ends up initializing something called Ugm, e.g.:

https://github.com/collective/pas.plugins.ldap/blob/master/src/pas/plugins/ldap/plugin.py#L185

Initializing Ugm-object eventually calls context._load_keys in

https://github.com/bluedynamics/node.ext.ldap/blob/master/src/node/ext/ldap/ugm/_api.py#L442

and that triggers non-paged search for all the available users at

https://github.com/bluedynamics/node.ext.ldap/blob/master/src/node/ext/ldap/_node.py#L599

And once I paged that search, it took about 15+ seconds for our 90 000 principals. Per request.

What am I missing? Should context._load_keys really load all the available users, or have we just managed to misconfigure the plugin somehow?

Thanks!

Reply to this email directly or view it on GitHub https://github.com/collective/pas.plugins.ldap/issues/4.

rnixx commented 9 years ago

@tdesvenain hm?

rnixx commented 9 years ago

The underlying problem of loading all keys is basically solved in https://github.com/bluedynamics/node.ext.ldap/tree/performance. It's not completely finished yet, but some might have time and motivation creating also a performance branch of p.p.ldap and start playing around with the new node.ext.ldap branch.

cheers

rnixx commented 9 years ago

^^ @chaoflow @pilz @thet @datakurre @tdesvenain @saily

pilz commented 9 years ago

@rnixx thanks, very neat!! Will soon look into that!

jensens commented 9 years ago

i already created a performance branch and at a first look all works using it. i did not checked user listing related features, i'am sure there is some work left. but overall it works much better, even w/o any caching its fast.

rnixx commented 8 years ago

node.ext.ldap 1.0b1 is now released which solves this issue.