Closed (kwin closed this 1 year ago)
This needs further investigation, as it seems that at least sometimes the lookup of authorizables by ID relies on a Query internally. @otarsko Please provide further insights here.
I ran a local test on dummy data, with ~20k groups in AEM and ~3k groups affected by the configuration, and got the following results:
Top image: with cache. Bottom image: without cache (changes from this PR + #649).
Looks like Query is used to extract the Authorizable, which leads to the slowness:
2 ideas (not very original though) I have:
@kwin wdyt?
The query being used for authorizable ids is the following: https://github.com/apache/jackrabbit-oak/blob/5a1a902e7a89fc44cb9e2f59b0c6939efa9c16e4/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/identifier/IdentifierManager.java#L342-L366. Usually that should be fast, but its cost is obviously not negligible. The most important metric is how big the memory impact of removing the authorizable cache really is. @otarsko Do you have metrics for that as well?
If that is considerable we could just cache the path per authorizable id, as that lookup does not require a search.
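To illustrate the idea of caching only the path per authorizable id, here is a minimal sketch. It is not the code from this PR: the class name `AuthorizablePathCache` and the injected `queryBasedLookup` function are hypothetical, and the function merely stands in for whatever query-backed resolution is used to find the authorizable. The point is that only a short string is retained per id instead of the authorizable object itself.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of this PR): remembers the repository path
 * of each authorizable id so the query-backed lookup runs at most once per
 * known id.
 */
final class AuthorizablePathCache {

    // id -> repository path; stores plain strings only, no authorizable objects
    private final Map<String, String> pathById = new ConcurrentHashMap<>();

    // Assumption: stand-in for the expensive, query-backed id resolution
    private final Function<String, Optional<String>> queryBasedLookup;

    AuthorizablePathCache(Function<String, Optional<String>> queryBasedLookup) {
        this.queryBasedLookup = queryBasedLookup;
    }

    /** Returns the cached path, falling back to the query-backed lookup on a miss. */
    Optional<String> getPath(String authorizableId) {
        String cached = pathById.get(authorizableId);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<String> resolved = queryBasedLookup.apply(authorizableId);
        // Only existing authorizables are remembered; unknown ids are looked up again
        resolved.ifPresent(path -> pathById.put(authorizableId, path));
        return resolved;
    }

    /** Number of ids whose path is currently cached. */
    int size() {
        return pathById.size();
    }
}
```

With such a cache, a repeated lookup for the same id could resolve the authorizable via its path rather than via a search. Whether this is worth the extra state depends on the memory figures asked for above.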
@kwin, in the cloud we had nearly 4 GB in the cache with 3.0.4:
However, locally, even with a larger number of authorizables to process (17314 authorizables), I didn't manage to get anywhere close to that amount with the cache in place:
So, on the one hand, the OOM in AEMaaCS was most probably caused by the cache. On the other hand, it's not reproducible locally.
@kwin latest changes look good to me:
with 17319 authorizables and without oak index:
Memory consumption is also OK (no visible changes to the previous test)
takes too much memory; looking up authorizable by ID is really fast; also cache empty memberships