Netcentric / accesscontroltool

Rights and roles management for AEM made easy
Eclipse Public License 1.0

ACTool-Config-Worker may throw an exception because it executes very expensive queries, preventing startup #669

Open thomasmueller opened 1 year ago

thomasmueller commented 1 year ago

The following two queries are executed at startup (thread name "Apache Sling Repository Startup Thread #1-ACTool-Config-Worker"). Depending on the content, they may try to read more than 100'000 nodes, which throws an exception, and so startup fails.

SELECT ace.* FROM [rep:ACE] AS ace 
WHERE ace.[rep:principalName] IS NOT NULL 
AND ISDESCENDANTNODE(ace, [/content])

SELECT ace.* FROM [rep:ACE] AS ace 
WHERE ace.[rep:principalName] IS NOT NULL 
AND ISDESCENDANTNODE(ace, [/apps])

The first query in particular may read too many entries. Both queries use the index /oak:index/acPrincipalName.
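
To check which index a query picks, Oak accepts an `EXPLAIN` prefix on a JCR-SQL2 statement; instead of the matching nodes, it returns the query plan. A minimal sketch for the first query above, assuming it is run through a query console or tool that passes the statement to Oak unchanged:

```sql
-- Returns the query plan (including the chosen index) rather than the ACE nodes.
EXPLAIN SELECT ace.* FROM [rep:ACE] AS ace
WHERE ace.[rep:principalName] IS NOT NULL
AND ISDESCENDANTNODE(ace, [/content])
```

The plan should name /oak:index/acPrincipalName if that index is selected. It does not show how many entries the query actually reads.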

I think this is somewhat related to https://github.com/Netcentric/accesscontroltool/issues/219 - however switching to a Lucene index won't resolve the issue.

ghenzler commented 1 year ago

@thomasmueller Is the problem happening during the image build or during startup of the k8s pod?

Overall the following should be true:

You can check the following code to see what is happening here: https://github.com/Netcentric/accesscontroltool/blob/43e00e4c01c8b3c0db9b3cd5a5ed4371ef9128de/accesscontroltool-startuphook-bundle/src/main/java/biz/netcentric/cq/tools/actool/startuphook/impl/AcToolStartupHookServiceImpl.java#L68 and you can check for the respective log messages in your setup.

thomasmueller commented 1 year ago

Hi,

Thanks a lot! The problem I see is that it runs at AEM startup in the k8s pod (author), against the whole repository (including /content). It runs in the Repository Startup thread, so it blocks the startup.

It normally does this [async]

Great! So maybe the case I saw had a non-default configuration! How can this be configured? It then might just be a matter of explaining this; possibly improving the documentation.

kwin commented 5 months ago

It then might just be a matter of explaining this; possibly improving the documentation.

This OSGi configuration is not documented at all, so hopefully no one deviates from the default without knowing exactly what to do here: https://github.com/Netcentric/accesscontroltool/blob/039f56af944a0814230b17105bea6e7b6fd51bf6/accesscontroltool-startuphook-bundle/src/main/java/biz/netcentric/cq/tools/actool/startuphook/impl/AcToolStartupHookServiceImpl.java#L56

kwin commented 5 months ago

The issue seems to be the evolution of the AEMaaCS build pipeline as outlined in https://adapt.to/2023/schedule/evolution-of-the-aemaacs-build-pipeline.

  1. The build image step no longer involves custom code
  2. During the deploy step the composite node store seems to be used in seed mode (i.e. the method https://github.com/Netcentric/accesscontroltool/blob/039f56af944a0814230b17105bea6e7b6fd51bf6/accesscontroltool-bundle/src/main/java/biz/netcentric/cq/tools/actool/helper/runtime/RuntimeHelper.java#L20 will never return true in AEMaaCS).