Open thomasmueller opened 1 year ago
@thomasmueller Is the problem happening during image build or during startup of k8s pod?
Overall the following should be true:
You can check the following code to see what is happening here: https://github.com/Netcentric/accesscontroltool/blob/43e00e4c01c8b3c0db9b3cd5a5ed4371ef9128de/accesscontroltool-startuphook-bundle/src/main/java/biz/netcentric/cq/tools/actool/startuphook/impl/AcToolStartupHookServiceImpl.java#L68. You can also check for the respective log messages in your setup.
Hi,
Thanks a lot! The problem I see is that it runs at AEM startup of the k8s pod (author), against the whole repository (including /content). It runs in the Repository Startup thread, so it blocks the startup.
It normally does this [async]
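To illustrate the difference between blocking the startup thread and running asynchronously, here is a minimal sketch in plain Java. It is not the AC Tool's code: the class `StartupHookSketch` and the method `installAcls` are hypothetical stand-ins, and the latch only simulates a long-running installation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StartupHookSketch {

    /** Hypothetical stand-in for a long-running installation: it waits until the latch is released. */
    static void installAcls(CountDownLatch finished) {
        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the calling ("startup") thread could continue while the installation was still running. */
    static boolean startupContinuesWhileInstalling() {
        CountDownLatch finished = new CountDownLatch(1);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Hand the work to a worker thread instead of running it inline.
            CompletableFuture<Void> installation =
                    CompletableFuture.runAsync(() -> installAcls(finished), worker);
            // The calling thread gets here immediately; the installation is still pending.
            boolean stillRunning = !installation.isDone();
            finished.countDown();  // let the simulated installation finish
            installation.join();   // wait only so that the sketch terminates cleanly
            return stillRunning && installation.isDone();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(startupContinuesWhileInstalling());
    }
}
```

Calling `installAcls` directly on the calling thread would instead hold that thread until the work is done, which is the blocking behaviour described above.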
Great! So maybe the case I saw had a non-default configuration! How can this be configured? It then might just be a matter of explaining this; possibly improving the documentation.
> It then might just be a matter of explaining this; possibly improving the documentation.
This OSGi configuration is not documented at all, so hopefully no one deviates from the default without knowing exactly what to do here: https://github.com/Netcentric/accesscontroltool/blob/039f56af944a0814230b17105bea6e7b6fd51bf6/accesscontroltool-startuphook-bundle/src/main/java/biz/netcentric/cq/tools/actool/startuphook/impl/AcToolStartupHookServiceImpl.java#L56
The issue seems to be the evolution of the AEMaaCS build pipeline as outlined in https://adapt.to/2023/schedule/evolution-of-the-aemaacs-build-pipeline.
seed mode (i.e. the method https://github.com/Netcentric/accesscontroltool/blob/039f56af944a0814230b17105bea6e7b6fd51bf6/accesscontroltool-bundle/src/main/java/biz/netcentric/cq/tools/actool/helper/runtime/RuntimeHelper.java#L20 will never return true in AEMaaCS).
The following two queries are executed at startup (thread name "Apache Sling Repository Startup Thread #1-ACTool-Config-Worker"). Depending on the content, they may try to read more than 100,000 nodes, which throws an exception and causes startup to fail.
Especially the first query may read too many entries. Both queries use the index /oak:index/acPrincipalName.
I think this is somewhat related to https://github.com/Netcentric/accesscontroltool/issues/219 - however switching to a Lucene index won't resolve the issue.
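The failure mode described above (a query that aborts once it has read too many nodes) can be modelled with a small, self-contained sketch. This is not Oak's or the AC Tool's implementation: `ReadLimitSketch`, `LimitedIterator` and `countWithLimit` are hypothetical names, and the exception type and message are placeholders chosen for illustration.

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class ReadLimitSketch {

    /** Wraps an iterator and aborts once more than {@code limit} entries have been read. */
    static final class LimitedIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private final long limit;
        private long read;

        LimitedIterator(Iterator<T> delegate, long limit) {
            this.delegate = delegate;
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            T value = delegate.next();
            if (++read > limit) {
                // Placeholder exception: the real repository may use a different type and message.
                throw new UnsupportedOperationException("read more than " + limit + " entries");
            }
            return value;
        }
    }

    /** Iterates over a simulated result set; returns the number of rows read, or -1 if the limit was exceeded. */
    static long countWithLimit(int resultSize, long limit) {
        Iterator<Integer> results =
                new LimitedIterator<>(IntStream.range(0, resultSize).boxed().iterator(), limit);
        long count = 0;
        try {
            while (results.hasNext()) {
                results.next();
                count++;
            }
            return count;
        } catch (UnsupportedOperationException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithLimit(100_000, 100_000)); // exactly at the limit
        System.out.println(countWithLimit(100_001, 100_000)); // one entry over the limit
    }
}
```

In this model the outcome depends only on how many entries the query has to read, so the same code succeeds on a small repository and fails on a large one.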