Acosix / alfresco-keycloak

Alfresco addon to provide Keycloak-related extensions / customisations for Repository and Share
Apache License 2.0
36 stars 20 forks source link

Sync users problem #20

Closed luizoamorim closed 3 years ago

luizoamorim commented 3 years ago

There are ~13.000 users inner keycloak. But when synchronization by batchSize is enabled only the first batch is printed.

We already have this settings: keycloak.adapter.auth-server-url=https://keycloak.test/auth keycloak.adapter.realm=COMPANY keycloak.adapter.resource=ALFRESCO_REPO keycloak.adapter.credentials.secret= keycloak.synchronization.enabled=true keycloak.synchronization.personLoadBatchSize=20 keycloak.synchronization.groupLoadBatchSize=10

Debug log shows that only the first batch was fetched, but Alfresco tools doesn't list any synchronized users or groups.

The follow message is printed after the sync is ended: INFO [security.sync.ChainingUserRegistrySynchronizer] [main] 13,431 user(s) and 37 group(s) processed

We're using the alfresco-keycloak master branch with Alfresco 7 community version.

Do you have any idea?

AFaust commented 3 years ago

Hmm... so the log output of Alfresco reports that 13.000+ users have been processed / synchronised but you are not able to find them via the "users" tool, is that correct? What do you mean with "only the first batch is printed"? Printed where specifically? I am very interested in getting this mass synchronisation to work - so far, I have only had the chance to test relatively small amounts and did not encounter any issues. This is also why I am consciously continuing to leave the project in a "release candidate" status for current version in which its scope has been massively expanded, as I sort of expected that there be some scale or other non-trivial scenarios where issues might crop up. I also realise now that the IDM client responsible for fetching the users / groups from Keycloak could do with some more trace/debug logging to make analysing this simpler...

rtnascimento commented 3 years ago

Hey Axel!

Correct. We are not able to find them via the "users" tool. We've setted the following logger to debug.

log4j.logger.de.acosix.alfresco.keycloak.repo=debug

The output message is like as follow repeated 50 times. Exatly the amount configured in keycloak.synchronization.personLoadBatchSize property. When this property is changed reflects on how many times the messeges are printed.

DEBUG [repo.sync.GroupContainmentUserFilter] [main] Checking user for containment in groups with paths [/group_user] / IDs null, using allowTransitive=true and requireAll=false DEBUG [repo.sync.GroupContainmentUserFilter] [main] Group containment result for user : false

If I set this personLoadBatchSize property to a high value like 15000 only 1664 debug messages are printed and again nothing happens. The log keeps indicating 13000+ users synchronized. This sync last almost 10 minutes even groupPaths restricts the fetch. I don't know if its possible to change the sync strategy to a better performance.

Frequently occurs a timeout when synchronizating.

WARN [security.sync.ChainingUserRegistrySynchronizer] [main] Failed initial synchronize with user registries org.alfresco.error.AlfrescoRuntimeException: 06260018 Failed to retrieve entities at de.acosix.alfresco.keycloak.repo.client.IDMClientImpl.processEntityBatch(IDMClientImpl.java:478) at de.acosix.alfresco.keycloak.repo.client.IDMClientImpl.processMembers(IDMClientImpl.java:299) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry.mapGroup(KeycloakUserRegistry.java:286) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$KeycloakAuthorityCollection.convert(KeycloakUserRegistry.java:376) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$KeycloakAuthorityCollection$KeycloakAuthorityIterator.lambda$checkAndFillBuffer$0(KeycloakUserRegistry.java:430) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$GroupCollection.processGroupsRecursively(KeycloakUserRegistry.java:523) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$GroupCollection.lambda$loadNext$0(KeycloakUserRegistry.java:514) at de.acosix.alfresco.keycloak.repo.client.IDMClientImpl.processEntityBatch(IDMClientImpl.java:458) at de.acosix.alfresco.keycloak.repo.client.IDMClientImpl.processGroups(IDMClientImpl.java:271) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$GroupCollection.loadNext(KeycloakUserRegistry.java:513) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$KeycloakAuthorityCollection$KeycloakAuthorityIterator.checkAndFillBuffer(KeycloakUserRegistry.java:429) at de.acosix.alfresco.keycloak.repo.sync.KeycloakUserRegistry$KeycloakAuthorityCollection$KeycloakAuthorityIterator.hasNext(KeycloakUserRegistry.java:396) at org.alfresco.repo.batch.BatchProcessor$WorkProviderIterator.hasNext(BatchProcessor.java:607) at org.alfresco.repo.batch.BatchProcessor.process(BatchProcessor.java:394) at org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer.syncWithPlugin(ChainingUserRegistrySynchronizer.java:1749) at org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer.synchronizeInternal(ChainingUserRegistrySynchronizer.java:739) at org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer.access$15(ChainingUserRegistrySynchronizer.java:474) at org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer$7.doWork(ChainingUserRegistrySynchronizer.java:2138) at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:602) at org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer.onBootstrap(ChainingUserRegistrySynchronizer.java:2132) at org.springframework.extensions.surf.util.AbstractLifecycleBean.onApplicationEvent(AbstractLifecycleBean.java:56) at org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer.onApplicationEvent(ChainingUserRegistrySynchronizer.java:2495) at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:176) at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:169) at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:143) at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:131) at org.alfresco.repo.management.subsystems.ChildApplicationContextFactory$ChildApplicationContext.publishEvent(ChildApplicationContextFactory.java:569) at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:943) at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:591) at org.alfresco.repo.management.subsystems.ChildApplicationContextFactory$ApplicationContextState.start(ChildApplicationContextFactory.java:824) at org.alfresco.repo.management.subsystems.AbstractPropertyBackedBean.start(AbstractPropertyBackedBean.java:1098) at org.alfresco.repo.management.subsystems.AbstractPropertyBackedBean.onApplicationEvent(AbstractPropertyBackedBean.java:637) at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEventInternal(SafeApplicationEventMulticaster.java:221) at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:186) at org.alfresco.repo.management.SafeApplicationEventMulticaster.multicastEvent(SafeApplicationEventMulticaster.java:206) at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:426) at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:383) at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:943) at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:591) at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:401) at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:292) at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:103) at org.alfresco.web.app.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:70) at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4716) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5177) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:717) at org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:129) at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:150) at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:140) at java.base/java.security.AccessController.doPrivileged(Native Method) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:688) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:706) at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1133) at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1866) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at org.apache.tomcat.util.threads.InlineExecutorService.execute(InlineExecutorService.java:75) at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:1045) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:429) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1576) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:309) at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:123) at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:423) at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:366) at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:936) at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:843) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1384) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1374) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at org.apache.tomcat.util.threads.InlineExecutorService.execute(InlineExecutorService.java:75) at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:140) at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:909) at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:262) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.core.StandardService.startInternal(StandardService.java:434) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:930) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183) at org.apache.catalina.startup.Catalina.start(Catalina.java:772) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:342) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:473) alfresco | Caused by: java.net.SocketTimeoutException: Read timed out at java.base/java.net.SocketInputStream.socketRead0(Native Method) at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115) at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168) at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140) at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478) at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472) at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70) at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1354) at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:963) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:161) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:82) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:276) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:294) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:257) at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:230) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:679) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:481) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) at de.acosix.alfresco.keycloak.repo.client.IDMClientImpl.processEntityBatch(IDMClientImpl.java:435) ... 87 more

AFaust commented 3 years ago

Increasing the batch size is definitely not recommended and expected to cause more and more timeout issues, unless keycloak.adapter.socketTimeout is increased. Apparently you have users without a user name in your Keycloak instance, otherwise the log statement should have included the user name (e.g. Checking user XYZ for containment in groups [...] Unfortunately, Keycloak does not allow users to be easily filtered by group paths, so the synch will always indicate the total amount of users retrieved and processed from Keycloak, even if the users then fail the filtering. I'll have to check if the most recent Keycloak versions improved the API in that regard, but am not optimistic. Just to make sure: Should there be any users at all that are members of the root-level group group_user? Is the name of the group correct, e.g. is there any chance it should be just user instead? Note that in this case, there is no need to compensate for any GROUP_ prefix which Alfresco typically prepends to a group's name - all IDs / name paths are checked exclusively against the values as provided by Keycloak.

rtnascimento commented 3 years ago

Right, this is the correct group name. Coincidentally, I created using that name. But this group was created to test if alfresco-keycloak could sync some users.

Even I don't use groupPaths the sync doesn't work.

Username is correct in our keycloak. It was intentionally hidden by me.

Any suggestion to do?

Login works fine when a keycloak user authenticates in Alfresco. After this, only that user is listed on "users" tool in Alfresco Share.

UPDATE: The timeout issue occurs even the property keycloak.synchronization.personLoadBatchSize is set to 20 and keycloak.synchronization.groupLoadBatchSize is set to 10. Last time worked only when I set groupPaths.

UPDATE 2: Using Keycloak 12.0.4 Apparently there is some KC REST API incompatibility. Alfresco-keycloak doesn't recognize the next users page to fetch.

AFaust commented 3 years ago

Ah - it's good to know you are using a newer Keycloak version. AFAIK, according to the Keycloak documentation, the server and adapter libraries should be kept closely aligned. Unfortunately, this means I would have to update the module for every release of Keycloak, and users of the module could not use newer versions of it unless they also updated their Keycloak server. Maybe I'll have to decouple the fat-JARed Keycloak adapter libraries again from the module, and provide multiple versions of those for different versions of Keycloak. I really would like to avoid that though...

How critical is this issue for you currently? I am asking primarily because I just started a new customer project and cannot guarantee that I can find some time in the short-term to take a look at and test with the newest Keycloak version. The earliest chance I may have could be Sunday, during a longer train ride...

rtnascimento commented 3 years ago

Thanks a lot Axel for your quick response and help! It is ok to us. Take your time to change the code. For a while I am going to debug trying to find out what is really happening.

rtnascimento commented 3 years ago

We tried to sync 2 users and 2 subgroups contained in group 'group1' as follow. Remembering you that our user database is 13000+.

group1
 - UserA
 - group11
 - group12
   - UserB

After debug we fixed this issue by changing some of your code. First of all, we noticed if UserA and UserB were not part of the 50 users fetched from keycloak to be processed, alfresco-keycloak considers there is no more register to fetch. It was solved by changing this method to:

            protected synchronized void checkAndFillBuffer()
            {
                while ((this.buffer.isEmpty() || this.index >= this.buffer.size()) && !this.noMoreResults)
                {
                    this.buffer.clear();

                    this.index = 0;
                    this.offset += KeycloakAuthorityCollection.this.loadNext(this.offset, KeycloakAuthorityCollection.this.batchSize,
                            authority -> this.buffer.add(KeycloakAuthorityCollection.this.convert(authority)));

                    this.noMoreResults = this.offset >= KeycloakAuthorityCollection.this.totalUpperBound;
                }
            }

And this method to:

    public int countGroups()
    {
        final URI uri = KeycloakUriBuilder.fromUri(this.deployment.getAuthServerBaseUrl()).path("/admin/realms/{realm}/groups/count")
                .queryParam("top", true)
                .build(this.deployment.getRealm());

        final AtomicInteger count = new AtomicInteger(0);
        this.processGenericGet(uri, root -> {
            if (root.isObject() && root.has("count"))
            {
                count.set(root.get("count").intValue());
            }
            else
            {
                throw new AlfrescoRuntimeException("Keycloak admin API did not yield expected JSON data for group count");
            }
        });

        return count.get();
    }

Now we get another issue. Sync always results in timeout. First it failed after ~57min using the following properties value.

                -Dkeycloak.synchronization.enabled=true
                -Dkeycloak.adapter.socketTimeout=30000
                -Dkeycloak.synchronization.userFilter.containedInGroup.property.groupPaths=/group1
                -Dkeycloak.synchronization.groupFilter.containedInGroup.property.groupPaths=/group1

Sync timed out and failed again after ~32min using new properties value as follow.

                -Dkeycloak.synchronization.enabled=true
                -Dkeycloak.adapter.socketTimeout=90000
                -Dkeycloak.synchronization.userFilter.containedInGroup.property.groupPaths=/group1
                -Dkeycloak.synchronization.groupFilter.containedInGroup.property.groupPaths=/group1
                -Dkeycloak.synchronization.personLoadBatchSize=1000
                -Dkeycloak.synchronization.groupLoadBatchSize=1000

It takes too long to sync only 2 users and 2 groups. Maybe creating a flag indicating a scheduled sync instead of not running at startup could be better to a faster alfresco start up.

AFaust commented 3 years ago

Maybe creating a flag indicating a scheduled sync instead of not running at startup could be better to a faster alfresco start up.

That is actually a default Alfresco setting you can change - the alfresco-keycloak module does not actually control when synchronization takes place. Check the Alfresco synchronization subsystem properties.

As for the method changes:

rtnascimento commented 3 years ago

That is actually a default Alfresco setting you can change - the alfresco-keycloak module does not actually control when synchronization takes place. Check the Alfresco synchronization subsystem properties.

Great!

I am a bit confused why the group count call would need the "top" parameter. It should always just return a single count of all groups in Keycloak.

Buffer is filled only with the top groups and the subgroups are processed recursively. So the totalUpperBound must match to it.

AFaust commented 3 years ago

Buffer is filled only with the top groups and the subgroups are processed recursively. So the totalUpperBound must match to it.

Ah, right. Sorry - I got confused there with a different system I recently integrated at a customer that uses top as a pagination control, and not a filter for root-level elements.