Closed Noordsestern closed 3 years ago
@Noordsestern: Do you have any TaskListeners attached to the activity "Retoure korrigieren"? Any other cross cutting concepts like e.g. Spring Boot Event Listeners? This really sounds confusing. The Keycloak Plugin has nothing to do with transactions.
How many users do you have in Keycloak? Are you using the LDAP user federation? Anything else? Typically this performs well, even if you have > 10.000 users within Keycloak.
In order to make sure that this has nothing to do with bad Keycloak performance: can you temporarily try the 2.2.0-SNAPSHOT
version of the Keycloak Identity Provider Plugin and then set the cacheEnabled
attribute to true
?
... Sorry, 2.1.1-SNAPSHOT
is the newest Snapshot version.
https://app.camunda.com/nexus/repository/camunda-bpm-community-extensions-snapshots/org/camunda/bpm/extension/camunda-bpm-identity-keycloak-run/2.1.1-SNAPSHOT/camunda-bpm-identity-keycloak-run-2.1.1-20210713.140837-10.jar
Another hint: please set the authorizationCheckEnabled
configuration attribute of the plugin to false and check whether this is different.
If this property is set to true, then authorization checks are performed when querying for users or groups. Otherwise authorization checks are not performed when querying for users or groups. Default: true. Note: If you have a huge amount of Keycloak users or groups we advise to set this property to false to improve the performance of the user and group query.
Do you have any TaskListeners attached to the activity "Retoure korrigieren"? Any other cross cutting concepts like e.g. Spring Boot Event Listeners?
There are not listeners or anything attached. Only bare Camunda Run with KeycloakPlugin jar :
FROM camunda/camunda-bpm-platform:run-7.15.0
USER camunda
ADD camunda-bpm-identity-keycloak-run-2.1.0.jar /camunda/configuration/userlib/
ADD --chown=camunda:camunda default.yml /camunda/configuration/default.yml
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["./camunda.sh"]
How many users do you have in Keycloak? Are you using the LDAP user federation?
I do not know how many users we have, but i assume it doesnt exceed hundreds. We do use User Federation.
please set the authorizationCheckEnabled
We did try authorizationCheckEnabled
last week. But without success.
On the other hand, we were so desperate that we just tried any possible configuration, maybe we missed something. I will try 2.1.1-SNAPSHOT and the configuration this afternoon.
I do not know how many users we have, but i assume it doesnt exceed hundreds. We do use User Federation.
You should check the synchronisation settings of the user federation as well.
The snapshot seems to have missing dependencies:
│ Caused by: java.lang.ClassNotFoundException: com.github.benmanes.caffeine.cache.Ticker ││ at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471) ~[na:na] ││ at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[na:na] ││ at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:151) ~[camunda-bpm-run-core.jar:7.15.0] ││ at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[na:na]
I will check that soon.
The snapshot seems to have missing dependencies:
│ Caused by: java.lang.ClassNotFoundException: com.github.benmanes.caffeine.cache.Ticker ││ at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471) ~[na:na] ││ at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[na:na] ││ at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:151) ~[camunda-bpm-run-core.jar:7.15.0] ││ at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[na:na]
Sorry for that. So we currently can't test it using the supposed new cache feature. The build process started to have problems with the maven shade plugin here on the community hub - which should not affect you and is a different story. So the cache feature is sadly not near to be released and will take some time. It would have been a quick identication that your Keycloak server is not that responsive...
If you're not tired of testing:
If you still want to check if this is a performance issue by activating a cache feature (it caches results of Keycloak queries for a certain time), attached you'll find the Keycloak Plugin. Build locally and thus free of the Community Hub specifics when deploying artifacts to Nexus. Tested with Camunda Run 7.15.3-ee
camunda-bpm-identity-keycloak-run-2.1.1-SNAPSHOT.jar.zip
Besides this small test, I would a) set the flag "authorizationCheckEnabled" of the Keycloak plugin to false and then b) look at the synchronization frequency between Keycloak & LDAP / Active Directory in the User Federation.
If you don't mind you can send me your default.yaml
Snapshot contain dependencies now, thank you 👍 Unfortunately, the behaviour remains the same 😢
Our default.yml is:
# Externalized Keycloak configuration
keycloak:
# Keycloak access for the Identity Provider plugin.
url.plugin: ${ENV_KEYCLOAK_URL_PLUGIN:https://localhost:9001}
# Keycloak Camunda Identity Client
client.id: ${ENV_KEYCLOAK_CLIENT_ID:camunda-identity-service}
client.secret: ${ENV_KEYCLOAK_CLIENT_SECRET:42xx42xx-a17b-c63d-e26f-42xx42xx42xx42xx}
# Camunda Keycloak Identity Provider Plugin
plugin.identity.keycloak:
keycloakIssuerUrl: ${keycloak.url.plugin}/auth/realms/dpa
keycloakAdminUrl: ${keycloak.url.plugin}/auth/admin/realms/dpa
clientId: ${keycloak.client.id}
clientSecret: ${keycloak.client.secret}
useUsernameAsCamundaUserId: true
useGroupPathAsCamundaGroupId: true
administratorGroupName: DL - Camunda - Admins
disableSSLCertificateValidation: true
cacheEnabled: true
authorizationCheckEnabled: false
camunda.bpm:
# https://docs.camunda.org/manual/latest/user-guide/security/#authorization
# https://docs.camunda.org/manual/latest/user-guide/process-engine/authorization-service/
authorization.enabled: true
Hmmm.... Just thinking out aloud: It says "2021-07-14T19:17:46.097265933Z org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms." So there are no database connections available any more. The plugin does nothing special with transactions. All REST queries against Keycloak might be part of an overal Camunda transaction, ok. The plugin might query authorization information which is still part of the database and not Keycloak. By setting authorizationCheckEnabled to false this is disabled.
What database do you use for Camunda? Have you configured the connection pool? What is your min size and max size?
Setting
logging.level.org.camunda.bpm.extension.keycloak: DEBUG
logging.level.org.springframework.web.client.RestTemplate: DEBUG
might give you further debug info on the queries against the Identity Provider Plugin and the ones sent to Keycloak. Would be interesting to see what happens once you try to open the task list.
Uffff... the worst possible outcome: when activating DEBUG it suddenly worked. Not at first, but once the first request failed, the caching must have kicked in and everything is fast and works properly.
We run Postgres12 DB with default settings on Camunda side. I will try to override the hikari settings and see if that has an effect. We assume now, that querying users from keycloak takes for some reason a looooong time. Because also listing users in Admin panel takes forever. When opening a task list, the tasks belonging to my user are fetched. If now the user query take forever it might exceed transaction timeout.
I think you are on the right track.
Typically Keycloak itself is very fast. I would definitely check the user federation settings. If Keycloak itself syncs user data with LDAP / Active Directory on each single request this might be the reason for Keycloak answering so slowly. Of course there might be other reasons for Keycloak being slow.
Alright. Another day of wonders and learning:
It turns out, that the GET {realm]/users
endpoint is extremely slow in our Keycloak. But we cannot figure out why. Sync wth LDAP should happen only every 12 hours. I limit the number of users imported from LDAP to 120, which improves performance only a bit.
Now, opening the task list containg at least 1 assigned user takes roughly 1 minute. But then caching seems to work and everything goes fast.
This is the log I get when opening the task list:
2021-07-20 14:11:08.298 DEBUG 10 --- [nio-8080-exec-5] org.camunda.bpm.extension.keycloak : KEYCLOAK-01050 Keycloak group query results: [UserEntity[id=stah045, revision=0, firstName=Markus, lastName=Stahl, email=markus.stahl@google.de, password=null, salt=null, lockExpirationTime=null, attempts=0], ]
2021-07-20 14:11:08.874 DEBUG 10 --- [nio-8080-exec-9] org.camunda.bpm.extension.keycloak : KEYCLOAK-01021 FIND user with query ?max=250
2021-07-20 14:11:08.875 DEBUG 10 --- [nio-8080-exec-9] o.c.b.e.k.rest.KeycloakRestTemplate : HTTP GET https://keycloak.postadress.io/auth/admin/realms/camunda/users?max=250
2021-07-20 14:11:08.876 DEBUG 10 --- [nio-8080-exec-9] o.c.b.e.k.rest.KeycloakRestTemplate : Accept=[text/plain, application/json, application/*+json, */*]
2021-07-20 14:11:34.581 DEBUG 10 --- [nio-8080-exec-9] o.c.b.e.k.rest.KeycloakRestTemplate : Response 200 OK
2021-07-20 14:11:34.581 DEBUG 10 --- [nio-8080-exec-9] o.c.b.e.k.rest.KeycloakRestTemplate : Reading to [java.lang.String] as "application/json"
We do have like 800 groups. Maybe that is an issue?
What still confuses me that on login everything is fast and fine. But only when Admin / List all users
is called or Tasklist
is opened in the UI, things somehow slow down. When I log in, I can see all my LDAP groups in my Camunda profile. But if everything is there after login, why does Camunda query ALL users / groups when I only want to see my tasks.
Hi @Noordsestern have you checked the basics of your Keycloak installation? Database, Memory, ... etc.? Sorry for asking, but sometimes we forget the very basic things. 25 seconds for querying ~ 100 users?!? No way that this is normal behaviour. Maybe count down 2 seconds if its really bad. The question I can't answer is why the Camunda Tasklist queries all users upon opening it.
Alright. I can't figure out, why it is so slow. A few findings:
KeycloackPlugin or Camunda trigger a huge group-query when opening a task list, querying basically every group that is available in this log line:
2021-07-21 12:51:16.970 DEBUG 9 --- [nio-8080-exec-8] org.camunda.bpm.extension.keycloak : KEYCLOAK-01050 Keycloak group query results:
The query is huge, because we have roughly 800 groups.
I cannot imagine, why all have to be queried, even though there are only 2 tasks available.
In case somebody runs in to a similar problem. those action showed improvements:
Camunda seems to keep an open db transaction for the duration of its keycloak-queries. When queries are slow the db connections block the complete database. Adding the following to your default.yml
or production.yml
helps:
spring.datasource:
# see https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby
hikari:
connectionTimeout: 30000
idleTimeout: 600000
maxLifetime: 900000
#Hikari 4.0
keepaliveTime: 60000
# see https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby (minimumIdle)
# and https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing
minimumIdle: 10
maximumPoolSize: 20
In our experiments we always required the properties. Because caching is only available in version 2.1.1 onwards, you do require the current snapshot in order to set cacheEnabled
:
# Camunda Keycloak Identity Provider Plugin
plugin.identity.keycloak:
cacheEnabled: true
authorizationCheckEnabled: false
I left the company realm and created a realm specific for Camunda. I narrowed down the LDAP Tree which will be synced to only people who might possibly work with Cmaunda. Now I have only 125 users and 440 groups (groups have recurions). That improved query performance.
We increased cluster resources for Keycloak Postgres DB to 1 cpu and 2 Gib . But we did not witness any effect.
I will post this also to Camunda and close this issue then. I might have a few requests for the cashing feature, though ;)
Thanks for your investigations and especially for sharing your findings.
I'd like to add one thing to your findings: the Keycloak Identity Provider Plugin never queries all users or all groups configured in Keycloak. There is a maxResultSize
parameter with a default value of 100
exactly for preventing side effects of such queries.
I have seen a Keycloak installation with more than 10.000 users and still the user query returned in acceptable time. I suspect that the cause is to be found in Keycloak itself and its configuration.
However, this is a good motivation to make the new cache feature release-ready. The current SNAPSHOT build process is unfortunately still having problems, but that will hopefully be solved soon.
Hi,
Running:
Camunda Platform Run 7.15.0 KeycloackIdentifyProviderPlugin 2.1.0 Keycloak 13
I am not sure anymore that the refresh token thing has something to do with this, so we can probably close #68 as it is only a finding while actually investigating this issue: Everything is fine, until I claim my first user task. The UI does not respond to the claim, but in database, I can verify that I am the assigne. Also in camunda cockpit I am the assignee:
Also in Task Authorization, I can see that I have ALL rights to the task.
But I cannot work on the task. From that time on, opening the task list crashes the database access after 30 seconds:
After a few minutes, camunda platform recovers, but as soon as I open the task list, the db connection crashes again.
If I remove the assigne from the task (either through direct DB access or cockpit), I can again open the complete tasklist. But as soon as I claim a task, weirdness starts all over.
And this only happens, when I have the KeycloakPlugin installed. It works fine without it. This is so confusing :confused:
Regards, Markus