OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0
1.14k stars 588 forks source link

SessionCache does not work after upgrading to 23.0.0.10 #27135

Closed c-koell closed 6 months ago

c-koell commented 9 months ago

Describe the bug
We are using sessionCache-1.0 (hazelcast) long time now. After upgrading to 23.0.0.10 it is does not work anymore.

Steps to Reproduce
Upgrading from 23.0.0.9 to 23.0.0.10

Expected behavior
The Session will not be synchronized between two Cluster members.

Diagnostic information:

Our sample server.xml

<variable name="hazelcast.lib" defaultValue="${shared.resource.dir}/hazelcast.jar"/>
<library id="jCacheVendorLib">
   <file name="${hazelcast.lib}" />
</library>
<httpSessionCache writeContents="ALL_SESSION_ATTRIBUTES" libraryRef="jCacheVendorLib">
   <properties hazelcast.config.location="https://static.server/rest/config/hazelcast-config.xml"/>
</httpSessionCache>

With this hazelcast-config.xml

<hazelcast xmlns="http://www.hazelcast.com/schema/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.hazelcast.com/schema/config
       https://hazelcast.com/schema/config/hazelcast-config-5.3.xsd">
    <network>
        <port auto-increment="false">5708</port>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <member>vappsae01.domain:5708</member>
                <member>vappsae02.domain:5708</member>
            </tcp-ip>
        </join>
    </network>
</hazelcast>
zhang-ca commented 9 months ago

Looks like hazelcast configuration issue, does your messages.log showed at least 3 Members similar to below:

[5/23/23, 18:22:13:883 UTC] 0000003e com.hazelcast.internal.cluster.ClusterService I [10.244.0.18]:5701 [CartCluster] [5.1.1]

Members {size:3, ver:3} [ Member [10.244.0.18]:5701 - 4201e881-6340-4a2f-8117-dcfcf366a8fe this Member [10.244.0.19]:5701 - d12c614f-4f2e-48f4-a6a6-311df971b5bb Member [10.244.0.20]:5701 - 1f4d8b85-53eb-4325-a064-7d4a6733b604

Otherwise, please enable the session trace and send to zhang@ca.ibm.com com.ibm.ws.session.*=all

zhang-ca commented 9 months ago

By the way, can you set <multicast enabled="true"/> ?

<hazelcast xmlns="http://www.hazelcast.com/schema/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.hazelcast.com/schema/config
       https://hazelcast.com/schema/config/hazelcast-config-5.3.xsd">

    <cluster-name>CartCluster</cluster-name>

    <network>
        <join>
           <multicast enabled="true"/>
        </join>
    </network>
</hazelcast>
c-koell commented 9 months ago

Hi @zhang-ca

I see with both Liberty Versions 23.0.0.9 and 23.0.0.10 following log in the messages.log

[12.12.23, 07:01:44:080 MEZ] 00000047 com.hazelcast.internal.cluster.ClusterService                I [vappsae01.domain]:5708 [dev] [5.3.6] 

Members {size:2, ver:2} [
    Member [vappsae02.domain]:5708 - 688435b1-4076-4cbd-a951-c87abb2fa47f
    Member [vappsae01.domain]:5708 - 8a3e1e86-49e6-45ff-a6f9-4388de13f705 this
]

We would not like to enable multicast. We prefer to define the members directly. I will enable the Trace on both Servers with Liberty Version 23.0.0.10 and send the logs to you via mail.

zhang-ca commented 9 months ago

This may have to do with the newer version of Hazelcast, the default Hazelcast JCache provider is the client-side CachingProvider. https://docs.hazelcast.com/hazelcast/5.3/migrate/upgrading-from-imdg-3#jcache-default-caching-provider

Since Open Liberty is commonly used for Hazelcast embedded in the application instances, the default now has to be overridden to the member-side provider implementation using the hazelcast.jcache.provider.type property.

Please try setting the Liberty bootstrap.poperties as: hazelcast.jcache.provider.type = member

In addition, the Hazelcast config <group> element is replaced by the <cluster-name> element. https://docs.hazelcast.com/hazelcast/5.3/migrate/upgrading-from-imdg-3#changes-in-security-configurations So for the application hazelcast-config.xml, please add <cluster-name> element. as:

<hazelcast xmlns="http://www.hazelcast.com/schema/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.hazelcast.com/schema/config
       https://hazelcast.com/schema/config/hazelcast-config-5.3.xsd">

    <!--
        The name of the cluster. All members of a single cluster must have the same cluster name
        configured and a client connecting to this cluster must use it as well.
    -->
    <cluster-name>YourCluster</cluster-name>

    <network>
        <port auto-increment="false">5708</port>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <member>vappsae01.domain:5708</member>
                <member>vappsae02.domain:5708</member>
            </tcp-ip>
        </join>
    </network>
</hazelcast>

Details can be found in https://github.com/OpenLiberty/open-liberty/issues/20467#issuecomment-1102733885

zhang-ca commented 9 months ago

It appears the issue was due to TrustedHeaderAuthenticationMechanismfeature in 23.0.0.10.

Do you know where in the server.xml configured this feature?

c-koell commented 9 months ago

Hi @zhang-ca

This is not a explicit feature it is a Jakarta EE Security Mechanism (HttpAuthenticationMechanism) witch can be used with the webProfile-8.0 feature. See https://developer.ibm.com/tutorials/j-javaee8-security-api-2/ Section -> Writing a custom HttpAuthenticationMechanism

So there is noting special in the server.xml File. We called the Class TrustedHeaderAuthenticationMechanism

greets claus

c-koell commented 8 months ago

@zhang-ca Any news here ? Thanks in advance !

zhang-ca commented 8 months ago

The Rest servlet runs through a TrustedHeaderAuthenticationMechanism which creates a Object NewWebUserInformation and stored into the session. The subsequent session failed with invalid security token on the failover Pod/Server.

23.0.0.10 traces:

[14.12.23, 10:20:11:219 MEZ] 0000002b id=00000000 com.ibm.ws.security.token.internal.TokenManagerImpl I CWWKS4001I: Das Sicherheitstoken kann nicht validiert werden. Die folgenden Ursachen sind möglich:

  1. Das Sicherheitstoken wurde auf einem anderen Server generiert, der andere Schlüssel verwendet.
  2. Die Tokenkonfiguration oder die Sicherheitsschlüssel des Token-Service, der das Token erstellt hat, wurden geändert.
  3. Der Token-Service, der das Token erstellt hat, ist nicht mehr verfügbar.

Session invalidated due to invalid security token:

[14.12.23, 10:20:11:234 MEZ] 0000002b id=00000000 MemorySession > invalidate ENTRY AppName=default_host/; Id=gdirIuvYcMWMNjH8Kq_nGZK

c-koell commented 7 months ago

Hi @zhang-ca !

Any news from the the other teams in the meanwhile ?

thanks claus

zhang-ca commented 7 months ago

Someone invalidated the session when CWWKS4001I happened, I will work on a debug jar to find out who invalidated the session.

c-koell commented 7 months ago

I have send you a trace (10.01.2024) where i have shared the ltpa keys between both servers. As i can see there is no CWWKS4001I message in the log but the replication still not worked.

zhang-ca commented 7 months ago

com.ibm.ws.session_1.0.82.zip

Please try this jar in the zip file. i.e. backup com.ibm.ws.session_1.0.82.jar under your_server\wlp\lib and replace the attached jar

c-koell commented 7 months ago

Here are the the trace files of both servers.

trace.zip

zhang-ca commented 7 months ago

It appears your build didn't pick up the com.ibm.ws.session_1.0.82.jar above. on 23.0.0.10. Can you ensure you replace the new com.ibm.ws.session_1.0.82.jar under your server \wlp\lib directory.

The jar will produce a stack trace between the following line:

[24.01.24, 12:58:19:748 MEZ] 0000002b id=00000000 com.ibm.ws.security.token.internal.TokenManagerImpl          I CWWKS4001I: Das Sicherheitstoken kann nicht validiert werden. Die folgenden Ursachen sind möglich:
1. Das Sicherheitstoken wurde auf einem anderen Server generiert, der andere Schlüssel verwendet.
2. Die Tokenkonfiguration oder die Sicherheitsschlüssel des Token-Service, der das Token erstellt hat, wurden geändert.
3. Der Token-Service, der das Token erstellt hat, ist nicht mehr verfügbar.

[24.01.24, 12:58:19:748 MEZ] 0000002b id=00000000 MemorySession                                                > invalidate ENTRY  AppName=default_host/; Id=B6lAXNCejggLTSA7VhjofmB
zhang-ca commented 7 months ago

com.ibm.ws.session_1.0.82.zip

I rebuild the jar to ensure trace will throw during invalidation. Please unzip the the file and replace the latest com.ibm.ws.session_1.0.82.jar

c-koell commented 7 months ago

trace_25.01.2024.zip I have taken new traces and you will see the Stacktrace in the messages.log

zhang-ca commented 7 months ago

@c-koell so it was the security SSOAuthenticator found the Ltpa token not matching the key on the other pod, and invalidated the session. I found CWWKS4001I happened on both 23.0.0.9 and 23.0.0.10 but the latter one catches the error and stop the subsequent session. Can you ensure the same ltpa keys were used on both pods.

[25.01.24, 07:27:34:650 MEZ] 000000cb id=00000000 com.ibm.ws.security.token.internal.TokenManagerImpl          I CWWKS4001I: Das Sicherheitstoken kann nicht validiert werden. Die folgenden Ursachen sind möglich:
1. Das Sicherheitstoken wurde auf einem anderen Server generiert, der andere Schlüssel verwendet.
2. Die Tokenkonfiguration oder die Sicherheitsschlüssel des Token-Service, der das Token erstellt hat, wurden geändert.
3. Der Token-Service, der das Token erstellt hat, ist nicht mehr verfügbar.

[25.01.24, 07:27:34:650 MEZ] 000000cb id=00000000 SystemOut                                                    O java.lang.Throwable: Throw exception to find out who sent the unauthorized request !!!
[25.01.24, 07:27:34:665 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.store.memory.MemorySession.invalidate(MemorySession.java:207)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.http.AbstractHttpSession.invalidate(AbstractHttpSession.java:207)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.AbstractSessionData.invalidate(AbstractSessionData.java:175)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.AbstractHttpSessionFacade.invalidate(AbstractHttpSessionFacade.java:166)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.AuthenticateApi.invalidateSession(AuthenticateApi.java:505)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.AuthenticateApi.simpleLogout(AuthenticateApi.java:349)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.cleanupLoggedOutToken(SSOAuthenticator.java:253)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleLtpaSSO(SSOAuthenticator.java:196)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleSSO(SSOAuthenticator.java:139)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:98)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:88)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleSSO(WebProviderAuthenticatorProxy.java:351)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.authenticateForOtherMechanisms(WebProviderAuthenticatorProxy.java:159)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleJaspi(WebProviderAuthenticatorProxy.java:134)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.handleJaspi(WebAppSecurityCollaboratorImpl.java:742)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.performSecurityChecks(WebAppSecurityCollaboratorImpl.java:686)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.preInvoke(WebAppSecurityCollaboratorImpl.java:622)
c-koell commented 7 months ago

Hi @zhang-ca During last trace both pods had not the same ltpa.keys. Now i have shared the same ltpa-key on both servers and i see no CWWKS4001I but the the Stacktrace looks realy similar... trace_26.01.2024.zip

zhang-ca commented 7 months ago

Instead CWWKS4001I , we got no UserRegistry

[26.01.24, 08:06:11:300 MEZ] 0000002b id=00000000 om.ibm.ws.security.registry.internal.UserRegistryServiceImpl E CWWKS3005E: Es ist eine Konfigurationsausnahme eingetreten. Es ist kein UserRegistry-Implementierungsservice verfügbar. Stellen Sie sicher, dass Sie eine Benutzerregistry konfiguriert haben.

[26.01.24, 08:06:11:331 MEZ] 0000002b id=00000000 com.ibm.ws.logging.internal.impl.IncidentImpl                I FFDC1015I: Es wurde ein FFDC-Vorfall erstellt: "com.ibm.ws.security.registry.RegistryException: CWWKS3005E: Es ist eine Konfigurationsausnahme eingetreten. Es ist kein UserRegistry-Implementierungsservice verfügbar. Stellen Sie sicher, dass Sie eine Benutzerregistry konfiguriert haben. com.ibm.ws.security.authentication.jaas.modules.TokenLoginModule 118" unter ffdc_24.01.26_08.06.11.0.log
krismarc commented 7 months ago

Hi @zhang-ca,

my users are reporting problems with sessions being expired a way earlier than before the update. However, they are not using sessionCache-1 feature. Is it still connected? As I can see, you are doing some changes/tests directly on com.ibm.ws.session.* which makes me think we are around the same problem.

zhang-ca commented 7 months ago

@krismarc was the session timeout or invalidated by security authentication? if you have the logs, can you send to zhang@ca.ibm.com and I can take a quick look. Else you may enable the session trace com.ibm.ws.session.*=all

c-koell commented 7 months ago

@zhang-ca do you have found any hints whats going wrong in version 23.0.0.10 ?

zhang-ca commented 7 months ago

@c-koell so it was the security SSOAuthenticator found the Ltpa token not matching the key on the other pod, and invalidated the session. I found CWWKS4001I happened on both 23.0.0.9 and 23.0.0.10 but the latter one catches the error and stop the subsequent session. Can you ensure the same ltpa keys were used on both pods.

[25.01.24, 07:27:34:650 MEZ] 000000cb id=00000000 com.ibm.ws.security.token.internal.TokenManagerImpl          I CWWKS4001I: Das Sicherheitstoken kann nicht validiert werden. Die folgenden Ursachen sind möglich:
1. Das Sicherheitstoken wurde auf einem anderen Server generiert, der andere Schlüssel verwendet.
2. Die Tokenkonfiguration oder die Sicherheitsschlüssel des Token-Service, der das Token erstellt hat, wurden geändert.
3. Der Token-Service, der das Token erstellt hat, ist nicht mehr verfügbar.

[25.01.24, 07:27:34:650 MEZ] 000000cb id=00000000 SystemOut                                                    O java.lang.Throwable: Throw exception to find out who sent the unauthorized request !!!
[25.01.24, 07:27:34:665 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.session.store.memory.MemorySession.invalidate(MemorySession.java:207)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.session.http.AbstractHttpSession.invalidate(AbstractHttpSession.java:207)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.session.AbstractSessionData.invalidate(AbstractSessionData.java:175)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.session.AbstractHttpSessionFacade.invalidate(AbstractHttpSessionFacade.java:166)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.AuthenticateApi.invalidateSession(AuthenticateApi.java:505)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.AuthenticateApi.simpleLogout(AuthenticateApi.java:349)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.cleanupLoggedOutToken(SSOAuthenticator.java:253)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleLtpaSSO(SSOAuthenticator.java:196)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleSSO(SSOAuthenticator.java:139)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:98)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:88)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleSSO(WebProviderAuthenticatorProxy.java:351)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.authenticateForOtherMechanisms(WebProviderAuthenticatorProxy.java:159)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleJaspi(WebProviderAuthenticatorProxy.java:134)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.handleJaspi(WebAppSecurityCollaboratorImpl.java:742)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.performSecurityChecks(WebAppSecurityCollaboratorImpl.java:686)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O  at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.preInvoke(WebAppSecurityCollaboratorImpl.java:622)

@c-koell Problem was due to Ltpa token not matching the key on the other pod, either CWWKS4001I or no UserRegistry found. This caused the security check invalidated the session. That is, both 23.0.0.9 and 23.0.0.10 traces you will see CWWKS4001I or CWWKS3005E. I believe this have to do with the TrustedHeaderAuthenticationMechanism implementation. Can you ensure both pods are using the same Ltpa token.

PS. Release 23.0.0.9 passed the security check and didn't invalidated the session was incorrect. The security team fixed the issue in 23.0.0.10 where session will be invalidated if Ltpa token isn't matching the key on the other pod.

Hope this helps, Allan

c-koell commented 7 months ago

Thanks for the info ! Yes during the last trace we had the same ltpa tokens. The result was CWWKS4001I , no UserRegistry.

So how can we fix that problem ? We have no UserRegistry ...

krismarc commented 7 months ago

Hi @zhang-ca,

I'm working on traces collection for you with reproduced problem. We will try with: *=info:com.ibm.ws.session.*=all:com.ibm.ws.webcontainer*=all:com.ibm.wsspi.webcontainer*=all

Please note we run liberty in Cloud Foundry - sort of orchestration like Kubernetes. Therefore, every pod/instance starts using the same image/configuration. Another project reported the same problem. They didn't face any expired session with previous version of liberty.

krismarc commented 7 months ago

one more thing.. what I've found within commits, most likely this one causes our apps to behave differently: https://github.com/OpenLiberty/open-liberty/commit/a748e6b0fd40e77d4b4b56a00f0a2751c25cc6e1 may I ask you to explain what has really changed here and how we should get adopted with our apps? Is my understanding correct that there was a bug/wrong logic which has been fixed?

This also matches the stacktrace you have mentioned here: https://github.com/OpenLiberty/open-liberty/issues/27135#issuecomment-1932505728

krismarc commented 7 months ago

I'm lost a bit. Our configuration remains unchanged since years. We do enforce all our liberties using configDropins overrides concept to keep ltpa tokens valid just for 10 minutes. That was decided by security team years back. <ltpa expiration="10m" keysFileName="${sec.ltpa.keys}" keysPassword="${sec.ltpa.keys.pwd}" />

https://www.ibm.com/docs/en/was-liberty/base?topic=security-authentication#cwlp_authentication__cre_token__title__1

Credentials and tokens As mentioned in the loginModule section, credentials are created as part of the subject creation process. Liberty creates the WSCredential, SingleSignonToken, and WSPrincipal credentials. The SingleSignonToken credential contains the token that is sent back to the browser in a cookie for SSO to work. This token contains the user information and an expiration time. It is signed and encrypted by using the Lightweight Third Party Authentication (LTPA) keys that are generated during the first server startup. The default expiration time is 2 hours and is an absolute time, not based on user activities. After the 2 hours, the token expires and the user must log in again to access the resource.

Accordingly to above quoted text. It seems like this started to work as described just with version 23.0.0.10. Before that version even if we had ltpa token forced to be valid for 10 minutes users were able to keep sessions for longer periods. I thought it's refreshed along with their activity and only if they remain inactive the session gets eventually expired.

..so I'd like to know what's the proper way to handle this now. Shall we increase this expiration time, or there's something else we can do from liberty perspective or there's something what could be done from application code point of view?

zhang-ca commented 7 months ago

This have to do with the LTPA keys validation changes in #26104, let me ask the security team.

https://github.com/OpenLiberty/open-liberty/blob/21601ff8746f05db1f82b4f8da9797534b8baf61/dev/com.ibm.ws.webcontainer.security/src/com/ibm/ws/webcontainer/security/internal/SSOAuthenticator.java#L193

krismarc commented 7 months ago

Thank you! Please note that we are using the same ltpa keys within all our liberties. That's injected by us during applications deployment. Keys remain the same and we are not making any use of dynamic keys rotation feature. Would be nice if we could loop @Zech-Hein into this.

I assume, if we get more production applications deployed with this version more users will start to complain about this. This breaks how our SSO behaved for long time and has a direct impact on end applications users' experience.

That's reported so late, because we are not making application server release for every single Liberty iteration and only those which contain some high scored CVEs trigger us to switch. Eventually if there's a feature demanded by some users. So at the moment we are performing migration from 23.0.0.8 to 23.0.0.12.

Zech-Hein commented 7 months ago

There was actually two bugs around this area that were fixed.

The first you linked above was to fix a problem where after the ltpa.keys were changed on the Liberty server, the LTPA token that users had would no longer be valid and those users would receive a blank page instead of being redirected to the login page. This was because they still had their session cookie, which was valid, because the user had not been logged out, but their LTPA token was no longer valid. As a result, they could no longer access the page.

There was a second bug that affected 23.0.0.9 where even when an LTPA token expired, it was still treated as valid, I think this may explain the behavior around your tokens lasting longer than the expiration setting. This was not fixed until 23.0.0.11 though. In https://github.com/OpenLiberty/open-liberty/pull/26704

Ultimately, with things behaving as they should, a user will be required to login again when their LTPA token expires. To make this longer, set the expiration to a longer duration, like 10 hours. For example, <ltpa expiration="10h" ...

There are no mechanisms currently implemented in Liberty (that I am aware of) that will refresh a users LTPA token automatically without the user performing a login again.

krismarc commented 7 months ago

Thanks for your answer @Zech-Hein . This would really explain the situation. However, as I've stated, we are migrating from 23.0.0.8 to 23.0.0.12 and we didn't had any issues like that anytime before this migration and you assume this was a case only between 23.0.0.9 and 23.0.0.11, is my understanding correct? Also nobody reported any issues with randomly appearing blank pages before we started the migration. So something's yet fishy here.

c-koell commented 7 months ago

@Zech-Hein thanks for the info !

So i tried now with the same ltpa tokens on both servers and with the newest liberty version 24.0.0.1. but the SessionReplication still not work !

I have attached the traces 12.02.2024.zip

@zhang-ca also for your info ..

zhang-ca commented 7 months ago

@c-koell so it was the security SSOAuthenticator found the Ltpa token not matching the key on the other pod, and invalidated the session. I found CWWKS4001I happened on both 23.0.0.9 and 23.0.0.10 but the latter one catches the error and stop the subsequent session. Can you ensure the same ltpa keys were used on both pods.

[25.01.24, 07:27:34:650 MEZ] 000000cb id=00000000 com.ibm.ws.security.token.internal.TokenManagerImpl          I CWWKS4001I: Das Sicherheitstoken kann nicht validiert werden. Die folgenden Ursachen sind möglich:
1. Das Sicherheitstoken wurde auf einem anderen Server generiert, der andere Schlüssel verwendet.
2. Die Tokenkonfiguration oder die Sicherheitsschlüssel des Token-Service, der das Token erstellt hat, wurden geändert.
3. Der Token-Service, der das Token erstellt hat, ist nicht mehr verfügbar.

[25.01.24, 07:27:34:650 MEZ] 000000cb id=00000000 SystemOut                                                    O java.lang.Throwable: Throw exception to find out who sent the unauthorized request !!!
[25.01.24, 07:27:34:665 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.store.memory.MemorySession.invalidate(MemorySession.java:207)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.http.AbstractHttpSession.invalidate(AbstractHttpSession.java:207)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.AbstractSessionData.invalidate(AbstractSessionData.java:175)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.session.AbstractHttpSessionFacade.invalidate(AbstractHttpSessionFacade.java:166)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.AuthenticateApi.invalidateSession(AuthenticateApi.java:505)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.AuthenticateApi.simpleLogout(AuthenticateApi.java:349)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.cleanupLoggedOutToken(SSOAuthenticator.java:253)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleLtpaSSO(SSOAuthenticator.java:196)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleSSO(SSOAuthenticator.java:139)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:98)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:88)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleSSO(WebProviderAuthenticatorProxy.java:351)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.authenticateForOtherMechanisms(WebProviderAuthenticatorProxy.java:159)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleJaspi(WebProviderAuthenticatorProxy.java:134)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.handleJaspi(WebAppSecurityCollaboratorImpl.java:742)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.performSecurityChecks(WebAppSecurityCollaboratorImpl.java:686)
[25.01.24, 07:27:34:743 MEZ] 000000cb id=00000000 SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.preInvoke(WebAppSecurityCollaboratorImpl.java:622)

@c-koell Problem was due to Ltpa token not matching the key on the other pod, either CWWKS4001I or no UserRegistry found. This caused the security check invalidated the session. That is, both 23.0.0.9 and 23.0.0.10 traces you will see CWWKS4001I or CWWKS3005E. I believe this have to do with the TrustedHeaderAuthenticationMechanism implementation. Can you ensure both pods are using the same Ltpa token.

PS. Release 23.0.0.9 passed the security check and didn't invalidated the session was incorrect. The security team fixed the issue in 23.0.0.10 where session will be invalidated if Ltpa token isn't matching the key on the other pod.

Hope this helps, Allan

Hi @c-koell the latest trace (12.02.2024.zip) showed the same exception CWWKS3005E I mentioned before. It was caused by No UserRegistry deployment service is available.

[12.02.24, 07:59:11:748 MEZ] 00000036 id=00000000 om.ibm.ws.security.registry.internal.UserRegistryServiceImpl E CWWKS3005E: Es ist eine Konfigurationsausnahme eingetreten. Es ist kein UserRegistry-Implementierungsservice verfügbar. Stellen Sie sicher, dass Sie eine Benutzerregistry konfiguriert haben.
[12.02.24, 07:59:11:763 MEZ] 00000036 id=00000000 com.ibm.ws.logging.internal.impl.IncidentImpl                I FFDC1015I: Es wurde ein FFDC-Vorfall erstellt: "com.ibm.ws.security.registry.RegistryException: CWWKS3005E: Es ist eine Konfigurationsausnahme eingetreten. Es ist kein UserRegistry-Implementierungsservice verfügbar. Stellen Sie sicher, dass Sie eine Benutzerregistry konfiguriert haben. com.ibm.ws.security.authentication.jaas.modules.TokenLoginModule 118" unter ffdc_24.02.12_07.59.11.0.log

The SessionReplication was interrupted by the AuthenticationException below, where it called cleanupLoggedOutToken() and invalidated the session. This was the first fix mentioned in https://github.com/OpenLiberty/open-liberty/issues/27135#issuecomment-1936452454 which was integrated in 23.0.0.10.

https://github.com/OpenLiberty/open-liberty/blob/21601ff8746f05db1f82b4f8da9797534b8baf61/dev/com.ibm.ws.webcontainer.security/src/com/ibm/ws/webcontainer/security/internal/SSOAuthenticator.java#L188

The AuthenticationException came from your own implemented HttpAuthenticationMechanism. In this case there was no UserRegistry implemented. Please make sure you have configured a user registry on the failover pod.

Hope this helps, Allan

c-koell commented 7 months ago

HI @zhang-ca

I'm really sure that we have used the same ltpa keys on both servers. I don't think that our HttpAuthenticationMechanism throws a AuthenticationException but i will check this...

We have annotated on the HttpAuthenticationMechanism javax.security.enterprise.authentication.mechanism.http.AutoApplySession what means that the login information should be bound to the session.

So every subsequent request does not run through the HttpAuthenticationMechanism. This means if a requset goes to ServerB the HttpAuthenticationMechanism should not be called.

For your information we don't use a user registry.

c-koell commented 7 months ago

I have checked it and we throw definitivle no exception.

zhang-ca commented 7 months ago

I will have the security team to review the stack:

[26.01.24, 08:06:11:300 MEZ] 0000002b om.ibm.ws.security.registry.internal.UserRegistryServiceImpl E CWWKS3005E:...
...
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.AuthenticateApi.invalidateSession(AuthenticateApi.java:505)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.AuthenticateApi.simpleLogout(AuthenticateApi.java:349)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.cleanupLoggedOutToken(SSOAuthenticator.java:253)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleLtpaSSO(SSOAuthenticator.java:196)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.handleSSO(SSOAuthenticator.java:139)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:98)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.internal.SSOAuthenticator.authenticate(SSOAuthenticator.java:88)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleSSO(WebProviderAuthenticatorProxy.java:351)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.authenticateForOtherMechanisms(WebProviderAuthenticatorProxy.java:159)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebProviderAuthenticatorProxy.handleJaspi(WebProviderAuthenticatorProxy.java:134)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.handleJaspi(WebAppSecurityCollaboratorImpl.java:742)
[26.01.24, 08:06:11:472 MEZ] 0000002b SystemOut                                                    O    at com.ibm.ws.webcontainer.security.WebAppSecurityCollaboratorImpl.performSecurityChecks(WebAppSecurityCollaboratorImpl.java:686)
Zech-Hein commented 7 months ago

Hi @c-koell ,

We have an idea of something for you to try out, Before making the browser request where the problem happens, could you try removing/deleting the LTPAToken2 cookie from the browser, to see if the problem still happens?

As you mentioned above the login info should be bound to the session, so you do not want to use the ltpa token it seems.

We're working on making a patch that should fix your issue, I'll email it to you

c-koell commented 7 months ago

Hi @Zech-Hein

I have tried now to delete the LTPAToken2 cookie and then the session replication works fine. The second server has the same values in the session.

Unfortunately the ZIP File from your mail was blocked by IBM policy

c-koell commented 7 months ago

Here are the traces when i delete the LTPAToken2 cookie 14.02.2024.zip

c-koell commented 7 months ago

@Zech-Hein i have tried with the com.ibm.ws.webcontainer.security_1.0.85.jar but it does not work ..

Here are the traces 15.02.2024.zip

krismarc commented 6 months ago

@Zech-Hein, does it mean my problem has chance to be fixed as well by this? https://github.com/OpenLiberty/open-liberty/pull/27655

Zech-Hein commented 6 months ago

@c-koell thank you for trying that patch, we decided to fix it by reverted the change from https://github.com/OpenLiberty/open-liberty/pull/26704 instead.

@krismarc Yes, it should fix your problem.

corporate-gadfly commented 6 months ago

Just chiming in here.

This hit us when we bumped from 23.0.0.9 to 23.0.0.10 (presumably because we use LtpaToken and a custom TAI). Downgrading to 23.0.0.9 fixed the issue.

Looking forward to trying out the next version with the fix.

Zech-Hein commented 6 months ago

This fix #27655 should be available in 24.0.0.3