OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0

WS-AT participant could not be registered as TM is null in HA environment with a load balancer #27885

Open tuomin35 opened 8 months ago

tuomin35 commented 8 months ago

Describe the bug

The transaction manager is null if a transaction coordination request hits a server instance that has not yet been involved in any transaction.

This can happen when a load balancer directs the request to a server that did not create the transaction. On that server the transaction manager may still be uninitialized.

The transaction manager is, however, initialized properly if the recoverOnStartup parameter is true.

[3/6/24, 16:52:21:278 EET] 0000003b ProtocolServi >  registrationRegister Entry  
[3/6/24, 16:52:21:278 EET] 0000003b RegistrationI >  register Entry  
[3/6/24, 16:52:21:278 EET] 0000003b RegistrationI >  rerouteRegistration Entry  
[3/6/24, 16:52:21:278 EET] 0000003b RegistrationI 3   REROUTE REGISTRATION Originally sent to: http://foo.bar.company.tld/ibm/wsatservice/RegistrationService
[3/6/24, 16:52:21:278 EET] 0000003b TranManagerIm >  getAddress Entry  
[3/6/24, 16:52:21:278 EET] 0000003b RemoteTransac >  getAddress Entry  
[3/6/24, 16:52:21:553 EET] 0000003b IncidentImpl  I   FFDC1015I: An FFDC Incident has been created: "java.lang.NullPointerException: Cannot invoke "com.ibm.ws.recoverylog.spi.RecoveryLogManager.getLeaseLog()" because the return value of "com.ibm.ws.Transaction.JTS.Configuration.getLogManager()" is null com.ibm.ws.wsat.service.impl.RegistrationImpl 191" at ffdc_24.03.06_16.52.21.0.log
[3/6/24, 16:52:21:556 EET] 0000003b RegistrationI 3   Cant get address for {0} {1}
[3/6/24, 16:52:21:556 EET] 0000003b RegistrationI <  rerouteRegistration Exit
.
.
.
[3/6/24, 16:52:22:071 EET] 0000003b IncidentImpl  I   FFDC1015I: An FFDC Incident has been created: "com.ibm.ws.wsat.service.WSATFaultException: Participant could not be registered com.ibm.ws.wsat.utils.WSATRequestHandler 74" at ffdc_24.03.06_16.52.22.0.log

Steps to Reproduce
Start two Open Liberty servers with recoverOnStartup=false and use the load balancer address for externalURLPrefix. Alternatively, if no load balancer is available, define server Alfa's address on server Bravo and vice versa. Trigger an outbound SOAP call and expect the resulting WS-AT coordination request to hit the other server instance.
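A minimal server.xml sketch of the configuration described in these steps, to be applied to both servers. The feature version is an assumption, and the host name is simply taken from the registration URL in the trace above; any other settings the real servers use are omitted.

```xml
<server>
    <featureManager>
        <!-- WS-AT support; the exact feature version is an assumption -->
        <feature>wsAtomicTransaction-1.2</feature>
    </featureManager>

    <!-- Skip transaction recovery at startup; the NullPointerException was seen with this setting -->
    <transaction recoverOnStartup="false"/>

    <!-- Advertise the load balancer address, so coordination requests can land on either server -->
    <wsAtomicTransaction externalURLPrefix="http://foo.bar.company.tld"/>
</server>
```

Without a load balancer, the same effect should be reachable by setting externalURLPrefix on each server to the address of the other one.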

Expected behavior
The transaction manager instance should be initialized when a WS-AT coordination request hits the server.

Alternatively, the documentation could be improved to cover "conditionally required" parameters. In this case, the backendURL and recoverOnStartup parameters depend on each other in an HA environment.

Diagnostic information:

Additional information

At first I had recoverOnStartup set to true, but I had problems with hung threads. I suspect this was because I had not defined the backendURL parameter, so it defaulted to localhost. This probably caused some kind of loop in peer recovery, which led to OutOfMemory errors. The server crashed and was restarted, but it hung again in the same way.

CWWKE1200W:  All threads in the Liberty default executor appear to be hung.  Liberty automatically increased the number of threads from 1,141 to 1,141.  However, all threads still appear to be hung. [2024-03-04T04:16:55.258+0200]
CWWKE1200W:  All threads in the Liberty default executor appear to be hung.  Liberty automatically increased the number of threads from 1,125 to 1,125.  However, all threads still appear to be hung. [2024-03-04T04:52:19.572+0200]
CWWKE1200W:  All threads in the Liberty default executor appear to be hung.  Liberty automatically increased the number of threads from 1,093 to 1,093.  However, all threads still appear to be hung. [2024-03-04T06:20:58.960+0200]

I had no more "All threads ... appear to be hung" warnings after changing recoverOnStartup to false.

jonhawkes commented 8 months ago

This is a bug. I will implement the expected behavior as described.