SolaceProducts / solace-spring-cloud

An umbrella project containing all Solace projects for Spring Cloud
Apache License 2.0

[SOL-91405] Exception in the output channel when the application scales down #200

Closed: murillocg closed this issue 1 year ago

murillocg commented 1 year ago

I have some applications in my event-driven architecture which have an input channel as well as an output channel for responding to requests. So, the complete flow usually involves two channels.

I've noticed that when the application has 2 or more live instances and scales down, this exception occurs on the output channel of the instance that is stopping (shutting down):

org.springframework.messaging.MessagingException: Unable to send message to topic gov-connector/v1/batch-service-authorization-response; nested exception is com.solacesystems.jcsmp.ClosedFacilityException: Tried to perform operation on a closed XML message producer, failedMessage=GenericMessage [payload=byte[562], headers={authorization=xxxx, replyTo=connector/v1/batch-service-authorization-response, tenantId=yyyyy, correlationId=679dfe40-4ff6-499e-76ee-d5d719502938, id=9a1c28d5-de3f-e15a-7474-e9df551339e3, scst_targetDestination=connector/v1/batch-service-authorization-response, contentType=application/json, timestamp=1678290352861}]
    at com.solace.spring.cloud.stream.binder.util.ErrorChannelSendingCorrelationKey.send(ErrorChannelSendingCorrelationKey.java:57)
    at com.solace.spring.cloud.stream.binder.outbound.JCSMPOutboundMessageHandler.handleMessagingException(JCSMPOutboundMessageHandler.java:142)
    at com.solace.spring.cloud.stream.binder.outbound.JCSMPOutboundMessageHandler.handleMessage(JCSMPOutboundMessageHandler.java:98)
    at org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler.handleMessageInternal(AbstractMessageChannelBinder.java:1035)
    at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56)
    at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)
    at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)

During shutdown, it seems the Solace implementation of the Spring Cloud Stream binder is still trying to reconnect and rebind instead of stopping its workers gracefully:

Mar 8, 2023 @ 12:45:53.294 | Stopping flow receiver container 1bbfe546-843c-4d89-8766-6787e55dfc64 | com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.294 | Waiting for 0 items, time remaining: 10000 MILLISECONDS | com.solace.spring.cloud.stream.binder.util.UnsignedCounterBarrier | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.294 | Binding flow receiver container 1bbfe546-843c-4d89-8766-6787e55dfc64 | com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.294 | failed to rebind queue gov-connector.v1.send-to-gov and flow container 1bbfe546-843c-4d89-8766-6787e55dfc64 is now unbound. Attempting to bind. | com.solace.spring.cloud.stream.binder.util.RetryableAckRebindTask | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.294 | Flow receiver container 1bbfe546-843c-4d89-8766-6787e55dfc64 started in state 'Running' | com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.294 | Rebinding flow receiver container 1bbfe546-843c-4d89-8766-6787e55dfc64 | com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.294 | Unbinding flow receiver container 1bbfe546-843c-4d89-8766-6787e55dfc64 | com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.292 | Exception thrown while processing XMLMessage 45. Message will be rejected. | com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:53.291 | Interrupting all workers for inbound adapter 9b129a0b-d8bb-4916-8344-1774ed868da9 | com.solace.spring.cloud.stream.binder.inbound.JCSMPInboundChannelAdapter | - | - | 1 | gov-connector
Mar 8, 2023 @ 12:45:52.868 | Failed to consume a message from destination gov-connector.v1.send-to-gov - attempt 1 | com.solace.spring.cloud.stream.binder.inbound.JCSMPInboundChannelAdapter

We are migrating from RabbitMQ to Solace and this case had not happened before, so it's new to us. Maybe it is related to the shutdown order of the workers (channel threads), or maybe some configuration is missing.

We are using spring-cloud-starter-stream-solace:jar:3.3.1 with @StreamListener for input channels and @SendTo for output channels.
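
For context, the wiring is essentially the classic processor pattern, along these lines (a simplified sketch; the class name and payload types are illustrative, not our actual code):

    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.annotation.StreamListener;
    import org.springframework.cloud.stream.messaging.Processor;
    import org.springframework.messaging.handler.annotation.SendTo;

    @EnableBinding(Processor.class)
    public class ResponderFlow {

        // Consume from the input channel, publish the reply on the output channel.
        @StreamListener(Processor.INPUT)
        @SendTo(Processor.OUTPUT)
        public byte[] handle(byte[] request) {
            return request; // placeholder for the real response-building logic
        }
    }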

Nephery commented 1 year ago

How are you doing the "scale down"? Are you stopping a binding from its own message handler?

murillocg commented 1 year ago

The platform only stops one of the multiple instances running. I'm not stopping a binding.

Mrc0113 commented 1 year ago

Hi @murillocg,

Can you clarify what you mean by "multiple instances running"? Do you have multiple instances of the Spring Boot microservice running in separate containers that you scale up/down using K8s, or do you mean scaling something else up/down within a single Spring Boot app?

Also, for awareness: I know you're migrating existing apps, but if you are allowed to modify them, you might want to consider moving to Spring Cloud Function instead of the @StreamListener and @SendTo annotations, which have been deprecated.
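
For example, the equivalent functional-style processor would look roughly like this (a minimal sketch; the bean name and the binding names it implies are placeholders):

    import java.util.function.Function;

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class AuthorizeFunctionConfig {

        // With spring.cloud.function.definition=authorize, Spring Cloud Stream
        // binds this function to authorize-in-0 and authorize-out-0 automatically.
        @Bean
        public Function<byte[], byte[]> authorize() {
            return request -> request; // placeholder for the real response-building logic
        }
    }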

murillocg commented 1 year ago

Hello @Mrc0113

Yes. I have multiple instances of the Spring Boot microservice running in separate containers, and I'm scaling up/down using the SAP BTP Cloud Foundry platform.

When scaling down, as explained here, Cloud Foundry sends the app process a SIGTERM on shutdown, giving the application 10 seconds to stop on its own before being forcibly terminated via SIGKILL.
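
In theory we could tune Spring's shutdown to finish inside that window, along these lines (an untested sketch, assuming Spring Boot 2.3+; the timeout value is illustrative):

    # application.properties (illustrative; timeout must stay under CF's 10 s SIGKILL deadline)
    server.shutdown=graceful
    spring.lifecycle.timeout-per-shutdown-phase=8s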

Mrc0113 commented 1 year ago

Hi @murillocg,

Interesting - thanks for the further info. I was unable to reproduce it myself. I tried kill -15 <pid>, which should simulate a SIGTERM, on a Cloud Stream app using v2.3.1 of the Solace Spring BOM (which includes v3.3.1 of the binder) and got the output below, which is what I would expect (note I had 5 concurrent consumers, which is why there are multiple flows being closed). Can you try killing your app locally using kill -15 <pid> to see if it works for you? (Sorry, I don't currently have access to a Cloud Foundry instance to test.)

2023-03-14 12:36:31.782  INFO 62606 --- [ionShutdownHook] c.s.s.c.s.b.i.JCSMPInboundChannelAdapter : Stopping all 5 consumer flows to queue scst/wk/SINK/plain/TEMPS.Q <inbound adapter ID: 4b2ceb32-402b-4455-be18-80f4c63ad8e5>
2023-03-14 12:36:32.291  INFO 62606 --- [ionShutdownHook] c.s.s.c.s.b.i.JCSMPInboundChannelAdapter : Interrupting all workers for inbound adapter 4b2ceb32-402b-4455-be18-80f4c63ad8e5
2023-03-14 12:36:32.292  INFO 62606 --- [pool-3-thread-2] c.s.s.c.s.b.i.InboundXMLMessageListener  : Closing flow receiver to destination scst/wk/SINK/plain/TEMPS.Q
2023-03-14 12:36:32.293  INFO 62606 --- [pool-3-thread-5] c.s.s.c.s.b.i.InboundXMLMessageListener  : Closing flow receiver to destination scst/wk/SINK/plain/TEMPS.Q
2023-03-14 12:36:32.295  INFO 62606 --- [pool-3-thread-3] c.s.s.c.s.b.i.InboundXMLMessageListener  : Closing flow receiver to destination scst/wk/SINK/plain/TEMPS.Q
2023-03-14 12:36:32.293  INFO 62606 --- [pool-3-thread-1] c.s.s.c.s.b.i.InboundXMLMessageListener  : Closing flow receiver to destination scst/wk/SINK/plain/TEMPS.Q
2023-03-14 12:36:32.295  INFO 62606 --- [pool-3-thread-2] c.s.s.c.s.b.util.FlowReceiverContainer   : Unbinding flow receiver container 2b09d00e-60c6-4d80-aae6-0d9ba7030968
2023-03-14 12:36:32.292  INFO 62606 --- [pool-3-thread-4] c.s.s.c.s.b.i.InboundXMLMessageListener  : Closing flow receiver to destination scst/wk/SINK/plain/TEMPS.Q
2023-03-14 12:36:32.295  INFO 62606 --- [pool-3-thread-4] c.s.s.c.s.b.util.FlowReceiverContainer   : Unbinding flow receiver container 96b0d74c-2880-462e-91f6-4e9ab525e9c2
2023-03-14 12:36:32.296  INFO 62606 --- [pool-3-thread-5] c.s.s.c.s.b.util.FlowReceiverContainer   : Unbinding flow receiver container 43124826-3769-42c1-88c7-f4ec6b501ae1
2023-03-14 12:36:32.296  INFO 62606 --- [pool-3-thread-3] c.s.s.c.s.b.util.FlowReceiverContainer   : Unbinding flow receiver container 2e0a95f1-fa59-40d2-be2d-d928a784acd7
2023-03-14 12:36:32.296  INFO 62606 --- [pool-3-thread-1] c.s.s.c.s.b.util.FlowReceiverContainer   : Unbinding flow receiver container 0156fc5d-51b4-4aa1-b969-3e8d9cdb6b07
2023-03-14 12:36:32.306  INFO 62606 --- [ionShutdownHook] c.s.s.c.s.b.i.JCSMPInboundChannelAdapter : stopped com.solace.spring.cloud.stream.binder.inbound.JCSMPInboundChannelAdapter@71039ce1
2023-03-14 12:36:32.313  INFO 62606 --- [ionShutdownHook] o.s.c.stream.binder.BinderErrorChannel   : Channel 'application.TEMPS.Q.SINK.errors' has 1 subscriber(s).
2023-03-14 12:36:32.314  INFO 62606 --- [ionShutdownHook] o.s.i.endpoint.EventDrivenConsumer       : Removing {logging-channel-adapter:_org.springframework.integration.errorLogger} as a subscriber to the 'errorChannel' channel
2023-03-14 12:36:32.314  INFO 62606 --- [ionShutdownHook] o.s.i.channel.PublishSubscribeChannel    : Channel 'application.errorChannel' has 0 subscriber(s).
2023-03-14 12:36:32.315  INFO 62606 --- [ionShutdownHook] o.s.i.endpoint.EventDrivenConsumer       : stopped bean '_org.springframework.integration.errorLogger'
2023-03-14 12:36:32.318  INFO 62606 --- [ionShutdownHook] c.s.s.c.s.b.SolaceMessageChannelBinder   : Closing JCSMP session JCSMPSession
2023-03-14 12:36:32.319  INFO 62606 --- [ionShutdownHook] c.s.j.protocol.impl.TcpClientChannel     : Client-1: Channel Closed (smfclient 1)

murillocg commented 1 year ago

Hello @Mrc0113

I believe you didn't create an output channel to answer the input channel messages. Beyond that, the load in our case is very high; that's why we have 50 concurrent consumers, not 5.
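
For reference, the concurrency is set with the standard consumer binding property (the binding name here is illustrative):

    # application.properties (binding name is illustrative)
    spring.cloud.stream.bindings.input.consumer.concurrency=50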

Nephery commented 1 year ago

Closing this issue. It turned out to be a Spring Boot/Cloud compatibility problem.