RestComm / jss7

RestComm Java SS7 Stack and Services
http://www.restcomm.com/
GNU Affero General Public License v3.0
192 stars 224 forks source link

M3UA State Machine (FSM) states race during stack stop #308

Open TimurKasatkin opened 5 years ago

TimurKasatkin commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: When stopping M3UA stack on CLIENT side (Functionality = IPSP, ExchangeType = SE, IPSPType = CLIENT) the following error appears in application logs:

org.mobicents.protocols.ss7.m3ua.impl.fsm.UnknownTransitionException: Transition=aspdownack. FSM.name=ASP1_LOCAL old state=ACTIVE, current state=ACTIVE
    at org.mobicents.protocols.ss7.m3ua.impl.fsm.FSMState.signal(FSMState.java:130)
    at org.mobicents.protocols.ss7.m3ua.impl.fsm.FSM.signal(FSM.java:152)
    at org.mobicents.protocols.ss7.m3ua.impl.AspStateMaintenanceHandler.handleAspDownAck(AspStateMaintenanceHandler.java:278)
    at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.read(AspFactoryImpl.java:415)
    at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.processPayload(AspFactoryImpl.java:826)
    at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.onPayload(AspFactoryImpl.java:805)
    at org.mobicents.protocols.sctp.netty.NettyAssociationImpl.read(NettyAssociationImpl.java:505)
    at org.mobicents.protocols.sctp.netty.NettySctpChannelInboundHandlerAdapter.channelRead(NettySctpChannelInboundHandlerAdapter.java:195)

After some research it has been discovered that when stopping M3UA stack on CLIENT side using M3UAManagementImpl#stop() the problem described below arises during AspFactory stop after ASP Down message is sent to switch device (in our case, simulator running locally) https://github.com/RestComm/jss7/blob/402b4b3984e581c581e309cb5f6a858fde6dbc33/m3ua/impl/src/main/java/org/restcomm/protocols/ss7/m3ua/impl/AspFactoryImpl.java#L215-L218

If ASP Down Ack response message from switch device (simulator) arrives before local FSM is moved to state AspState.DOWN_SENT using signal TransitionState.ASP_DOWN_SENT (it happens just after ASP Down message sending) in the following code

https://github.com/RestComm/jss7/blob/402b4b3984e581c581e309cb5f6a858fde6dbc33/m3ua/impl/src/main/java/org/restcomm/protocols/ss7/m3ua/impl/AspFactoryImpl.java#L218-L224

then UnknownTransitionException occures due to trying to signal TransitionState.ASP_DOWN_ACK (move local FSM to AspState.DOWN state) in the following code

https://github.com/RestComm/jss7/blob/402b4b3984e581c581e309cb5f6a858fde6dbc33/m3ua/impl/src/main/java/org/restcomm/protocols/ss7/m3ua/impl/AspStateMaintenanceHandler.java#L277-L281

while local FSM is still in AspState.ACTIVE state (TransitionState.ASP_DOWN_ACK transition expects that local FSM is already in AspState.DOWN_SENT state).

So, there is a race between FSM states due to expecting that after ASP Down message sent local FSM will be moved to AspState.DOWN_SENT before ASP Down Ack message arived.

How to reproduce it (as minimally and precisely as possible): The simplest way to reproduce the problem is to create a synthetic delay after code line below: https://github.com/RestComm/jss7/blob/402b4b3984e581c581e309cb5f6a858fde6dbc33/m3ua/impl/src/main/java/org/restcomm/protocols/ss7/m3ua/impl/AspFactoryImpl.java#L218 either by adding something like Thread.sleep(...) or just stopping at the next code line in a debug mode.

Anything else we need to know?: Just to repeat - the problem has been arised with running locally simulator (thanks to this, it was possible to create such a small delay between ASP Down sent and ASP Down Ack arrived).

Environment:

alpercoskun commented 4 years ago

A similar problem is occurred in a real world scenario:

2020.09.17 02:11:11:268 ERROR [Sctp-BossGroup-jss7-1-1] (AspTrafficMaintenanceHandler:448) - Transition=aspinactiveack. FSM.name=stp_1001_asp3_LOCAL old state=ACTIVE, current state=ACTIVE 2020.09.17 02:11:11:269 org.mobicents.protocols.ss7.m3ua.impl.fsm.UnknownTransitionException: Transition=aspinactiveack. FSM.name=stp_1001_asp3_LOCAL old state=ACTIVE, current state=ACTIVE 2020.09.17 02:11:11:269 at org.mobicents.protocols.ss7.m3ua.impl.fsm.FSMState.signal(FSMState.java:131) 2020.09.17 02:11:11:270 at org.mobicents.protocols.ss7.m3ua.impl.fsm.FSM.signal(FSM.java:153) 2020.09.17 02:11:11:270 at org.mobicents.protocols.ss7.m3ua.impl.AspTrafficMaintenanceHandler.handleAspInactiveAck(AspTrafficMaintenanceHandler.java:414) 2020.09.17 02:11:11:270 at org.mobicents.protocols.ss7.m3ua.impl.AspTrafficMaintenanceHandler.handleAspInactiveAck(AspTrafficMaintenanceHandler.java:369) 2020.09.17 02:11:11:271 at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.read(AspFactoryImpl.java:442) 2020.09.17 02:11:11:271 at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.processPayload(AspFactoryImpl.java:787) 2020.09.17 02:11:11:271 at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.onPayload(AspFactoryImpl.java:766) 2020.09.17 02:11:11:271 at org.mobicents.protocols.sctp.netty.NettyAssociationImpl.read(NettyAssociationImpl.java:477) 2020.09.17 02:11:11:272 at org.mobicents.protocols.sctp.netty.NettySctpChannelInboundHandlerAdapter.channelRead(NettySctpChannelInboundHandlerAdapter.java:192) 2020.09.17 02:11:11:272 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) 2020.09.17 02:11:11:272 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) 2020.09.17 02:11:11:273 at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) 2020.09.17 02:11:11:273 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) 2020.09.17 02:11:11:273 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) 2020.09.17 02:11:11:274 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) 2020.09.17 02:11:11:274 at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93) 2020.09.17 02:11:11:274 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 2020.09.17 02:11:11:275 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 2020.09.17 02:11:11:275 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 2020.09.17 02:11:11:275 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 2020.09.17 02:11:11:275 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) 2020.09.17 02:11:11:276 at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) 2020.09.17 02:11:11:276 at java.lang.Thread.run(Thread.java:748)


2020.09.17 02:17:46:421 ERROR [Sctp-BossGroup-jss7-1-1] (ManagementMessageHandler:80) - Transition=ntfyaspending. FSM.name=stp_1001_as_PEER old state=PENDING, current state=PENDING 2020.09.17 02:17:46:421 org.mobicents.protocols.ss7.m3ua.impl.fsm.UnknownTransitionException: Transition=ntfyaspending. FSM.name=stp_1001_as_PEER old state=PENDING, current state=PENDING 2020.09.17 02:17:46:422 at org.mobicents.protocols.ss7.m3ua.impl.fsm.FSMState.signal(FSMState.java:131) 2020.09.17 02:17:46:422 at org.mobicents.protocols.ss7.m3ua.impl.fsm.FSM.signal(FSM.java:153) 2020.09.17 02:17:46:422 at org.mobicents.protocols.ss7.m3ua.impl.ManagementMessageHandler.handleNotify(ManagementMessageHandler.java:78) 2020.09.17 02:17:46:423 at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.read(AspFactoryImpl.java:338) 2020.09.17 02:17:46:423 at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.processPayload(AspFactoryImpl.java:787) 2020.09.17 02:17:46:423 at org.mobicents.protocols.ss7.m3ua.impl.AspFactoryImpl.onPayload(AspFactoryImpl.java:766) 2020.09.17 02:17:46:423 at org.mobicents.protocols.sctp.netty.NettyAssociationImpl.read(NettyAssociationImpl.java:477) 2020.09.17 02:17:46:424 at org.mobicents.protocols.sctp.netty.NettySctpChannelInboundHandlerAdapter.channelRead(NettySctpChannelInboundHandlerAdapter.java:192) 2020.09.17 02:17:46:424 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) 2020.09.17 02:17:46:424 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) 2020.09.17 02:17:46:425 at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) 2020.09.17 02:17:46:425 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) 2020.09.17 02:17:46:425 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) 2020.09.17 02:17:46:426 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) 2020.09.17 02:17:46:426 at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93) 2020.09.17 02:17:46:426 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 2020.09.17 02:17:46:426 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 2020.09.17 02:17:46:427 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 2020.09.17 02:17:46:427 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 2020.09.17 02:17:46:427 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) 2020.09.17 02:17:46:427 at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) 2020.09.17 02:17:46:428 at java.lang.Thread.run(Thread.java:748) 2020.09.17 02:17:48:688 WARN [pool-6-thread-1] (AsStatePenTimeout:74) - PENDING timedout for As=stp_1001_as


m3ua as show

AS name=stp_1001_as functionality=IPSP mode=SE ipspType=CLIENT trMode=2 defaultTrMode=2 peerFSMState=INACTIVE Assigned to : ASP name=stp_1001_asp3 started=true ASP name=stp_1001_asp4 started=true

m3ua asp show

ASP name=stp_1001_asp3 aspid=4 heartbeat=false sctpAssoc=stp_1001_assoc3 started=true Assigned to : AS name=stp_1001_as functionality=IPSP mode=SE ipspType=CLIENT localFSMState=ACTIVE_SENT

ASP name=stp_1001_asp4 aspid=5 heartbeat=false sctpAssoc=stp_1001_assoc4 started=true Assigned to : AS name=stp_1001_as functionality=IPSP mode=SE ipspType=CLIENT localFSMState=ACTIVE_SENT