apache / plc4x

PLC4X The Industrial IoT adapter
https://plc4x.apache.org/
Apache License 2.0
1.24k stars 399 forks source link

[Serial-Transport] NettyChannelFactory.closeEventLoopForChannel never returns #646

Closed ottlukas closed 1 month ago

ottlukas commented 1 year ago

I'm polling a modbus device over a RS485 serial connection:


... connectionString":"modbus-rtu:serial:///dev/ttyAMA2?BAUD_RATE=19200&STOP_BITS=1&PARITY=EVEN&unit-identifier=1"

...
"sources. ... .pollReturnInterval": 5000, 
"jobs. ... .interval": 2000,

It works for a few minutes and then:


[2022-10-28 17:20:50,239] WARN [conveyor12source|task-0] Request finished with exception. Reporting
Connection as Broken (org.apache.plc4x.java.utils.connectionpool2.CachedPlcConnection:203)
java.util.concurrent.CancellationException
 
      at java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2396)
     
  at org.apache.plc4x.java.utils.connectionpool2.CachedPlcConnection.lambda$wrapReadWithTimeout$1(CachedPlcConnection.java:105)
 
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
       
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 
      at java.base/java.lang.Thread.run(Thread.java:829)
[2022-10-28 17:20:50,603] INFO [conveyor12source|task-0]
Channel is closed, closing worker Group also (org.apache.plc4x.java.spi.connection.NettyChannelFactory:150)
[2022-10-28
17:20:50,614] INFO [conveyor12source|task-0] Worker Group was closed successfully! (org.apache.plc4x.java.spi.connection.NettyChannelFactory:155)
[2022-10-28
17:20:50,625] WARN [conveyor12source|task-0] Selector.select() returned prematurely 512 times in a row;
rebuilding Selector org.apache.plc4x.java.transport.serial.SerialPollingSelector@1a6dc4ab. (io.netty.channel.nio.NioEventLoop:628)
[2022-10-28
17:20:50,626] INFO [conveyor12source|task-0] Migrated 1 channel(s) to the new Selector. (io.netty.channel.nio.NioEventLoop:499)
[2022-10-28
17:20:50,637] WARN [conveyor12source|task-0] Selector.select() returned prematurely 512 times in a row;
rebuilding Selector org.apache.plc4x.java.transport.serial.SerialPollingSelector@2107b3db. (io.netty.channel.nio.NioEventLoop:628)

The Migrated 1 channel(s) and Selector.select() log entires continues endless.

I've found that the call in NettyChannelFactory to:


@Override    
public void closeEventLoopForChannel(Channel channel) {
        if (eventLoops.containsKey(channel))
{
            logger.info("Channel is closed, closing worker Group also");
            EventLoopGroup
eventExecutors = eventLoops.get(channel);
            eventLoops.remove(channel);
            eventExecutors.shutdownGracefully().awaitUninterruptibly();
 
          logger.info("Worker Group was closed successfully!");
        } else {
            logger.warn("Trying
to remove EventLoop for Channel {} but have none stored", channel);
        }
    } 

eventExecutors.shutdownGracefully() never returns even though the above log indicates it did. I've changed the call to:


eventExecutors.shutdownGracefully().addListener(e->{ LoggerFactory.getLogger(NettyChannelFactory.class).info("closeEventLoopForChannel:
Worker group finally shut down"); } ); 

in an attempt to eliminate any possible deadlocking.

Any suggestions would be much appreciated.

 

Imported from Jira PLC4X-358. Original Jira may contain additional context. Reported by: Kielsholm.

heyoulin commented 1 year ago

try my fork: https://github.com/spnettec/plc4x

chrisdutz commented 1 year ago

Anything in that fork, that you would like to contribute back to the project?

BioComputerSience commented 2 months ago

We have the same issue for Modbus TCP. Is there something new for a fix of the bug ?

We added a timeout: workerGroup.shutdownGracefully().awaitUninterruptibly(2000);

ottlukas commented 2 months ago

Which version of PLC4J are you running on? and if possible can you provide a PCAP ?

BioComputerSience commented 2 months ago

We use PLC4J 0.11.0. Can't provide a PCAP.

markus-franke commented 2 months ago

@chrisdutz Does something speak against adding a timeout for the graceful shutdown of the worker groups as suggested by @BioComputerSience ?

chrisdutz commented 2 months ago

No it doesn't ... feel free to whip up a PR.

BioComputerSience commented 1 month ago

The PR is open. When will it be released?

chrisdutz commented 1 month ago

Can't find a PR ... would you mind posting a link to this?

chrisdutz commented 1 month ago

But regarding timelines ... I no longer have much time on PLC4X (More on that here https://github.com/chrisdutz/blog/blob/main/plc4x/throwing-the-towel.adoc) ... so I guess there won't be a big release any time soon, the way I currently see it unless someone from the community does it.

BioComputerSience commented 1 month ago

https://github.com/apache/plc4x/pull/1701

chrisdutz commented 1 month ago

Well that is already merged ... so it's in ... you just need to build it yourself.

markus-franke commented 1 month ago

@chrisdutz As far as I can see the PR was just closed instead of merged. I don't see the relevant changes on "develop" branch.

chrisdutz commented 1 month ago

You are absolutely right ... I wanted to merge it, not sure why I clicked "Close" ... thanks for pointing that out ... and possibly explains why I claimed to have merged it ;-)

markus-franke commented 1 month ago

@chrisdutz No problem. Thanks for the merge ;-)