fusesource / stompjms

The JMS interface to STOMP

Apache ActiveMQ Artemis: Peer disconnected #26

Open themerius opened 9 years ago

themerius commented 9 years ago

Hi there, I've tested this library with Apache ActiveMQ Artemis 1.1.0, and it works. But if my process idles for some time, it gets this exception:

background log: error: java.io.EOFException: Peer disconnected
background log: error:  at org.fusesource.hawtdispatch.transport.AbstractProtocolCodec.read(AbstractProtocolCodec.java:331)
background log: error:  at org.fusesource.hawtdispatch.transport.TcpTransport.drainInbound(TcpTransport.java:706)
background log: error:  at org.fusesource.hawtdispatch.transport.TcpTransport$6.run(TcpTransport.java:588)
background log: error:  at org.fusesource.hawtdispatch.internal.NioDispatchSource$3.run(NioDispatchSource.java:209)
background log: error:  at org.fusesource.hawtdispatch.internal.SerialDispatchQueue.run(SerialDispatchQueue.java:100)
background log: error:  at org.fusesource.hawtdispatch.internal.pool.SimpleThread.run(SimpleThread.java:77)

Maybe something triggers a timeout (heart beat too slow?), which causes the server to disconnect this client? Using factory.setDisconnectTimeout(...) has no effect, but maybe I'm looking in the wrong place?

For comparison: on Apache Apollo the connection remains open.

clebertsuconic commented 9 years ago

can you provide a simple test replicating this?

clebertsuconic commented 9 years ago

duh.. this is Java.. so simple to check... we should open a JIRA on artemis.

@themerius want to do the honors or should I open it?

clebertsuconic commented 9 years ago

https://issues.apache.org/jira/browse/ARTEMIS-239

clebertsuconic commented 9 years ago

close this one.. I will take a look through the JIRA. I will fix it on Artemis. will get back here if I see any issues. thanks

themerius commented 9 years ago

Thanks for your fast reply! I would be happy to support you with testing the fixed Artemis.

clebertsuconic commented 9 years ago

@themerius : How are you using this? I couldn't make StompJMS send any keep-alives (I used debugging to verify this).. I may be wrong.. but I couldn't see anything.

I already see a few things wrong that need improvement in the Stomp manager, but I'm a bit confused about how to get the actual keep-alive frames sent.

I added an example on master using stomp-jms, that maybe you could tweak to replicate the issue you are seeing:

https://github.com/apache/activemq-artemis/tree/master/examples/protocols/stomp/stomp-jms

Can you help on that? Otherwise I won't know how to replicate your issue.

themerius commented 9 years ago

@clebertsuconic : I've hacked a little bit on your example. I've added an infinite loop that waits for messages, and the option to send a message every second.

Have a look at my branch: https://github.com/themerius/activemq-artemis/tree/master/examples/protocols/stomp/stomp-jms

Run with mvn verify and after roughly 60 seconds you get the Peer disconnected exception. (It waits for messages, but no messages are arriving.)

Run with mvn verify -Dtraffic=true to produce some message traffic (one message every second), and there will be no exception.

jscheid commented 9 years ago

@clebertsuconic @themerius I don't know StompJMS at all, but as far as I can tell this is working as expected on the Artemis side of things.

Artemis will close the connection if no data has been received for a certain amount of time. By default it will check every 30 seconds, and will evict connections that haven't received data in 30 seconds. So depending on when exactly you send data, connections are closed anytime between 30s-60s after data was last received.
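The 30s-60s window follows from combining a periodic check with a TTL. Below is a minimal sketch of that arithmetic, not Artemis code: the class and method names are hypothetical, and it assumes the checker ticks at exact multiples of the check period and evicts once the idle time is at least the TTL.

```java
// Hypothetical model of a periodic idle checker; not taken from Artemis.
class TtlWindow {

    // Returns how long (in ms) a connection stays idle before it is evicted,
    // given when its last data arrived relative to the checker's tick at t=0.
    // Assumes non-negative inputs and ticks at 0, period, 2*period, ...
    static long idleBeforeEviction(long lastDataAtMs, long checkPeriodMs, long ttlMs) {
        long deadline = lastDataAtMs + ttlMs;
        // Round the deadline up to the next checker tick.
        long tick = ((deadline + checkPeriodMs - 1) / checkPeriodMs) * checkPeriodMs;
        return tick - lastDataAtMs;
    }

    public static void main(String[] args) {
        long period = 30_000;
        long ttl = 30_000;
        System.out.println(idleBeforeEviction(0, period, ttl)); // 30000
        System.out.println(idleBeforeEviction(1, period, ttl)); // 59999
    }
}
```

Data that arrives exactly on a tick is evicted after 30 seconds of silence, while data that arrives just after a tick survives until the second following check, almost 60 seconds later.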

It looks like StompJMS doesn't have any support for heart-beating (see #17) so if you're not sending anything yourself then the connection will be closed after a while.

If you want Artemis to behave more like Apollo, you can increase the timeouts. (I don't know what timeouts Apollo uses by default, but evidently they are higher.)

Alternatively, somebody could implement proper heart-beating in StompJMS or you could add poor man's heart-beating to your application code (i.e. manually sending a dummy message in regular intervals).
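A minimal sketch of the "poor man's heart-beating" idea, using only the JDK: a scheduler that runs a caller-supplied send action at a fixed interval. The class name `PoorMansHeartbeat` and the `sendDummy` action are hypothetical; nothing here is StompJMS or Artemis API, and what the dummy message looks like is left to the application.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: periodically runs a "send a dummy message" action.
class PoorMansHeartbeat implements AutoCloseable {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "poor-mans-heartbeat");
                t.setDaemon(true); // do not keep the JVM alive just for heartbeats
                return t;
            });

    PoorMansHeartbeat(Runnable sendDummy, long intervalMs) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sendDummy.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all further runs, so log and continue.
                System.err.println("heartbeat send failed: " + e);
            }
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in action; a real application would send a message here.
        try (PoorMansHeartbeat hb = new PoorMansHeartbeat(() -> System.out.println("ping"), 100)) {
            Thread.sleep(350);
        }
    }
}
```

In a JMS application the action would presumably wrap a `producer.send(...)` call and catch `JMSException` itself. Since a JMS `Session` is not meant to be used from several threads at once, the heartbeat would need its own session. With the default timings described above, an interval well below 30 seconds (say 10 seconds) should keep the connection from being evicted.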

themerius commented 9 years ago

@jscheid Thanks for your reply. I feared something like that.

Is it possible to configure an acceptor in such a way that the connection TTL can be set to infinite? I've tried something like this after reading this:

<acceptor name="stomp">tcp://0.0.0.0:61613?protocols=STOMP;stompEnableMessageId=true;connectionTtl=-1</acceptor>

But it seems that this is not the right way to increase the timeouts to infinite?

But indeed, a neat solution would be to have heart beats.

jscheid commented 9 years ago

It looks right to me, does it not work? @clebertsuconic knows more about Artemis configuration, perhaps he can chime in.

themerius commented 9 years ago

I still get Peer disconnected after about 60 seconds. Whatever I choose (-1, 10000, 999999), it still disconnects after 60 seconds. (I've used the test code from https://github.com/themerius/activemq-artemis/tree/master/examples/protocols/stomp/stomp-jms)

clebertsuconic commented 9 years ago

There is the master configuration ttlOverride, but not on the acceptor, I'm afraid (although it makes sense and looks like an easy change).

What happens is that, by definition, the TTL for STOMP should be -1 if no TTL or ping is sent (per the STOMP docs/spec),

and the connection should then be closed through Netty failures, which should happen according to the TCP settings.

Also, @jscheid, the TTL checker is using a Thread instead of the scheduled executors, which won't scale to many connections. That's one thing that needs to be changed.

jscheid commented 9 years ago

@clebertsuconic where does it say that in the STOMP spec?

clebertsuconic commented 9 years ago

@jscheid I don't know ... @chirino told me :)

jscheid commented 9 years ago

@themerius after discussing with @clebertsuconic on IRC, it turns out that Artemis has two separate mechanisms for terminating an idle STOMP connection:

It seems to me that the correct fix is to disable the former for the STOMP protocol. Then, without a heart-beat header, you should get infinite connection life.

themerius commented 9 years ago

@jscheid Since this library speaks STOMP 1.0 and sends no heart-beat header, Artemis should give the connection an infinite life? Or must this be fixed first?

jscheid commented 9 years ago

@themerius should work once https://github.com/apache/activemq-artemis/pull/208 is merged.

clebertsuconic commented 9 years ago

Artemis should give the connection an infinite life, according to Hiram

clebertsuconic commented 9 years ago

... and his spec