fusesource / mqtt-client

A Java MQTT Client
http://mqtt-client.fusesource.org/
Apache License 2.0

dead lock when processing fetching/sending messages at high frequency #21

Open ralfkornberger opened 11 years ago

ralfkornberger commented 11 years ago

I'm using Apache Camel with MQTT to fetch data from a Mosquitto broker. Data are published there at high frequency (< 10 s) by several devices. After receiving the data, I send an acknowledgement back by publishing a message to a per-device topic. I'm using the Fusesource MQTT Client (version 1.5) for this. I encountered the following problem: after some time (anywhere from 15 minutes to 1 day) something "weird" happens. The application stops receiving or sending any data via MQTT. Looking at it with jstack reveals the following:

"hawtdispatch-DEFAULT-2" daemon prio=10 tid=0x00007facc1a2f000 nid=0x782d waiting on condition [0x00007fac42bcf000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

Apparently, both the Camel receiving thread and the Fusesource client thread are hanging at org.fusesource.mqtt.client.Promise.await(Promise.java:88)

Since I use BlockingConnection in my sending client, I took a look at the Fusesource MQTT client. In BlockingConnection.java, in the function public void publish(final UTF8Buffer topic, final Buffer payload, final QoS qos, final boolean retain) throws Exception, at line 80, a Future is obtained from the publish call and await() is then called on it. When I change this await() to await(30L, TimeUnit.SECONDS), the problem still occurs, but the application keeps working. I've put debug printouts into the tracer class, which show that at the time the problem occurs the MQTT client seems to lose the connection to the broker and tries to re-establish it. The debug logs also show that the timeout exception thrown by the timed-out await occurs every minute for about 20 minutes. Then the problem "vanishes" and returns after several hours.
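The difference between an unbounded and a bounded wait can be illustrated with a minimal sketch. It deliberately uses java.util.concurrent.CompletableFuture as a stand-in rather than the mqtt-client Future, so it runs without a broker; the class name BoundedAwaitSketch and the method awaitAck are hypothetical and not part of the library.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the await() vs. await(timeout) behaviour described above.
class BoundedAwaitSketch {

    /**
     * Waits for an acknowledgement, but never longer than the given timeout.
     * Returns true if the ack arrived in time, false if the wait timed out.
     */
    static boolean awaitAck(CompletableFuture<Void> ack, long timeout, TimeUnit unit) throws Exception {
        try {
            // Bounded wait; ack.get() without arguments would block for as long
            // as the future stays incomplete.
            ack.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // The caller decides whether to retry, log, or drop the message.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates an ack that is never delivered, e.g. because the connection dropped.
        CompletableFuture<Void> lostAck = new CompletableFuture<>();
        System.out.println("lost ack received: " + awaitAck(lostAck, 200, TimeUnit.MILLISECONDS));

        // Simulates an ack that has already been delivered.
        CompletableFuture<Void> deliveredAck = CompletableFuture.completedFuture(null);
        System.out.println("delivered ack received: " + awaitAck(deliveredAck, 200, TimeUnit.MILLISECONDS));
    }
}
```

With the bounded wait the calling thread is released after the timeout, which matches the observation that the application keeps working. The underlying cause, the acknowledgement that never completes, is not addressed by it.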

PS: I also posted this at the Apache Camel Jira: https://issues.apache.org/jira/browse/CAMEL-6717

ralfkornberger commented 11 years ago

I edited the post to make it clearer, as I have investigated the problem further.

rajdavies commented 11 years ago

Could you try Camel 2.12? It has upgraded its dependency on the MQTT client from 1.4 to 1.5.

ralfkornberger commented 11 years ago

Yes, I'll try. BUT: I'm already using 1.5. I've put the source code into my project's source tree for debugging and added debug printouts to the tracer class. I see debug messages even when no client sends data (e.g. the MQTT pings), so I assume that Camel uses my 1.5 debug version. That leads me to conclude that the issue is still present in 1.5. BTW: is there a difference in thread handling between 1.4 and 1.5?

rajdavies commented 11 years ago

OK - I suspect the real problem is the handling of the disconnects while awaiting a promise - if the reconnect is successful - the ACK should be resent

ralfkornberger commented 11 years ago

Maybe, I can't say for certain. When I use await with a timeout, the application reconnects. Whether the message gets resent, I can't say, because I have too many messages to debug manually. But in general: is this a bug? And is using await(timeout) the right solution? I don't know what happens to the message when the timeout is reached. A timeout exception is thrown then. I catch it and do nothing except a simple "return". The application keeps working then.
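One alternative to swallowing the timeout is to retry the publish a bounded number of times. The following is only a sketch under assumptions: it again uses CompletableFuture as a stand-in for the client's Future, and PublishRetrySketch, publishWithRetry and the "flaky" publisher are hypothetical names used for illustration, not library API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: retry a publish when its acknowledgement times out.
class PublishRetrySketch {

    /**
     * Calls publishAttempt up to maxAttempts times, waiting at most timeoutMillis for each ack.
     * Returns the 1-based number of the attempt that was acknowledged, or -1 if all timed out.
     */
    static int publishWithRetry(Supplier<CompletableFuture<Void>> publishAttempt,
                                int maxAttempts, long timeoutMillis) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<Void> ack = publishAttempt.get();
            try {
                ack.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return attempt;
            } catch (TimeoutException e) {
                // The outcome of this attempt is unknown: the message may or may not
                // have reached the broker, so a retry can produce a duplicate.
                ack.cancel(false);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Hypothetical publisher: the first two acks are lost, the third arrives.
        Supplier<CompletableFuture<Void>> flaky = () ->
                ++calls[0] < 3 ? new CompletableFuture<Void>()
                               : CompletableFuture.<Void>completedFuture(null);
        System.out.println("acknowledged on attempt " + publishWithRetry(flaky, 5, 100));
    }
}
```

Because a timed-out attempt may still have been delivered, a retry like this only makes sense if the receiving side tolerates duplicate messages.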

rajdavies commented 11 years ago

Yes - this is a bug - would you be able to raise an issue for it ?

ralfkornberger commented 11 years ago

Yes, you mean like "problem with disconnect while awaiting a promise"?

rajdavies commented 11 years ago

exactly :)

ralfkornberger commented 11 years ago

ok