amqp / rhea

A reactive messaging library based on the AMQP protocol
Apache License 2.0

Receiver stops processing messages after some time automatically #230

Open GaikwadPratik opened 5 years ago

GaikwadPratik commented 5 years ago

In our project, we have a few exchanges on which messages are transferred. I created one connection to qpid using rhea-promise. On that connection I created two receivers listening to different exchanges. For some reason, after a couple of hours (the exact time varies between systems), a receiver stops processing messages completely. However, if I start a new receiver with exactly the same options after the original receiver stops, the new receiver works fine for about the same amount of time and then stops in the same way. The same thing happens for a topic as well, so I am guessing it has something to do with the receiver.

Every time I checked, both before and after processing messages, the receiver had credit.

Receiver options are as follows:

source: { 
   address: Name of topic or exchange
},
credit_window: 1,
rcv_settle_mode: 1,
autoaccept: false

Is there a limit to the number of messages a receiver can process? If so, can that limit be reset or made infinite?

P.S. This issue is a showstopper for us.

amarzavery commented 5 years ago

This should work. Please send the debug logs for us to be able to help you. Take a look at how debug logs can be set over here. Logging to a file will give you timestamps for free and that will provide accurate information on the interval at which the error happens.

grs commented 5 years ago

Is there a reason you are using rcv_settle_mode: 1? What broker are you using? (I don't believe the qpid c++ broker, which you mentioned using in another issue, will handle that option correctly.)

GaikwadPratik commented 5 years ago

@amarzavery Okay, let me see if I can collect the logs. Since this happens over the course of roughly two hours, the log file is going to be huge, but let me see what I can do.

amarzavery commented 5 years ago

The interesting part is what happens in the last few minutes (roughly 4-5) before the error occurs. That would give us an idea of the sequence of events that causes the receiver link to die.
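As a rough sketch of how to cut a huge log file down to that window: the helper below keeps only the lines whose leading timestamp falls within the last few minutes before the failure. It assumes the debug output was redirected to a file, so that each line starts with an ISO-8601 timestamp. The function name `linesBeforeFailure` and the sample lines are hypothetical, not taken from rhea or from this issue.

```javascript
// Hypothetical helper: keep only the log lines whose leading ISO-8601
// timestamp lies within `minutes` before `failureTime` (inclusive).
// Lines without a parseable leading timestamp are dropped.
function linesBeforeFailure(lines, failureTime, minutes) {
  const end = new Date(failureTime).getTime();
  const start = end - minutes * 60 * 1000;
  return lines.filter((line) => {
    // The first space-separated token is assumed to be the timestamp.
    const stamp = Date.parse(line.split(' ', 1)[0]);
    return !Number.isNaN(stamp) && stamp >= start && stamp <= end;
  });
}

// Invented sample lines, only to show the shape of the input.
const sample = [
  '2019-05-01T10:00:00.000Z rhea:frames (hypothetical frame log line)',
  '2019-05-01T11:56:30.000Z rhea:frames (hypothetical frame log line)',
  '2019-05-01T11:59:59.000Z rhea:events (hypothetical event log line)',
];

// Keep the 5 minutes leading up to an assumed failure at 12:00:00 UTC.
const recent = linesBeforeFailure(sample, '2019-05-01T12:00:00.000Z', 5);
console.log(recent.join('\n'));
```

In practice the lines would come from the log file (for example via `fs.readFileSync(path, 'utf8').split('\n')`), and `failureTime` would be the moment the receiver was last seen processing a message.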

GaikwadPratik commented 5 years ago

@grs ,

We do use the qpid c++ broker. Earlier we used the node10-amqp package, where we set the option settlementMode: receiverSettleMode.settleOnDisposition. From the rhea-promise docs, I thought this corresponded to rcv_settle_mode: 1. If that won't work correctly, what option should be used instead?

grs commented 5 years ago

rcv_settle_mode: 1 is "receiver settles second": the receiver accepts the message but does not settle it; then the sender (i.e. the broker) settles it; then, on notification that the sender has settled, the receiver settles.

settleOnDisposition is not an AMQP 1.0 protocol option, so I'm not sure, but I would guess from the name that it just means that when you accept it is settled as well.

The default rcv_settle_mode is 0 i.e. receiver settles first, and I would advise using that unless you have a reason to use the other value (and your broker supports it properly).
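For illustration, here is a minimal sketch of the receiver options from the original report with only the settle mode changed to the default. The helper name `buildReceiverOptions` and the address `my.exchange` are placeholders, and the rhea-promise calls in the trailing comments are an assumption about how the options would be used, not code from this issue.

```javascript
// Hypothetical helper: builds receiver options for a single address,
// mirroring the options in the original report except for rcv_settle_mode.
function buildReceiverOptions(address) {
  return {
    source: { address },
    credit_window: 1,
    rcv_settle_mode: 0, // receiver settles first (the default)
    autoaccept: false,  // the application accepts each message explicitly
  };
}

const options = buildReceiverOptions('my.exchange'); // placeholder address
console.log(JSON.stringify(options, null, 2));

// Assumed usage with rhea-promise (not run here, needs a live connection):
//   const receiver = await connection.createReceiver(options);
//   receiver.on('message', (context) => {
//     // ...process context.message...
//     context.delivery.accept();
//   });
```

Because autoaccept stays false, the application still decides when each message is accepted; only the settlement handshake with the broker changes.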

GaikwadPratik commented 5 years ago

@grs,

Good to know. I will make that change. Do you think this could be one of the reasons for the original issue?

@amarzavery ,

Is there a way to collect log messages from only one receiver or exchange?

GaikwadPratik commented 5 years ago

@grs ,

After changing rcv_settle_mode to 0, the issue seems to be resolved. I have been running the test case for 24 hours now with no issues so far.

Closing this issue for now.

@amarzavery ,

If I get this issue again, I will collect the logs again. :)

Thank you both for the quick response.

GaikwadPratik commented 5 years ago

@grs, @amarzavery ,

Is there a way to collect logs for a particular queue, exchange, or sender and receiver node? We are getting this issue in production, but only for one particular queue. Restarting the consumer service, which recreates the sender and receiver on that queue, seems to work.

grs commented 5 years ago

No, it is connection level only. My guess is that it is a different issue from your original report (which was due to the rcv_settle_mode). One suggestion would be to try and isolate the difference between the queue on which you have a problem and others and try to get a reproducer of some kind.