connamara / quickfixn

QuickFIX/n implements the FIX protocol on .NET.
http://quickfixn.org
Other
469 stars 557 forks source link

Heartbeat not sent during constant traffic #57

Closed mjwood7 closed 12 years ago

mjwood7 commented 12 years ago

I believe I've found a bug:

As an initiator, if you receive constant traffic for an amount of time greater than your heartbeat interval, you'll skip the heartbeat and therefore will/should be disconnected by the remote host.

I think the logic responsible is found in SocketInitiatorThread.Read, within the nested if/then statements.

mjwood7 commented 12 years ago

Suggestion - Perhaps un-nesting the if/else block.

public bool Read()
        {
            try
            {
                if (socket_.Poll(1000000, SelectMode.SelectRead)) // one-second timeout
                {
                    int bytesRead = socket_.Receive(readBuffer_);
                    if (0 == bytesRead)
                        throw new SocketException(System.Convert.ToInt32(SocketError.ConnectionReset));
                    parser_.AddToStream(System.Text.Encoding.UTF8.GetString(readBuffer_, 0, bytesRead));
                }
                **else** if (null != session_)
                {
                    session_.Next();
                }
                else
                {
                    throw new QuickFIXException("Initiator timed out while reading socket");
                }

                ProcessStream();
                return true;
            } 
           catch ...
}
cbusbey commented 12 years ago

I don't think this is necessarily a bug. The fix protocol handles both sides of the heartbeat transaction. Take a look at this:

http://fixwiki.fixprotocol.org/fixwiki/Heartbeat

Specifically the part about when a side has not received a heartbeat in the heartbeat interval.

When either end of the connection has not received any data for (HeartBtInt + "some reasonable transmission time") seconds, it will transmit a Test Request message. If there is still no Heartbeat message received after (HeartBtInt + "some reasonable transmission time") seconds then the connection should be considered lost and corrective action be initiated.

In the scenario you gave, the counter party sending the constant traffic should notice that no heartbeat has been sent and send a test message. Assuming the test message is properly received and responded to (which it will, see Session.Next and Session.NextTestRequest), all should be good. It won't kill the connection until after the test message is sent + heartbt int + "some reasonable transmission time"

mjwood7 commented 12 years ago

thanks and good point. I'll bring this up with party we're connecting to.

mjwood7 commented 12 years ago

I have also included the distribution for any commentary on this:

I did some further reading, specifically from the sections you quoted. Right before the section you included was this line:

"When either end of a FIX connection has not sent any data for [HeartBtInt] seconds, it will transmit a Heartbeat message. "

source: http://fixprotocol.org/documents/346/fix-44_w_Errata_20030618_PDF.zip Volume2, Page 16, Section “Administrative Messages”:

So with the current code, my scenario where I receive constant traffic (perhaps a HeartBtInt+ second long stream of IOI messages) causes me skip a heartbeat. You are right that other end should send a test message, but in the same sense, I should be sending a heartbeat.

any thoughts?

Thanks,

-Matt

On Tue, Apr 24, 2012 at 10:42 AM, Chris Busbey reply@reply.github.com wrote:

I don't think this is necessarily a bug.  The fix protocol handles both sides of the heartbeat transaction.  Take a look at this:

http://fixwiki.fixprotocol.org/fixwiki/Heartbeat

Specifically the part about when a side has not received a heartbeat in the heartbeat interval.

When either end of the connection has not received any data for (HeartBtInt + "some reasonable transmission time") seconds, it will transmit a Test Request message. If there is still no Heartbeat message received after (HeartBtInt + "some reasonable transmission time") seconds then the connection should be considered lost and corrective action be initiated.

In the scenario you gave, the counter party sending the constant traffic should notice that no heartbeat has been sent and send a test message.  Assuming the test message is properly received and responded to (which it will, see Session.Next and Session.NextTestRequest), all should be good.  It won't kill the connection until after the test message is sent + heartbt int + "some reasonable transmission time"


Reply to this email directly or view it on GitHub: https://github.com/connamara/quickfixn/issues/57#issuecomment-5305992

mjwood7 commented 12 years ago

Matt, IMHO, I don't think the receiving app in this case should send a heartbeat. Obviously, it is necessary for the sending app to send the TestRequest and the receiving app to respond with a heartbeat. I have seen the situation where the receiving app is a slow consumer and fails to process the TestRequest and send the heartbeat in time to prevent the sending app from disconnecting.

Jim

On Fri, Apr 27, 2012 at 8:14 AM, Matt Wood mjwood7@gmail.com wrote:

I have also included the distribution for any commentary on this:

I did some further reading, specifically from the sections you quoted. Right before the section you included was this line:

"When either end of a FIX connection has not sent any data for [HeartBtInt] seconds, it will transmit a Heartbeat message. "

source: http://fixprotocol.org/documents/346/fix-44_w_Errata_20030618_PDF.zip Volume2, Page 16, Section “Administrative Messages”:

So with the current code, my scenario where I receive constant traffic (perhaps a HeartBtInt+ second long stream of IOI messages) causes me skip a heartbeat. You are right that other end should send a test message, but in the same sense, I should be sending a heartbeat.

any thoughts?

Thanks,

-Matt

On Tue, Apr 24, 2012 at 10:42 AM, Chris Busbey < reply@reply.github.com

wrote: I don't think this is necessarily a bug. The fix protocol handles both sides of the heartbeat transaction. Take a look at this:

http://fixwiki.fixprotocol.org/fixwiki/Heartbeat

Specifically the part about when a side has not received a heartbeat in the heartbeat interval.

When either end of the connection has not received any data for (HeartBtInt + "some reasonable transmission time") seconds, it will transmit a Test Request message. If there is still no Heartbeat message received after (HeartBtInt + "some reasonable transmission time") seconds then the connection should be considered lost and corrective action be initiated.

In the scenario you gave, the counter party sending the constant traffic should notice that no heartbeat has been sent and send a test message. Assuming the test message is properly received and responded to (which it will, see Session.Next and Session.NextTestRequest), all should be good. It won't kill the connection until after the test message is sent + heartbt int + "some reasonable transmission time"


Reply to this email directly or view it on GitHub: https://github.com/connamara/quickfixn/issues/57#issuecomment-5305992


Quickfixn mailing list Quickfixn@lists.quickfixn.com http://lists.quickfixn.com/listinfo.cgi/quickfixn-quickfixn.com

Connamara Systems, LLC Made-To-Measure Trading Solutions. Exactly what you need. No more. No less. http://www.connamara.com http://connamara.com/

cbusbey commented 12 years ago

That last comment was from Connamara CEO @jcdowns. I think his identity theft on @mjwood7 was accidental...

cbusbey commented 12 years ago

Looks there can be some improvements to the Session class when it finishes processing a message. Specifically, should make heart beat check after processing message. I'll get a pull request in asap.

cbusbey commented 12 years ago

Hey Matt, I've got what I think is a fix in my fork on branch issue_57_fix. It's actually a one-liner, https://github.com/cbusbey/quickfixn/compare/issue_57_fix. I need to whip up some unit tests before submitting a pull request, but if you have the resources to test the fix, by all means give it a shot.

mjwood7 commented 12 years ago

Chris, I probably won't be able to test it until Monday, but I'll let you know how it goes.

-Matt

On Fri, Apr 27, 2012 at 3:28 PM, Chris Busbey reply@reply.github.com wrote:

Hey Matt, I've got what I think is a fix in my fork on branch issue_57_fix.  It's actually a one-liner, https://github.com/cbusbey/quickfixn/compare/issue_57_fix.  I need to whip up some unit tests before submitting a pull request, but if you have the resources to test the fix, by all means give it a shot.


Reply to this email directly or view it on GitHub: https://github.com/connamara/quickfixn/issues/57#issuecomment-5388619

gbirchmeier commented 12 years ago

Hi Matt, let us know when you've had a chance. Will merge as soon as you confirm.

mjwood7 commented 12 years ago

I should know by tomorrow afternoon. I'll keep you posted.

mjwood7 commented 12 years ago

So far this looks good. I'm confident this is fixed however I'm going to continue monitoring till after the end of the production day, when we receiving a sharp increase in volume (even more so than at the start of the day).

cbusbey commented 12 years ago

Wunderbar. Thanks again for following up with the issue, Matt.

mjwood7 commented 12 years ago

yeah seems to be fixed.