connamara / quickfixn

QuickFIX/n implements the FIX protocol on .NET.
http://quickfixn.org
Other
469 stars 557 forks source link

Exception in socket initiator thread crashing the whole application #257

Closed joseph-orbis closed 4 years ago

joseph-orbis commented 10 years ago

In our test environment we've encountered two occasions where an exception in the socket initiator thread crashed the whole application. Below are the two different exceptions causing the crashes.

2014-03-04 03:12:11,255 [20] ERROR Orbis.Trading.Common.Services.BaseService System.ObjectDisposedException: Cannot access a disposed object. Object name: 'FileLog'. at QuickFix.FileLog.DisposedCheck() at QuickFix.FileLog.OnEvent(String s) at QuickFix.Transport.SocketInitiator.SocketInitiatorThreadStart(Object socketInitiatorThread) at System.Threading.ThreadHelper.ThreadStart_Context(Object state) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.ThreadHelper.ThreadStart(Object obj

System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags) at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size) --- End of inner exception stack trace --- at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size) at QuickFix.SocketInitiatorThread.Send(String data) at QuickFix.Session.Send(String message) at QuickFix.Session.SendRaw(Message message, Int32 seqNum) at QuickFix.Session.GenerateTestRequest(String id) at QuickFix.Session.Next() at QuickFix.SocketInitiatorThread.Read()

joseph-orbis commented 10 years ago

For the first exception from FileLog.DisposedCheck(), I've looked at the code and SocketInitiator.SocketInitiatorThreadStart() method only catches IOException and SocketException. Therefore ObjectDisposedException slipped through and crashed the application

I'm wondering if it has to do with session objects being re-used after disposal, because they don't seem get removed from the sessions_ dictionary.

joseph-orbis commented 10 years ago

I think I may have a clue to why this happened.

In AbstractInitiator.Connect(), it calls Session.LookupSession() which will fetch the session object from the static Session.sessions_ dictionary. But if the session object was previously disposed by AbstractInitiator.Stop() (either the same or different instance of initiator), any further usage of it will trigger the ObjectDisposedException from the underlying FileLog.

gbirchmeier commented 10 years ago

Yes, that sounds plausible. (The exception is definitely caused by some call to an already-disposed object.)

joseph-orbis commented 10 years ago

Our application creates a new initiator instance each time we reconnect due e.g. due to network outage. Maybe the session disposal from the previous initiator thread clashed with the start up of new initiator instance.

gbirchmeier commented 10 years ago

A bit off-topic, but, why do you do that? If the network drops, the initiator will simply keep trying to connect until the network is back. There's no reason to create a new initiator, unless I'm missing something.

joseph-orbis commented 10 years ago

We wanted more control over the behaviors under those situations (e.g. not allowing new trades while the connection is 'in-doubt', so we don't want QF/n to appear as alive and keep queueing up FIX msgs in the outbox while it's trying to reconnect).

Instead of creating a new initiator after stopping, we can consider re-using the old initiator instance. Is it designed to support multiple start/stop cycles?

joseph-orbis commented 10 years ago

I think I found out exactly where and how it happened and it's unrelated to instantiating new initiator instance.

Just point the TradeClient example to a non-existent port, use a FileLogFactory instead of ScreenLogFactory, and add Thread.Sleep(5000); initiator.Stop(); after initiator.Start(). After a while ObjectDisposedException gets thrown from inside ScketInitiator.cs:Line 77, when it tries to log the exception (from failing to connect) but the session (and the FileLog instance) is already disposed.

joseph-orbis commented 10 years ago

Essentially what this means is that if you call initiator.Stop() before the connection attempt times out (default is about 20s) , the logging of such exception will throw ObjectDisposedException if you use file logging. This doesn't happen with ScreenLog.

The only situation where this can be avoided is if initiator.Stop() is called after the connection attempt failed (e.g. 25s after initiator.Start()) but before the next reconnect interval.

joseph-orbis commented 10 years ago

I have a few suggestions on how to address this:

  1. Not throwing in FileLog. Since it is only for logging purpose I don't think it's necessary to throw any exceptions and require everything that uses it to handle them. Worst case here is just that we miss some log entries. Since logger can potentially be used by so many places I think it's too much burden/risk for little benefits.
  2. Synchronizing initiator Start/Stop. What this issue boils down to is a concurrency issue where Start and Stop are conflicting each other through an unprotected resource (FileLog). Maybe there should be a lock to defer the stop / disposal until SocketInitiatorThreadStart has finished (either successfully or not).
  3. Try catch all three calls to Session.Log in ThreadInitiatorThreadStart. Since this is the last line of defence against any exceptions from the thread that can then crash the main process we should be catching any known (or even unknown) exceptions even inside the catch{} block.
roken commented 10 years ago

I've also stopped using the FileLog due to this issue, and came to the same conclusion you mentioned above -- the FileLog is disposed even though the session is actually still "active" (as in, it is still attempting to do stuff and log results).

AtanuSinha commented 9 years ago

Any solution if you still wanna continue with FileLog

Here's my code snippet:

Thread v = new Thread(() => SendMsg(sessionID)); try { v.Start(); } catch(System.ObjectDisposedException oEx) { //log this... }

public void SendMsg(SessionID sessionID) { using((.......initiate initiator here ....)) { try { initiator.start();

//Do stuff here.............................

catch() {} finally { initiator.stop()} } } }

it still crashes my main application. How can I avoid crashing the main application?

andreimarculescu commented 9 years ago

For those looking for a workaround, here is what I developed on my project:

ILogFactory implementation:

using QuickFix;

namespace PerseusLib.FixApi
{
    class SafeFileLogFactory : ILogFactory
    {
        private readonly FileLogFactory _quickFixFactory;

        public SafeFileLogFactory(SessionSettings settings)
        {
            _quickFixFactory = new FileLogFactory(settings);
        }

        public ILog Create(SessionID sessionId)
        {
            return new SafeFileLog(_quickFixFactory, sessionId);
        }
    }
}

ILog implementation

using System;
using QuickFix;

namespace PerseusLib.FixApi
{
    internal sealed class SafeFileLog : ILog
    {
        private readonly ILogFactory _quickFixFactory;
        private readonly SessionID _sessionId;
        private static volatile ILog _quickFixFileLog;
        private static volatile bool _shouldCreateFileLog = true;
        private static readonly object _creationLock = new object();

        public SafeFileLog(ILogFactory quickFixFactory, SessionID sessionId)
        {
            _quickFixFactory = quickFixFactory;
            _sessionId = sessionId;
        }

        public void Dispose()
        {
            if (!_shouldCreateFileLog)
            {
                lock (_creationLock)
                {
                    if (!_shouldCreateFileLog)
                    {
                        _shouldCreateFileLog = true;
                        _quickFixFileLog.Dispose();
                    }
                }
            }
        }

        public void Clear()
        {
            Action logAction = () => _quickFixFileLog.Clear();
            SafeLogAction(logAction);
        }

        public void OnIncoming(string msg)
        {
            Action logAction = () => _quickFixFileLog.OnIncoming(msg);
            SafeLogAction(logAction);
        }

        public void OnOutgoing(string msg)
        {
            Action logAction = () => _quickFixFileLog.OnOutgoing(msg);
            SafeLogAction(logAction);
        }

        public void OnEvent(string s)
        {
            Action logAction = () => _quickFixFileLog.OnEvent(s);
            SafeLogAction(logAction);
        }

        private void EnsureNotDisposed()
        {
            if (_shouldCreateFileLog)
            {
                lock (_creationLock)
                {
                    if (_shouldCreateFileLog)
                    {
                        RecreateUnderlyingLog();
                    }
                }
            }
        }

        private void RecreateUnderlyingLog()
        {
            _quickFixFileLog = _quickFixFactory.Create(_sessionId);
            _shouldCreateFileLog = false;
        }

        private void SafeLogAction(Action logAction)
        {
            EnsureNotDisposed();

            try
            {
                logAction();
            }
            catch (ObjectDisposedException)
            {
                // It might occur if Dispose is called avter the call to EnsureNotDisposed
                lock (_creationLock)
                {
                    RecreateUnderlyingLog();

                    logAction();
                }
            }
        }
    }
}
francis2 commented 9 years ago

I agree about not throwing in FileLog. This seemingly insignificant change was introduced after the 1.5 release and it is not how quickfix.net worked so it caused an unhandled error in 7+ year old code.

tr8dr commented 9 years ago

I too am having this issue. When I lose connection with an exchange am recreating the SocketInitiator and getting this Dispose error. What is the preferred method to reinitiate the connection?

francis2 commented 9 years ago

QF automatically reconnects after a connection loss. You do not need to recreate the SocketInitiator.

Whenever I force a disconnect, I call Disconnect(), then Stop() (be sure to trap exceptions), then GC.Collect() and GC.WaitForPendingFinalizers() (these last two were required with QF/C. I am not sure they're necessary with QF/net but I left them there as they don't hurt).

gbirchmeier commented 9 years ago

@hadagali - Github is not for help requests, and furthermore, what you've posted here is off-topic for this particular issue. I'm deleting your comments. I ask that you look at the support options listed here: http://quickfixn.org/help.html

gbirchmeier commented 4 years ago

addressed by #505