connamara / quickfixn

QuickFIX/n implements the FIX protocol on .NET.
http://quickfixn.org
Other
463 stars 552 forks source link

Timing issue loading DLLs #864

Open baffled opened 2 days ago

baffled commented 2 days ago

There is a potential timing issue if you start more than one connection at the same time, each creating a DefaultMessageFactory.

When discovering the QuickFix.FIXxx.dll libraries, LoadLocalDlls() kicks out the second (and any subsequent) thread to prevent it reloading the same assemblies into memory. But if the first thread has not finished iterating through all the libraries and exited the function, the second thread will end up querying a truncated list and won't have all the factories in its list. That leads to an incorrect BeginString error at runtime when parsing message groups (and took a load of tracking down).

Better to lock the DefaultMessageFactory.LoadLocalDlls() function to ensure it completes before the second thread hits the loadFlag test.

gbirchmeier commented 2 days ago

Thank you for reporting this! I see it in the code, and agree with your interpretation of the situation.

The original coder did implement a loadFlag to ensure that two threads wouldn't run this function concurrently... but, in line with your explanation, second/subsequent threads should be forced to wait until the first thread's call completes the task before they are allowed to return.

(Also, that @ variable in there is kind of gross and unnecessary.)

I've seen/heard this incorrect-BeginString error happen capriciously, but not in consistent ways that have made it easy to follow-up on. But I think this could be the cause! I'm very happy you tracked this down 👍 and I will try to get the fix in the next release (which shouldn't be too far off I hope)

baffled commented 2 days ago

Hi Grant

Happy to Help!

Just adding

public static object _syncObj = new object();

and then

private static void LoadLocalDlls() { const int @true = 1;

        // Because we want to attempt load assemblies once only
        // BFL this does not work on its own as there can be a timing issue if a second thread hits it whilst the first
        // is still building the list of DLLs. This means in the next function it only queries against some of the factories.
        // the safer way is to lock the object.

        lock (_syncObj) {
            var loadFlag = Interlocked.Exchange(ref _dllLoadFlag, @true);
            if (loadFlag == @true) {
                return;
            } 
            . . . 

should sort it .. but hey, you know that and can probably come up with something more elegant. I’m up against time.

(and I agree about the @ prefix , maybe get rid 😊)

Unfortunately for each upgrade I have to reinstate a load of shadow methods to handle a non-standards-compliant partner system over which we have no authority, but that’s life.

In case I haven’t said this to you before, really appreciate everything you have done and continue to do on this project.

Brian