aaOpenSource / aaLog

A library and example programs for reading the standard SMC log file format
MIT License
13 stars 15 forks source link

in aaLogWebAPI, the aalogDataSource.Instance singleton is not thread safe #26

Open brog45 opened 8 years ago

brog45 commented 8 years ago

Between null test and initialization of backing field there is a race condition where a second thread can test null and proceeding separately to initialize the backing field. The second thread would create a a second instance, which would replace the first. These two threads would then each have a separate instance of aaDataSource.

brog45 commented 8 years ago

As we previously dicussed on Twitter, I will happily submit a pull request for this myself as a learning experience. I did some research to double-check the approach I was considering, but it turned out not to be as strong as I had originally thought. I'm thinking of using the Fifth version - fully lazy instantiation described in the article, Implementing the Singleton Pattern in C#.

brog45 commented 8 years ago

I think the advice these days is to avoid writing singleton classes and let the dependency injection container manage your singletons (or whatever lifecycle) for you. That said, I browsed your code and it's so elegant with the singleton. :-)

ryanvs commented 8 years ago

There is a bigger concern than the aalogDataSource singleton. Actually you can fix the singleton (the 6th version is definitely my preference these days when you can't use dependency injection), and there are other multi-threaded and multi-user issues.

The biggest issue is that the aaLogReader maintains all of the state information in the #region Globals section. So the FileStream, LogHeader, and all other module level variables are actually current state information for the last user or method call. I started to create multi-thread safe versions of some of the methods before realizing the multi-user concerns were there.

If multiple users are connected to aaLogWebAPI, when one user calls GetUnreadRecords it automatically changes the state information and affects the other user, specifically the LastRecordRead and possibly CurrentLogHeader if the logger moves to a new log file.

My original thought was to return the current LogRecord (last record read) along with the current LogFile information. But then we would need to add methods to check if the it LogFile has changed and then open a new FileStream. It starts to get messy.

I do not see an easy solution, other than storing the aaLogWebApi/aaLogReader state information for each connected user. aaLogReader seems fast enough that reopening the FileStream may not result in a big performance penalty.

The other option would be to create an aaLogPathManager class that is a singleton and reads the available log files and maintains the log file information such as first and last record and possibly associated time stamps. This would be to avoid repeated calls to aaLogReader.IndexLogHeaders. After that we would need to write a "stateless" and "stateful" version of aaLogReader. The "stateful" version would be much simpler to use for basic usage, and it would basically be a subclass of the stateless version with a built-in state. But if there are multi-threaded and multi-user concerns, then you need to use the stateless version.

arobinsongit commented 8 years ago

Agree on all major points. The original library didn't really have the concept of multi-user.

I also agree that the indexing and file streams are so fast I'm not sure it really matters. So, with that in mind maybe we just do away with the singleton completely and create a new instance with each call?

Another idea would be to use the GetRecordsByMessageNumberAndCount call. The client would track the last message number they received and when they call back they would pass this message number with the appropriate direction (forward) and count (if they want to page). Once the count retrurned is less than the count they asked for they know they've got it all.

I too have concerns that simultaneous requests will interfere with each other because of the common state maintained in the object at the global level.

I like the idea of the aaLogPathManager class. Keeping a persistent index in the background seems like a reasonable thing to do.. only concern is if this is premature optimization and adding unnecessary complexity? Best way to test would be to try to load up a huge log set, 10's of millions of records with 10's of thousands of log files and see what kind of performance we get indexing the headers.

Thoughts?