Closed VyPu closed 7 years ago
Locked
Yes, both of these checks the LockCount and raises an exception if the LockCount
is > 0. Imagine, that one thread releases the critical section and then goes to the next line, where it executes the Dec(ocsLockCount)
. But just before the Dec(ocsLockCount)
another thread might call GetValue
. It will fail, because the ocsLockCount
is still 1.
Sorry, I wrote too quick comment. I made a mistake, the assert will raise if the lockCount is 0. I will rethink my example soon.
Only one thread can access Value at the time (because only one thread can acquire the critical section). When a critical section is acquired, lock count will be >0. Something else must be going on.
Ok, it seems, that it is more difficult to prove it. Somehow in my head I imagined, that the assert fails if the LockCount is zero, but it is the opposite.
For some reason I get the assertion failure, but it happens very randomly and very rarely. Also, not on every machine. I checked all my calls to the Locked struture, and all off them are very simple, encapsulated by try .. finally blocks with Acquire and Release.
Switching those two lines magically solved the problem. Now the question is - how? I feel, that the problem is with the LockCount. The first impression was, that if the Aquire first acquires the critical section and then changes the LockCount, then Release should do the opposite.
Is it possible, that ocsLockCount is not aligned and read and write operations are not atomic?
This swap fixed your problem because you have a race condition somewhere in your code. Two threads are doing something at the same time and result depends on small details in code. But switching those two lines is not a solution - race condition will still exist but will show up more rarely.
I see... Thanks for the help, and sorry for blaming your code :) I will investigate this further, although I really lack imagination how my code could fail if I call Locked only in two places, both surrounded with Acquire and Release. These calls are used in lots of threads simultaneusly, so it is easy to make a mistake, but still they are very simple calls. Anyway, I will try to isolate the problem somehow.
I managed to isolate a very simple example, which illustrates the problem.
I used CodeSite in order to display error messages, when executed from .exe file, but it can be removed if you don't use it, and the problem will still remain.
Steps to reproduce:
Assertion failure will be raised. It strongly depends on the machine. On my laptop the exception is raised, but not frequently. On another computer i was not able to raise the exception, while on the third one it raises after a few seconds. I hope you will be able to see it.
Switching the lines helps. I was not able to raise the exception on any computer anymore.
unit Unit1;
interface
uses
Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
Vcl.Controls, Vcl.Forms, Vcl.Dialogs, OtlSync, OtlTaskControl, OtlEventMonitor, Vcl.StdCtrls, CodeSiteLogging;
type
TForm1 = class(TForm)
Button1: TButton;
Button2: TButton;
Button3: TButton;
procedure Button1Click(Sender: TObject);
procedure Button2Click(Sender: TObject);
procedure Button3Click(Sender: TObject);
private
fLocked: Locked<Cardinal>;
fOmniEventMonitor: TOmniEventMonitor;
end;
var
Form1: TForm1;
implementation
{$R *.dfm}
procedure TForm1.Button1Click(Sender: TObject);
begin
fOmniEventMonitor := TOmniEventMonitor.Create(nil);
end;
procedure TForm1.Button2Click(Sender: TObject);
begin
fOmniEventMonitor.Monitor(CreateTask(TOmniWorker.Create as IOmniWorker)).Run.Invoke(
procedure
begin
repeat
fLocked.Acquire;
try
try
fLocked.Value := fLocked.Value + 1;
except
on E: Exception do
CodeSite.Send(E.ToString);
end;
finally
fLocked.Release;
end;
Sleep(1);
until False;
end
);
end;
procedure TForm1.Button3Click(Sender: TObject);
begin
fOmniEventMonitor.Monitor(CreateTask(TOmniWorker.Create as IOmniWorker)).Run.Invoke(
procedure
begin
repeat
fLocked.Acquire;
try
try
CodeSite.Send('Value = %d', [fLocked.Value]);
except
on E: Exception do
CodeSite.Send(E.ToString);
end;
finally
fLocked.Release;
end;
Sleep(1000);
until False;
end
);
end;
end.
Some update: I executed this example application on 5 computers. Sleep(1) is not neccessary, as well as Button3Click. It is very easy to reproduce the issue on one computer only (which is the newest), and in total I have seen it on two computers. Sometimes on that computer I don't get the exception, but if I restart the application a few times, I can easily get it. So, it seems, that depending on the system it might be very hard to reproduce the error.
I agree lock count must be decremented while the critical section is still lock; that is, before Release.
Inc/Dec on a shared variable is not threadsafe unless one uses InterlockedIncrement/InterlockedDecrement.
Oh, and VERY BIG THANKS for finding the problem and minimizing the test case!
I am very glad to contribute to this excellent project! :)
By the way, one thing still wories me. The ocsLockCount variable can in theory still be accessed outside the critical section in TOmniCriticalSection.GetLockCount
function. Thus, if one thread is writing to it via Inc/Dec functions, another thread might check the LockCount and get unpredictable results. I might be wrong, but in my opinion it should also be surrounded by the critical section, or indeed the InterlockedIncrement/InterlockedDecrement should be used instead. For the Locked
procedure TOmniCriticalSection.Acquire;
begin
ocsCritSect.Acquire;
Inc(ocsLockCount);
end; { TOmniCriticalSection.Acquire }
function TOmniCriticalSection.GetLockCount: integer;
begin
Result := ocsLockCount; // <--- Is it really ok?
end; { TOmniCriticalSection.GetLockCount }
procedure TOmniCriticalSection.Release;
begin
Dec(ocsLockCount);
ocsCritSect.Release;
end; { TOmniCriticalSection.Release }
Re: [gabr42/OmniThreadLibrary] Fixed: TOmniCriticalSection.Release decremented ocsLockCount after re… (#108)
Access to LockCount was never designed to be thread-safe. It should really be done only when a CS is locked.
I do agree that exposing such API can lead to problems. I don't, however, see a good solution for that. I don't want it to slow down the execution too much (by doing another trip through acquire/release).
Maybe I just have to document it better - in the code and in the book. I noticed that it is not mentioned at all in the short text on TOmniCS: http://otl.17slon.com/book/chap06.html#synch-criticalsections-tomnics. I have to fix that.
Primož
I am very glad to contribute to this excellent project! :) By the way, one thing still wories me. The ocsLockCount variable can in theory still be accessed outside the critical section in TOmniCriticalSection.GetLockCount function. Thus, if one thread is writing to it via Inc/Dec functions, another thread might check the LockCount and get unpredictable results. I might be wrong, but in my opinion it should also be surrounded by the critical section, or indeed the InterlockedIncrement/InterlockedDecrement should be used instead. For the Locked structure this will unlikely cause a problem, but for some other usages it might. procedure TOmniCriticalSection.Acquire; begin ocsCritSect.Acquire; Inc(ocsLockCount); end; { TOmniCriticalSection.Acquire }
function TOmniCriticalSection.GetLockCount: integer; begin Result := ocsLockCount; // <--- Is it really ok? end; { TOmniCriticalSection.GetLockCount }
procedure TOmniCriticalSection.Release; begin Dec(ocsLockCount); ocsCritSect.Release; end; { TOmniCriticalSection.Release } — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub, or mute the thread.
Thanks for explanation. By the way, what about InterlockedIncrement/InterlockedDecrement for ocsLockCount? Do you know how much slower are they?
The documentation would of course be beneficial, but even following the documentation I think in theory I could still crash the Locked
If one accidentally calls GetValue/SetValue without acquiring the critical section, the ocsLockCount variable is accessed directly. There is a slight possibility, that during that time another thread will be altering the ocsLockCount while acquiring/releasing the critical section. Of course this situation is very unlikely in real world, as nobody should check the Value without Acquire/Release. So at the moment it looks just like a matter of perfection, but not a real problem.
Fixed a bug in
TOmniCriticalSection
. TheocsLockCount
was decremented after releasing the critical section. This caused assertion failures inLocked<T>.GetValue
function, when used in multithreaded environment.