maurobringolf closed this issue 3 years ago.
> …`mkC2hRefCount`) but not contexts is enough to trigger segfaults.

Nice work! I think your observations are all in line with my expectations. Decreasing the ref count for an object may modify the context itself, so one cannot do these operations concurrently. I still believe that introducing a context lock and using it to serialize finalizers may fix the issue. We already take care of not freeing the context before its objects.
Do you want to give this a try or should I?
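As a rough sketch of the idea under discussion, here is a minimal, self-contained model of serializing finalizers through a per-context lock. `Context`, `newContext`, and `finalizeObject` are stand-ins invented for illustration, not the actual haskell-z3 API; a plain `IORef` counter plays the role of Z3's reference counts.

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical model: a context carries its own lock, and every
-- finalizer takes that lock before touching the context, so finalizers
-- never mutate the context concurrently with each other.
data Context = Context { ctxLock :: MVar (), ctxRefs :: IORef Int }

newContext :: IO Context
newContext = Context <$> newMVar () <*> newIORef 0

-- Stand-in for a finalizer decrementing a Z3 ref count: all context
-- mutation happens under the context lock.
finalizeObject :: Context -> IO ()
finalizeObject ctx = withMVar (ctxLock ctx) $ \_ ->
  modifyIORef' (ctxRefs ctx) (subtract 1)

main :: IO ()
main = do
  ctx <- newContext
  modifyIORef' (ctxRefs ctx) (+ 2)  -- pretend two objects were created
  finalizeObject ctx
  finalizeObject ctx
  readIORef (ctxRefs ctx) >>= print  -- prints 0
```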
I thought the same and tried a few variations of locking `Context`. I could not get it to work by locking only inside the finalizers, so I ended up locking inside `withContext` as well. To do that I needed to switch to the re-entrant version, `RLock`, otherwise it deadlocks instantly.

The current version passes the test, but it is clearly undesirable because it forces complete serialization and performs far too many locking operations. Do you have any ideas?
I think https://github.com/IagoAbal/haskell-z3/issues/27#issuecomment-792654862 is not applicable, but maybe I am wrong.
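The instant deadlock with a non-re-entrant lock can be demonstrated with plain `MVar`s, no Z3 involved: `withContext` calls nest, and the same thread cannot take a plain `MVar` lock twice. A minimal illustration:

```haskell
import Control.Concurrent.MVar

-- Why a plain MVar lock deadlocks here: a non-re-entrant lock cannot be
-- acquired twice by the thread that already holds it. A nested withMVar
-- on the same lock would block forever; tryTakeMVar shows the lock is
-- already empty (held) from inside the critical section.
main :: IO ()
main = do
  lock <- newMVar ()
  withMVar lock $ \_ -> do
    held <- tryTakeMVar lock
    print held  -- prints Nothing: the lock is already taken by us
```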
Does it fix the segfaults?
It does, for this test case at least. I am a bit concerned about locking the context on every API call, though; don't you think that is a problem?

If we decide to use this solution, I would add some documentation to the README about thread safety and locking, because otherwise someone writing concurrent code might end up being serialized by our library without realizing it (if I am understanding this correctly).
As far as I understand, Z3 contexts are not thread-safe, so anyone trying to manipulate the same context concurrently is doing it wrong anyway. In our case the GC is doing exactly that, and hence people get segfaults (cf. #27). But we don't need to go that far: it's good enough to serialize the finalizers and document that `Z3.Base.Context` is not thread-safe.
> it's good enough to serialize the finalizers

Based on my tests it is not: the test case still segfaults. As I understand it, this depends on whether the GC/finalizers ever run concurrently with the program itself. I found some references to concurrent garbage collection in the GHC docs, so I think we need to control for that.
> Based on my tests it is not: the test case still segfaults. As I understand it, this depends on whether the GC/finalizers ever run concurrently with the program itself.

That is weird! The GC "stops the world" by default, meaning the program does not run while the GC runs. I now see the `--nonmoving-gc` option, but it is not the default and I don't see you enabling it.
It turns out that `-N` implies `--nonmoving-gc`. From the docs:

> Setting -N also has the effect of enabling the parallel garbage collector

and also:

> Under this collection strategy oldest-generation garbage collection can proceed concurrently with mutation.
So the compiler options in this PR and in the bug reports result in concurrent garbage collection, which in turn results in concurrent operations on the non-thread-safe context objects. When adding `--copying-gc` to this PR's configuration, it runs fine without any locks. As far as I understand the docs this is not guaranteed, because finalizers might still run concurrently. But if we lock just the finalizers, then we should be safe under this GC.
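For reference, a hypothetical invocation showing the flag combination being discussed (the program name and the exact set of flags are illustrative, not copied from this PR's configuration):

```shell
# Build with the threaded RTS and allow RTS options.
ghc -threaded -rtsopts Main.hs

# -N enables the parallel GC; forcing the copying collector
# avoids the concurrent (non-moving) collection strategy.
./Main +RTS -N --copying-gc -RTS
```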
So I have three ideas for where we can go from here:

1. Keep the current solution and lock the `Context` everywhere.
2. Offer a `Context` option without garbage collection, basically an API for something like https://github.com/IagoAbal/haskell-z3/issues/27#issuecomment-793055118. We could then document that this should be used when using concurrent GC.
3. Document that the library is unsafe with `--nonmoving-gc` and only lock finalizers.

I am trying to figure out whether option 3 is feasible, because I think it makes the most sense.
I believe that "parallel GC" may refer to both the copying GC and the "non-moving" GC. I am a bit surprised by this behavior because copying is supposed to be the default. But anyhow, let's assume the worst-case scenario here.
I am more fond of option 1; I would just like to avoid the dependency on concurrent-extra. (I would be happy to depend on it if it were actively maintained, but it doesn't seem to be.) We only need a minimal implementation of `RLock` on top of `MVar`; one can basically copy the way it's done in concurrent-extra. I can help with that if you want.
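A minimal sketch of such an `RLock` on top of `MVar`, in the spirit of concurrent-extra's `Control.Concurrent.RLock` (the names and this particular spin-and-retry implementation are ours, not the library's; concurrent-extra blocks waiters instead of spinning):

```haskell
import Control.Concurrent (ThreadId, myThreadId, yield)
import Control.Concurrent.MVar
import Control.Exception (bracket_)

-- A re-entrant lock: the state tracks the owning thread and how many
-- times it has acquired the lock.
newtype RLock = RLock (MVar (Maybe (ThreadId, Int)))

newRLock :: IO RLock
newRLock = RLock <$> newMVar Nothing

acquire :: RLock -> IO ()
acquire (RLock mv) = do
  me <- myThreadId
  let go = do
        st <- takeMVar mv
        case st of
          Nothing -> putMVar mv (Just (me, 1))
          Just (owner, n)
            | owner == me -> putMVar mv (Just (owner, n + 1))
            -- Held by another thread: put the state back and retry.
            -- (A production lock would block here instead of spinning.)
            | otherwise   -> putMVar mv st >> yield >> go
  go

release :: RLock -> IO ()
release (RLock mv) = do
  st <- takeMVar mv
  case st of
    Just (owner, n) | n > 1 -> putMVar mv (Just (owner, n - 1))
    _                       -> putMVar mv Nothing

withRLock :: RLock -> IO a -> IO a
withRLock l = bracket_ (acquire l) (release l)

main :: IO ()
main = do
  l <- newRLock
  -- Re-entrant: nested acquisition by the same thread does not deadlock.
  withRLock l (withRLock l (putStrLn "ok"))  -- prints ok
```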
One could also rely on a single `MVar`, but then we would have to be very careful not to introduce potential deadlocks. Perhaps some code would have to be slightly restructured. So overall it may be more work than implementing an `RLock`, and a potential maintenance headache in the long term.
Okay, I think opting for the safest version is a good idea. So I adopted `Lock` and `RLock` and kept all the synchronization in place. It is probably not the best solution yet, but it is worth merging to get rid of the segfaults.
Maybe I will try to create a multi-threaded example for further testing and documentation.
Fixed the copy-paste mistakes, thanks for catching them!
Is the new Concurrency section in the readme understandable?
A new release is a good idea. In the README I referred to it as `408.3`; is that correct?
Thanks for the great work!
BTW, I'm planning to release 408.3 this coming weekend, perhaps on Sunday.
This was great, I learned a lot doing this!
A release with this fix is a good idea. The next thing I planned to tackle is improving compiler compatibility, so that there are no warnings with any GHC version (or at least fewer; I currently get quite a number of them), and also testing with the new GHC 9. I think I will have time for this before Sunday, but if not, it is not critical at all and can go into a later release.
Sorry I've been quite busy, hope to release this by next weekend...
No worries, I have been working on more testing and further API extensions in the meantime.
I created a reproduction for #27 which so far works reliably on my machine and on GitHub Actions. It crashes with different errors, though; here are some examples: