cocreature opened this issue 8 years ago
I just had the idea that maybe it is important to only call `getResult` when we are not busy, so I tried calling `isBusy` inside a loop. However, `isBusy` keeps returning true despite the fact that `getResult` returns. If I interpret the docs correctly, `isBusy` basically indicates whether `getResult` will block, so this seems really weird.
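For reference, a minimal sketch of the kind of loop I mean (hypothetical `waitForResult`, assuming a query has already been issued with `PQ.sendQuery` on the connection):

```haskell
import qualified Database.PostgreSQL.LibPQ as PQ
import Control.Concurrent (threadWaitRead)

-- Hypothetical sketch: wait until isBusy clears before calling getResult.
waitForResult :: PQ.Connection -> IO (Maybe PQ.Result)
waitForResult conn = do
  busy <- PQ.isBusy conn
  if busy
    then do
      mfd <- PQ.socket conn
      case mfd of
        Nothing -> return Nothing      -- connection is in a bad state
        Just fd -> do
          threadWaitRead fd            -- block until the socket is readable
          _ <- PQ.consumeInput conn    -- hand the new input to libpq
          waitForResult conn
    else PQ.getResult conn             -- should no longer block here
```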
Yeah, this is weird. This seems like a libpq issue; libpq can be a bit opaque around the edges like this.
Then again, another possibility is that there's a problem in postgresql-libpq, with some function (at least in this use case) inappropriately using an unsafe FFI call. That could cause this sort of problem as well.
Well, `PQ.sendQuery` and `PQ.getResult` are both using safe FFI calls, so that theory seems... less likely.
Hmm, I don't suppose another possibility is that it might be some kind of issue with GHC's runtime and/or IO manager? If this is the case, I suppose there is an outside (if unlikely) chance that testing this code on a wide range of newer and older GHCs would expose the issue.
In any case, we really need a test plan that definitively rules out or implicates some of the possibilities. The reasonably likely possibilities being, in descending order of likelihood in my estimation:
What would be really nice is to attach a debugger to a deadlocked process, and find out exactly where the instruction pointers are in all the threads. GDB should work, though two issues would be GHC's green threads, and then translating the instruction pointer back into a Haskell source location. I don't know too much about GHC's debugging options, unfortunately.
I suppose a sanity check might be to wireshark the connection just to ensure the notification is actually being sent.
I just did the sanity check and the notification is definitely being sent; `psql` also shows it after running `listen channel` again (looks like it doesn’t do automatic polling).
I’ll look into some of your other suggestions, let’s see if I find something.
You could use `forkOS` instead of `forkIO` to force the GHC runtime to run every green thread in its own separate OS thread. (Note that this doesn't prevent GHC green threads from running in a bound thread, just that a given Haskell thread is always executed in a given OS thread.) That'd be one way to avoid the green thread/gdb issue. (Though I don't know much about mapping a GDB location back to a Haskell location... but who knows, maybe you get lucky and deadlock inside libpq. Or maybe you can't replicate this with `forkOS`, which would be very interesting indeed.)
I tried GHC 7.8.4, 7.10.3 and 8.0.1 but I got the same results. I also tried throwing `lldb` at it, but I ended up somewhere in the runtime, which didn’t look super useful.
Now to the interesting part: I found out that the `threadWaitRead` in `exec` is not the culprit. Using `threadDelay 1000` also lets the test pass successfully. This brings up a thought that I had earlier and then forgot about: could it be the case that data arrives on the socket, but `getResult` reads it before a context switch happens, so that when `threadWaitReadSTM` wakes up again no data is available?
Checking `notifies` directly after `getResult` returns a notification, which seems to confirm my suspicion that it might already consume the notification. However, from what I’ve been able to get from the #ghc IRC channel so far, it seems like all waiting threads should get scheduled, so this can’t occur.
Having talked to slyfox (one of the GHC devs) a bit more, and after looking at the strace output, I’m now convinced that this is what’s causing this bug. Not yet sure what a proper fix for this looks like, but at least the bug is no longer completely mysterious.
Right, libpq will keep a notification in the connection object until it's consumed. That's why on the first call, `getNotification` simply calls `PQ.notifies`, and if that doesn't work, it goes through the `threadWaitReadSTM`/`PQ.consumeInput`/`PQ.notifies` loop.
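Roughly, the shape of that loop is the following (a simplified sketch, not the actual postgresql-simple source, and ignoring the per-connection lock):

```haskell
import qualified Database.PostgreSQL.LibPQ as PQ
import Control.Concurrent (threadWaitReadSTM)
import Control.Concurrent.STM (atomically)

-- Simplified sketch of the check-then-wait loop described above.
getNotificationSketch :: PQ.Connection -> IO PQ.Notify
getNotificationSketch conn = do
  mn <- PQ.notifies conn               -- first, check for a buffered notification
  case mn of
    Just n  -> return n
    Nothing -> do
      Just fd <- PQ.socket conn
      (waitRead, _cancel) <- threadWaitReadSTM fd  -- register interest in readability
      -- (in the real code the connection lock is dropped around the wait)
      atomically waitRead                          -- block until the IO manager saw it readable
      _ <- PQ.consumeInput conn                    -- pull the data into libpq
      getNotificationSketch conn                   -- and check notifies again
```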
The thing that I'm confused about, though, is that there shouldn't be any opportunity for the connection state to change between a `PQ.notifies` call that returns `Nothing` and the call to `threadWaitReadSTM`, because the connection lock is held the entire time. Similarly, the state of the kernel descriptor object shouldn't be able to change either, because of the lock.
So if your hypothesis is correct, it would seem to imply that something's manipulating either the libpq connection object or the kernel descriptor object outside the lock. But I don't see where that's happening either.
So yes, you did point out another race condition in the pre-GHC 7.8 `getNotification` code that I wasn't consciously aware of: the reliance on `threadWaitRead` means that it could be possible for a notification to arrive after `PQconsumeInput` is called, and then another thread to make the descriptor unreadable after the lock is dropped but before `threadWaitRead` is called. This would add unnecessary latency, and in extreme cases lead to deadlock (no descriptor index reallocation required).
I’m not sure I understand what exactly is confusing to you, so let me give a more detailed explanation of the scenario I think we’re seeing here. I’ll call the main thread that waits for the notification thread 1 and the thread sending the notification thread 2, and I’ll only talk about the single-threaded RTS because I haven’t yet read the code for the multithreaded RTS.
1. Thread 1 blocks in `threadWaitReadSTM` (it no longer holds the lock at this point); thread 2 is waiting in `threadDelay`. At this point the RTS calls `select` on the postgresql socket with the timeout of thread 2.
2. Thread 2 wakes up, takes the lock, and runs its query, calling `getResult` without being interrupted. Calling `getResult` already consumes the notification. Thread 2 then exits and drops the lock.
3. The RTS calls `select` again on behalf of thread 1. However, since thread 2 has already consumed all input, a `read` would block, so `select` blocks and we have a deadlock.

One solution seems to be to require the threaded runtime and have a separate OS thread that does an FFI call to `select`. My understanding is that in this case the kernel guarantees that this thread will wake up. However, having a separate OS thread just for waiting seems a bit overkill.
Ok, so what I missed is that by your current theory, the GHC IO manager is never seeing the socket in its readable state. I knew that `getResult` has the potential to consume notifications. Thus, vanilla postgresql-simple doesn't have this problem because (at the moment) everything is run through the IO manager. I think I understand now.
In any case, I think this reinforces my position that providing a "generic" `getNotification` that works across all postgresql-libpq based higher-level wrappers is not a great idea. I knew this was some fairly subtle code; you've demonstrated that it's more subtle than even I realized. :-)
> In any case, I think this reinforces my position that providing a "generic" getNotification that works across all postgresql-libpq based higher-level wrappers is not a great idea.
Agreed, I’ll remove the generic stuff once I’ve figured out how to fix the bug here, preferably without modifying hasql to call `threadWaitRead`.
Well, calling `threadWaitRead` shouldn't really be a problem (except on Windows, where this problem is not relevant, because `getNotification` can't use `threadWaitRead` either). Adding it shouldn't slow things down by much, and could speed things up a touch.
Assuming your theory is correct, would a single call to `threadWaitRead` solve the problem? Consider the following scenario:

1. A notification arrives between the request and the result, after `exec`'s single call to `threadWaitRead` has returned.
2. `getResult` consumes it while `getNotification` is blocked, which seems to set up the same scenario we have now if another notification arrives.

Err, that's not right, because (1) can't happen, as postgresql won't deliver notifications between a request and result; and (2) even if it did, GHC's IO manager would wake up the thread that is calling `getNotification`, which would then block on the `MVar` that the request/response cycle is holding. I can't seem to make this double-notification scenario work in my head. Maybe a single call to `threadWaitRead` would be enough.
Actually, there is another possibility: I could always add `threadWaitRead` into postgresql-libpq. :-D
Well, except that would then be spurious in the context of postgresql-simple. :-1:
It might be intellectually interesting to find another solution, but I think adding a call to `threadWaitRead` (with a !windows #ifdef) is fine. The only possible problem is that I don't know how receptive Nikita would be to such a modification.
(Also, these sorts of niggling issues are why I think `getNotification` should be part of HaSQL, instead of a separate package, and why `withLibPQConnection` should be in a module explicitly labelled `Internal`, but whatever.)
Ok, I did think about adding `threadWaitRead` to `exec` a fair bit, and I am feeling reasonably confident in your proposed solution. I doubt it exhibits another more subtle form of deadlock.
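For concreteness, a hypothetical sketch of what the proposed change to `exec` could look like (not hasql's actual code, and ignoring error handling and the connection lock):

```haskell
import qualified Database.PostgreSQL.LibPQ as PQ
import Control.Concurrent (threadWaitRead)
import Data.ByteString (ByteString)

-- Hypothetical sketch: wait on the socket through the IO manager before
-- draining results, so the IO manager observes the socket's readable state
-- at least once per query and can wake up a blocked getNotification.
execSketch :: PQ.Connection -> ByteString -> IO (Maybe PQ.Result)
execSketch conn sql = do
  _ <- PQ.sendQuery conn sql
  Just fd <- PQ.socket conn
  threadWaitRead fd         -- the added call: goes through the IO manager
  _ <- PQ.consumeInput conn
  PQ.getResult conn         -- may still block inside libpq for the rest of the result
```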
Because the notifications are asynchronous, I was wrong about something: it is possible that an async notification can arrive between the `sendQuery` and `getResult`. It's true that the backend won't send a notification between the time it received the `sendQuery` and sent the response to `getResult`, but that doesn't imply that a notification won't arrive between the time that the client completed the `sendQuery` and the time it completed `getResult`. Of course, the probability that a `sendQuery` and a notification message cross each other in flight depends on the latency of the communication link, but it is possible. Similarly, it's actually quite probable that the notification is fully transmitted between the time the backend sends the result and the client finishes `getResult`, in cases where the client triggers the notification itself (directly or indirectly).
So, `getNotification` holds the per-connection lock just long enough to check for messages, and then drops the lock and blocks on the socket. On the other side, `exec` holds the lock for the duration of the query. The tricky case is when the notification arrives at a point in time such that it's consumed by `getResult`.
So if `getNotification` checks for messages after `exec`, this wasn't a problem before, and it's not a problem with your proposed solution: `getNotification` will see the message that is available and return it without ever touching the socket.
The only other possibility is that `getNotification` checks for the message before the `exec`, and blocks. Before, GHC's IO manager never sees the socket become readable, because it's totally out of the loop during the `exec`. Thus `getNotification` never wakes up after the `exec` to check for messages again.
Now, assuming that `atomically waitRead` will return immediately, even if the socket isn't immediately readable, as long as the IO manager saw the socket as readable at some point between the call to `threadWaitReadSTM` and the call to `atomically waitRead`, we are good. Then `exec`'s call to `threadWaitRead` is guaranteed to wake up `getNotification` on every query, regardless of whether or not a notification arrived.
(And I'd honestly be a bit shocked if `threadWaitReadSTM` did not have the semantics I described, but I suppose one should verify this by reading the source code, discussing this with others, and/or performing experiments.)
On the other hand, there is still a slight risk of deadlock with the pre-GHC-7.8 `getNotification` code, because if `getNotification` checks for notifications, drops the lock, but then doesn't get around to calling `threadWaitRead` until after an intervening `exec` completes, we could end up in the same situation, even with your proposed solution.
So sure, we could probably implement `getNotification` a bit more simply and more efficiently if we forked or abandoned libpq, but I think this is (close to) the best we can do if we stick with vanilla libpq.
So anyway, how do you feel about this project? Personally, I think we should submit another PR to HaSQL; maybe we will get further if we both support the PR...
Thanks for the detailed analysis, and sorry that I didn’t reply to your previous comment (got busy with finishing my Bachelor’s thesis).
I don’t think having this as a separate project from hasql makes sense if it means we have to rely on subtle properties of hasql (i.e., calling `threadWaitRead` in `exec`). So making a new PR that includes the `threadWaitRead` as well as the actual code for getting notifications is imho the right way forward.
I’m still busy with exams for the next two weeks, so I’m not sure I’ll get to it before then. If you want to do it, I’ll happily support your PR :)
Oh no problem, I'll see if I can't get this turned around within a week or so. Best Wishes on your thesis!
> Best Wishes on your thesis!
Thanks!
Hey guys, what's the status of this library? I'd like to be able to start listening to events from hasql!
Hey, I have to admit that I don’t have a need for this library atm and I have no plans to continue working on this anytime soon. That said, feel free to pick it up in its current state or use whatever parts of it are useful for you.
My interests have also shifted, but if you read this whole thread, both cocreature and myself came to the conclusion that this should be part of hasql, and not a separate library, because the correctness of this code depends on internal implementation details of hasql's other operators.
Although, @begriffs, I would be willing to help prepare and/or review a PR to hasql, and to help convince Nikita if I can, to get this into hasql itself.
@lpsmith thank you, that would be really helpful!
This feature would be the missing link that allows postgrest to reload its cache of the database structure automatically on changes (since we can create a DDL trigger that sends a NOTIFY for the server to refresh its cache). The inability to auto refresh the cache has been a source of confusion for a long time.
@jfischoff Wrote a great blog post on this bug here: https://jfischoff.github.io/blog/when-threadwaitread-doesnt.html
Nice job, @jfischoff. That analysis seems correct. (And even skimming this issue has taken me a little bit of time to get back into the right headspace... reading my own comments for the first time in so many years, they seemed like foreign gibberish to me.)
So postgresql-simple's change to asynchronous libpq in postgresql-simple-0.4.3 actually "fixes" what would have been a bug in `getNotification` if the `getNotification` from 0.5.1.3 had been introduced before the switch to asynchronous libpq... thus exposing a bit more subtlety here that I didn't fully appreciate. (Actually, now that I think about it, I wonder if this sort of race condition is present in the pre-GHC 7.8 code.)
Cocreature and I were on the right track, that GHC's IO Manager was never seeing the wakeup, but I at least never stopped to think maybe it could be the semantics of epoll/kqueue passing through the IO manager. sigh.
If one is really wedded to notifications, it's not entirely bad to dedicate an entire connection to them. Certainly, from a queueing theory perspective, you have to be very judicious about what queries/transactions that are issued on the listening connection, as you don't want long running queries/transactions to add unnecessary latency to your notifications. Basically, I tried to keep them to queries/DML statements that could be run in "bare" transactions and were guaranteed to run pretty quickly and return not more than a few rows.
In some cases, the only queries I ever executed were `SELECT 1`/`SELECT NOW()` every 5-15 seconds, as a type of application-level heartbeat to ensure the network connection hadn't gone out to lunch. (And, if it had, so that the application would notice and recover more quickly.)
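Concretely, such a heartbeat is just a trivial periodic query; a hypothetical sketch using postgresql-simple (names and interval are made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import Control.Monad (forever, void)
import Database.PostgreSQL.Simple (Connection, Only (..), query_)

-- Hypothetical heartbeat: run a trivial query on the listening connection
-- every ten seconds, so a dead network link is noticed (and recovered from)
-- promptly instead of lingering silently.
heartbeat :: Connection -> IO ()
heartbeat conn = forever $ do
  void (query_ conn "SELECT 1" :: IO [Only Int])
  threadDelay (10 * 1000 * 1000)   -- ten seconds, in microseconds
```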
> Cocreature and I were on the right track, that GHC's IO Manager was never seeing the wakeup, but I at least never stopped to think maybe it could be the semantics of epoll/kqueue passing through the IO manager. sigh.
Yes y'all were on the right track. If I had been able to digest the thread better it would have saved me some time in retrospect.
Take some solace in the fact that `bpftrace` did not exist at the time, so debugging this issue would have been much more difficult :).
> If one is really wedded to notifications, it's not entirely bad to dedicate an entire connection to them. Certainly, from a queueing theory perspective, you have to be very judicious about what queries/transactions that are issued on the listening connection, as you don't want long running queries/transactions to add unnecessary latency to your notifications. Basically, I tried to keep them to queries/DML statements that could be run in "bare" transactions and were guaranteed to run pretty quickly and return not more than a few rows.
Ah yeah that is a good point.
I find your implementation very appealing, because running out of postgres connections is a real issue. I think your implementation in postgresql-simple will not exhibit the bug, as you mentioned in the thread, but utilizing the connection multiplexing is still tricky. It's something I have been mulling over, but I had not considered the impact of transactions when using the connection.
Hmm, actually now that I think about how I was using notifications, it's maybe at least somewhat important to be able to issue queries on the listening connection. My standard loop for reliable notifications involved the following pseudocode: to send a message, store it and notify; to receive messages, `LISTEN` on the notification channel, dequeue any pending messages, and sleep until a notification arrives before starting over.

I suppose that if you aren't issuing any queries on the listening connection, the correctness of the receive loop depends on the cross-connection concurrent semantics of postgres. (It's probably ok, but...) On the other hand, this loop does not require the sophisticated multiplexing-the-listening-connection-in-a-concurrent-way semantics that postgresql-simple provides. (Though, doing the heartbeat thing gets a bit trickier without postgresql-simple's `getNotification` multiplexing.)
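One common shape such a loop can take is sketched below; the table, channel, and helper names are hypothetical, and this is not the exact original pseudocode:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (forever, void)
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Notification (getNotification)

-- Hypothetical sender: store the message and notify in one transaction, so
-- the NOTIFY is only delivered once the row is visible to other connections.
sendMessage :: Connection -> String -> IO ()
sendMessage conn payload = withTransaction conn $ do
  void $ execute conn "INSERT INTO messages (payload) VALUES (?)" (Only payload)
  void $ execute_ conn "NOTIFY messages"

-- Hypothetical receiver: LISTEN, then repeatedly drain whatever is queued
-- and sleep on getNotification until there might be more to process.
receiveLoop :: Connection -> (Int -> String -> IO ()) -> IO ()
receiveLoop conn handle = do
  void $ execute_ conn "LISTEN messages"
  forever $ do
    rows <- query_ conn "DELETE FROM messages RETURNING id, payload"
    mapM_ (uncurry handle) (rows :: [(Int, String)])
    void $ getNotification conn
```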
Yes, this stuff is also easy to mess up when writing the library: as so often happens in certain concurrent scenarios, the correctness of `getNotification` depends on subtle properties of everything else that interacts with libpq on the same connection. There isn't a fully legitimate layer of modularity at the libpq level.
Thus my position (largely rejected by Nikita on weird ideological grounds, last I was aware) that receiving notifications should be a core part of the library, and that while a library should probably allow for external use of libpq, such functionality should be marked internal as a way of indicating "there be dragons". (Last I was aware, he was also opposed to exposing Internal modules.)
Yeah, I think it is okay to do steps 1-5 for the listening connection without experiencing the `recvfrom` race. You might get a notification while you dequeue messages in step 3. When you restart on step 5 you have to handle the case that the notification was unnecessary.
> ...the correctness of the receive loop depends on the cross-connection concurrent semantics of postgres. (It's probably ok, but...)
Not exactly sure what this means, but I have decided to trust the postgres implementation to reliably deliver notifications. I would not be surprised if there are "fun" bugs yet to discover on the postgres side, but for now I am 🙉
> Not exactly sure what this means, but I have decided to trust the postgres implementation to reliably deliver notifications.
You can't trust anything to reliably deliver messages... e.g. networks experience failures outside the control of any software: if your client loses the db connection, and then recovers, any notifications sent during that glitch will be lost forever.
Also, what if the power fails on your notification listener, but not your database server? Etc. etc. The two generals problem is fundamental and unavoidable whenever you can experience partial failures. Maybe there's some distributed software that doesn't really care about appropriately handling such failures, but that's not the kind of software I've ever been interested in writing.
So you really cannot rely on notifications for guaranteed* at-least-once delivery. (Guaranteed, assuming that proper operation of the underlying physical systems is eventually restored)
What I meant, though, was this: let's say we have two connections running on the listening process, using the algorithm described above for robust message delivery:
Of course connection B might not see the message if it was already in a transaction, so we need to add the assumption that we create a new transaction after the notification on A is received. Now, are we able to answer question 3 affirmatively, or is it still possible to have a scenario where connection B fails to see the new message for some period of time after the notification arrives on A?
(In this scenario, my algorithm would then go to sleep until it receives another notification, after which it would "usually" see both messages... thus adding potentially unbounded latency to the first message.)
Well... this scenario is still possible in some distributed postgresql networks: e.g. connection B is on a read-only asynchronous replica. I'm sure there are other postgresql configurations where this could happen.
So, then, assuming connection A and connection B are both to the same postgresql cluster running on a single computer, is this scenario guaranteed to be impossible? It seems like it should be, otherwise it'd be rather difficult to make anything all that reliable, but I'm not sure I actually know that it's impossible either.
Whereas, my understanding of postgres is that if the query for new messages is issued on the same connection that the notification is received, then it will definitely see the new message.
Alright, I have tracked it down to a single line which seems to deterministically toggle the deadlock. The problem is solved by a call to `threadWaitRead`.

Without the threaded runtime the test always fails when I don’t call `threadWaitRead`; with the threaded runtime it sometimes passes. I have never seen it fail with `threadWaitRead` inside.

I’m sadly a bit out of ideas here. I don’t see why the call to `threadWaitRead` should change how the other thread calling `threadWaitRead` behaves.

@lpsmith Sorry for bothering you, but you obviously know a lot more about this than I do. Any ideas what could cause this?