cocreature opened this issue 8 years ago
I just had the idea that maybe it is important to only call `getResult` when we are not busy, so I tried calling `isBusy` inside a loop. However, `isBusy` keeps returning true despite the fact that `getResult` returns. If I interpret the docs correctly, `isBusy` basically indicates whether `getResult` will block, so this seems really weird.
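For reference, a minimal sketch of the kind of loop I mean (hypothetical `waitForResult`, assuming a query has already been issued with `PQ.sendQuery` on the connection):

```haskell
import qualified Database.PostgreSQL.LibPQ as PQ
import Control.Concurrent (threadWaitRead)

-- Hypothetical sketch: wait until isBusy clears before calling getResult.
waitForResult :: PQ.Connection -> IO (Maybe PQ.Result)
waitForResult conn = do
  busy <- PQ.isBusy conn
  if busy
    then do
      mfd <- PQ.socket conn
      case mfd of
        Nothing -> return Nothing      -- connection is in a bad state
        Just fd -> do
          threadWaitRead fd            -- block until the socket is readable
          _ <- PQ.consumeInput conn    -- hand the new input to libpq
          waitForResult conn
    else PQ.getResult conn             -- should no longer block here
```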
Yeah, this is weird. This seems like a libpq issue; libpq can be a bit opaque around the edges like this.
Then again, another possibility is that there's a problem in postgresql-libpq, with some function (at least in this use case) inappropriately using an unsafe FFI call. That could cause this sort of problem as well.
Well, `PQ.sendQuery` and `PQ.getResult` are both using safe FFI calls, so that theory seems... less likely.
Hmm, I don't suppose another possibility is that it might be some kind of issue with GHC's runtime and/or IO manager? If this is the case, I suppose there is an outside (if unlikely) chance that testing this code on a wide range of newer and older GHCs would expose the issue.
In any case, we really need a test plan that definitively rules out or implicates some of the possibilities. The reasonably likely possibilities being, in descending order of likelihood in my estimation:
What would be really nice is to attach a debugger to a deadlocked process, and find out exactly where the instruction pointers are in all the threads. GDB should work, though two issues would be GHC's green threads, and then translating the instruction pointer back into a Haskell source location. I don't know too much about GHC's debugging options, unfortunately.
I suppose a sanity check might be to wireshark the connection just to ensure the notification is actually being sent.
I just did the sanity check and the notification is definitely being sent; `psql` also shows it after running `listen channel` again (looks like it doesn’t do automatic polling).
I’ll look into some of your other suggestions, let’s see if I find something.
You could use `forkOS` instead of `forkIO` to force the GHC runtime to run every green thread in its own separate OS thread. (Note that this doesn't prevent GHC green threads from running in a bound thread, just that a given Haskell thread is always executed in a given OS thread.) That'd be one way to avoid the green thread/gdb issue. (Though I don't know much about mapping a GDB location back to a Haskell location... but who knows, maybe you get lucky and deadlock inside libpq. Or maybe you can't replicate this with `forkOS`, which would be very interesting indeed.)
I tried GHC 7.8.4, 7.10.3 and 8.0.1 but I got the same results. I also tried throwing `lldb` at it, but I ended up somewhere in the runtime, which didn’t look super useful.
Now to the interesting part: I found out that the `threadWaitRead` in `exec` is not the culprit. Using `threadDelay 1000` also lets the test pass successfully. This brings up a thought that I had earlier and then forgot about: could it be the case that data arrives on the socket, but `getResult` reads it before a context switch happens, so that when `threadWaitReadSTM` wakes up again no data is available?
Checking `notifies` directly after `getResult` returns a notification, which seems to confirm my suspicion that it might already consume the notification. However, from what I’ve been able to get from the #ghc IRC channel so far, it seems like all waiting threads should get scheduled, so this can’t occur.
Having talked to slyfox (one of the GHC devs) a bit more, and after looking at the strace output, I’m now convinced that this is what’s causing this bug. Not yet sure what a proper fix for this looks like, but at least the bug is no longer completely mysterious.
Right, libpq will keep a notification in the connection object until it's consumed. That's why on the first call, `getNotification` simply calls `PQ.notifies`, and if that doesn't work, it goes through the `threadWaitReadSTM`/`PQ.consumeInput`/`PQ.notifies` loop.
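Roughly, the shape of that loop is the following (a simplified sketch, not the actual postgresql-simple source, and ignoring the per-connection lock):

```haskell
import qualified Database.PostgreSQL.LibPQ as PQ
import Control.Concurrent (threadWaitReadSTM)
import Control.Concurrent.STM (atomically)

-- Simplified sketch of the check-then-wait loop described above.
getNotificationSketch :: PQ.Connection -> IO PQ.Notify
getNotificationSketch conn = do
  mn <- PQ.notifies conn               -- first, check for a buffered notification
  case mn of
    Just n  -> return n
    Nothing -> do
      Just fd <- PQ.socket conn
      (waitRead, _cancel) <- threadWaitReadSTM fd  -- register interest in readability
      -- (in the real code the connection lock is dropped around the wait)
      atomically waitRead                          -- block until the IO manager saw it readable
      _ <- PQ.consumeInput conn                    -- pull the data into libpq
      getNotificationSketch conn                   -- and check notifies again
```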
The thing that I'm confused about, though, is that there shouldn't be any opportunity for the connection state to change between a `PQ.notifies` call that returns `Nothing` and the call to `threadWaitReadSTM`, because the connection lock is held the entire time. Similarly, the state of the kernel descriptor object shouldn't be able to change either, because of the lock.
So if your hypothesis is correct, it would seem to imply that something's manipulating either the libpq connection object or the kernel descriptor object outside the lock. But I don't see where that's happening either.
So yes, you did point out another race condition in the pre-GHC 7.8 `getNotification` code that I wasn't consciously aware of: the reliance on `threadWaitRead` means that it could be possible for a notification to arrive after `PQconsumeInput` is called, and then another thread to make the descriptor unreadable after the lock is dropped but before `threadWaitRead` is called. This would add unnecessary latency, and in extreme cases lead to deadlock (no descriptor index reallocation required).
I’m not sure I understand what exactly is confusing to you, so let me give a more detailed explanation of the scenario I think we’re seeing here. I’ll call the main thread that waits for the notification thread 1 and the thread sending the notification thread 2, and I’ll only talk about the single-threaded RTS because I haven’t yet read the code for the multithreaded RTS.
1. Thread 1 blocks in `threadWaitReadSTM` (it no longer holds the lock at this point); thread 2 is waiting in `threadDelay`. At this point the RTS calls `select` on the postgresql socket with the timeout of thread 2.
2. Thread 2 wakes up, takes the lock, and runs its query, calling `getResult` without being interrupted. Calling `getResult` already consumes the notification. Thread 2 then exits and drops the lock.
3. The RTS calls `select` again on behalf of thread 1. However, since thread 2 has already consumed all input, a `read` would block, so `select` blocks and we have a deadlock.

One solution seems to be to require the threaded runtime and have a separate OS thread that does an FFI call to `select`. My understanding is that in this case the kernel guarantees that this thread will wake up. However, having a separate OS thread just for waiting seems a bit overkill.
Ok, so what I missed is that by your current theory, the GHC IO manager is never seeing the socket in its readable state. I knew that `getResult` has the potential to consume notifications. Thus, vanilla postgresql-simple doesn't have this problem because (at the moment) everything is run through the IO manager. I think I understand now.
In any case, I think this reinforces my position that providing a "generic" `getNotification` that works across all postgresql-libpq based higher-level wrappers is not a great idea. I knew this was some fairly subtle code; you've demonstrated that it's more subtle than even I realized. :-)
> In any case, I think this reinforces my position that providing a "generic" getNotification that works across all postgresql-libpq based higher-level wrappers is not a great idea.
Agreed, I’ll remove the generic stuff once I’ve figured out how to fix the bug here, preferably without modifying hasql to call `threadWaitRead`.
Well, calling `threadWaitRead` shouldn't really be a problem (except on Windows, where this problem is not relevant, because `getNotification` can't use `threadWaitRead` either). Adding it shouldn't slow things down by much, and could speed things up a touch.
Assuming your theory is correct, would a single call to `threadWaitRead` solve the problem? Consider the following scenario:

1. A notification arrives between the request and the result, after `exec`'s single call to `threadWaitRead` has returned.
2. `getResult` consumes it while `getNotification` is blocked, which seems to set up the same scenario we have now if another notification arrives.

Err, that's not right, because (1) can't happen, as postgresql won't deliver notifications between a request and result; and (2) even if it did, GHC's IO manager would wake up the thread that is calling `getNotification`, which would then block on the `MVar` that the request/response cycle is holding. I can't seem to make this double-notification scenario work in my head. Maybe a single call to `threadWaitRead` would be enough.
Actually, there is another possibility: I could always add `threadWaitRead` into postgresql-libpq. :-D
Well, except that would then be spurious in the context of postgresql-simple. :-1:
It might be intellectually interesting to find another solution, but I think adding a call to `threadWaitRead` (with a !windows #ifdef) is fine. The only possible problem is that I don't know how receptive Nikita would be to such a modification.
(Also, these sorts of niggling issues are why I think `getNotification` should be part of HaSQL, instead of a separate package, and why `withLibPQConnection` should be in a module explicitly labelled `Internal`, but whatever.)
Ok, I did think about adding `threadWaitRead` to `exec` a fair bit, and I am feeling reasonably confident in your proposed solution. I doubt it exhibits another more subtle form of deadlock.
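For concreteness, a hypothetical sketch of what the proposed change to `exec` could look like (not hasql's actual code, and ignoring error handling and the connection lock):

```haskell
import qualified Database.PostgreSQL.LibPQ as PQ
import Control.Concurrent (threadWaitRead)
import Data.ByteString (ByteString)

-- Hypothetical sketch: wait on the socket through the IO manager before
-- draining results, so the IO manager observes the socket's readable state
-- at least once per query and can wake up a blocked getNotification.
execSketch :: PQ.Connection -> ByteString -> IO (Maybe PQ.Result)
execSketch conn sql = do
  _ <- PQ.sendQuery conn sql
  Just fd <- PQ.socket conn
  threadWaitRead fd         -- the added call: goes through the IO manager
  _ <- PQ.consumeInput conn
  PQ.getResult conn         -- may still block inside libpq for the rest of the result
```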
Because the notifications are asynchronous, I was wrong about something: it is possible that an async notification can arrive between the `sendQuery` and `getResult`. It's true that the backend won't send a notification between the time it received the `sendQuery` and sent the response to `getResult`, but that doesn't imply that a notification won't arrive between the time that the client completed the `sendQuery` and the time it completed `getResult`. Of course, the probability that a `sendQuery` and a notification message cross each other in flight depends on the latency of the communication link, but it is possible. Similarly, it's actually quite probable that the notification is fully transmitted between the time the backend sends the result and the client finishes `getResult`, in cases where the client triggers the notification itself (directly or indirectly).
So, `getNotification` holds the per-connection lock just long enough to check for messages, and then drops the lock and blocks on the socket. On the other side, `exec` holds the lock for the duration of the query. The tricky case is when the notification arrives at a point in time such that it's consumed by `getResult`.
So if `getNotification` checks for messages after `exec`, this wasn't a problem before, and it's not a problem with your proposed solution: `getNotification` will see the message that is available and return it without ever touching the socket.
The only other possibility is that `getNotification` checks for the message before the `exec`, and blocks. Before, GHC's IO manager never sees the socket become readable, because it's totally out of the loop during the `exec`. Thus `getNotification` never wakes up after the `exec` to check for messages again.
Now, assuming that `atomically waitRead` will return immediately, even if the socket isn't immediately readable, as long as the IO manager saw the socket as readable at some point between the call to `threadWaitReadSTM` and the call to `atomically waitRead`, we are good. Then `exec`'s call to `threadWaitRead` is guaranteed to wake up `getNotification` on every query, regardless of whether or not a notification arrived.
(And I'd honestly be a bit shocked if `threadWaitReadSTM` did not have the semantics I described, but I suppose one should verify this by reading the source code, discussing this with others, and/or performing experiments.)
On the other hand, there is still a slight risk of deadlock with the pre-GHC-7.8 `getNotification` code, because if `getNotification` checks for notifications, drops the lock, but then doesn't get around to calling `threadWaitRead` until after an intervening `exec` completes, we could end up in the same situation, even with your proposed solution.
So sure, we could probably implement `getNotification` a bit more simply and more efficiently if we forked or abandoned libpq, but I think this is (close to) the best we can do if we stick with vanilla libpq.
So anyway, how do you feel about this project? Personally, I think we should submit another PR to HaSQL; maybe we will get further if we both support the PR...
Thanks for the detailed analysis, and sorry that I didn’t reply to your previous comment (got busy with finishing my Bachelor’s thesis).
I don’t think having this as a separate project from hasql makes sense if it means we have to rely on subtle properties of hasql (i.e., calling `threadWaitRead` in `exec`). So making a new PR that includes the `threadWaitRead` as well as the actual code for getting notifications is imho the right way forward.
I’m still busy with exams for the next two weeks, so I’m not sure I’ll get to it before then. If you want to do it, I’ll happily support your PR :)
Oh no problem, I'll see if I can't get this turned around within a week or so. Best Wishes on your thesis!
> Best Wishes on your thesis!
Thanks!
Hey guys, what's the status of this library? I'd like to be able to start listening to events from hasql!
Hey, I have to admit that I don’t have a need for this library atm and I have no plans to continue working on this anytime soon. That said, feel free to pick it up in its current state or use whatever parts of it are useful for you.
My interests have also shifted, but if you read this whole thread, both cocreature and myself came to the conclusion that this should be part of hasql, and not a separate library, because the correctness of this code depends on internal implementation details of hasql's other operators.
Although, @begriffs, I would be willing to help prepare and/or review a PR to hasql, and to help convince Nikita if I can, to get this into hasql itself.
@lpsmith thank you, that would be really helpful!
This feature would be the missing link that allows postgrest to reload its cache of the database structure automatically on changes (since we can create a DDL trigger that sends a NOTIFY for the server to refresh its cache). The inability to auto refresh the cache has been a source of confusion for a long time.
@jfischoff Wrote a great blog post on this bug here: https://jfischoff.github.io/blog/when-threadwaitread-doesnt.html
Nice job, @jfischoff. That analysis seems correct. (And even skimming this issue has taken me a little bit of time to get back into the right headspace... reading my own comments for the first time in so many years, they seemed like foreign gibberish to me.)
So postgresql-simple's change to asynchronous libpq in postgresql-simple-0.4.3 actually "fixes" what would have been a bug in `getNotification` if the `getNotification` from 0.5.1.3 had been introduced before the switch to asynchronous libpq... thus exposing a bit more subtlety here that I didn't fully appreciate. (Actually, now that I think about it, I wonder if this sort of race condition is present in the pre-GHC 7.8 code.)
Cocreature and I were on the right track, that GHC's IO Manager was never seeing the wakeup, but I at least never stopped to think maybe it could be the semantics of epoll/kqueue passing through the IO manager. sigh.
If one is really wedded to notifications, it's not entirely bad to dedicate an entire connection to them. Certainly, from a queueing theory perspective, you have to be very judicious about what queries/transactions that are issued on the listening connection, as you don't want long running queries/transactions to add unnecessary latency to your notifications. Basically, I tried to keep them to queries/DML statements that could be run in "bare" transactions and were guaranteed to run pretty quickly and return not more than a few rows.
In some cases, the only queries I ever executed were `SELECT 1`/`SELECT NOW()` every 5-15 seconds, as a type of application-level heartbeat to ensure the network connection hadn't gone out to lunch. (And, if it had, so that the application would notice and recover more quickly.)
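Concretely, such a heartbeat is just a trivial periodic query; a hypothetical sketch using postgresql-simple (names and interval are made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import Control.Monad (forever, void)
import Database.PostgreSQL.Simple (Connection, Only (..), query_)

-- Hypothetical heartbeat: run a trivial query on the listening connection
-- every ten seconds, so a dead network link is noticed (and recovered from)
-- promptly instead of lingering silently.
heartbeat :: Connection -> IO ()
heartbeat conn = forever $ do
  void (query_ conn "SELECT 1" :: IO [Only Int])
  threadDelay (10 * 1000 * 1000)   -- ten seconds, in microseconds
```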
> Cocreature and I were on the right track, that GHC's IO Manager was never seeing the wakeup, but I at least never stopped to think maybe it could be the semantics of epoll/kqueue passing through the IO manager. sigh.
Yes y'all were on the right track. If I had been able to digest the thread better it would have saved me some time in retrospect.
Take some solace in the fact that `bpftrace` did not exist at the time, so debugging this issue would have been much more difficult :).
> If one is really wedded to notifications, it's not entirely bad to dedicate an entire connection to them. Certainly, from a queueing theory perspective, you have to be very judicious about what queries/transactions that are issued on the listening connection, as you don't want long running queries/transactions to add unnecessary latency to your notifications. Basically, I tried to keep them to queries/DML statements that could be run in "bare" transactions and were guaranteed to run pretty quickly and return not more than a few rows.
Ah yeah that is a good point.
I find your implementation very appealing, because running out of postgres connections is a real issue. I think your implementation in postgresql-simple will not exhibit the bug, as you mentioned in the thread, but utilizing the connection multiplexing is still tricky. It's something I have been mulling over, but I had not considered the impact of transactions when using the connection.
Hmm, actually now that I think about how I was using notifications, it's maybe at least somewhat important to be able to issue queries on the listening connection. My standard loop for reliable notifications involved the following pseudocode: to send a message, store it and notify; to receive messages, `LISTEN` on the notification channel, dequeue any pending messages, and sleep until a notification arrives before starting over.

I suppose that if you aren't issuing any queries on the listening connection, the correctness of the receive loop depends on the cross-connection concurrent semantics of postgres. (It's probably ok, but...) On the other hand, this loop does not require the sophisticated multiplexing-the-listening-connection-in-a-concurrent-way semantics that postgresql-simple provides. (Though, doing the heartbeat thing gets a bit trickier without postgresql-simple's `getNotification` multiplexing.)
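One common shape such a loop can take is sketched below; the table, channel, and helper names are hypothetical, and this is not the exact original pseudocode:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (forever, void)
import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Notification (getNotification)

-- Hypothetical sender: store the message and notify in one transaction, so
-- the NOTIFY is only delivered once the row is visible to other connections.
sendMessage :: Connection -> String -> IO ()
sendMessage conn payload = withTransaction conn $ do
  void $ execute conn "INSERT INTO messages (payload) VALUES (?)" (Only payload)
  void $ execute_ conn "NOTIFY messages"

-- Hypothetical receiver: LISTEN, then repeatedly drain whatever is queued
-- and sleep on getNotification until there might be more to process.
receiveLoop :: Connection -> (Int -> String -> IO ()) -> IO ()
receiveLoop conn handle = do
  void $ execute_ conn "LISTEN messages"
  forever $ do
    rows <- query_ conn "DELETE FROM messages RETURNING id, payload"
    mapM_ (uncurry handle) (rows :: [(Int, String)])
    void $ getNotification conn
```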
Yes, this stuff is also easy to mess up when writing the library: as so often happens in certain concurrent scenarios, the correctness of `getNotification` depends on subtle properties of everything else that interacts with libpq on the same connection. There isn't a fully legitimate layer of modularity at the libpq level.
Thus my position (largely rejected by Nikita on weird ideological grounds, last I was aware) that receiving notifications should be a core part of the library, and that while a library should probably allow for external use of libpq, such functionality should be marked internal as a way of indicating "there be dragons". (Last I was aware, he was also opposed to exposing Internal modules.)
Yeah, I think it is okay to do steps 1-5 for the listening connection without experiencing the `recvfrom` race. You might get a notification while you dequeue messages in step 3. When you restart on step 5 you have to handle the case that the notification was unnecessary.
> ...the correctness of the receive loop depends on the cross-connection concurrent semantics of postgres. (It's probably ok, but...)
Not exactly sure what this means, but I have decided to trust the postgres implementation to reliably deliver notifications. I would not be surprised if there are "fun" bugs yet to discover on the postgres side, but for now I am 🙉
> Not exactly sure what this means, but I have decided to trust the postgres implementation to reliably deliver notifications.
You can't trust anything to reliably deliver messages... e.g. networks experience failures outside the control of any software: if your client loses the db connection, and then recovers, any notifications sent during that glitch will be lost forever.
Also, what if the power fails on your notification listener, but not your database server? Etc. etc. The two generals problem is fundamental and unavoidable whenever you can experience partial failures. Maybe there's some distributed software that doesn't really care about appropriately handling such failures, but that's not the kind of software I've ever been interested in writing.
So you really cannot rely on notifications for guaranteed* at-least-once delivery. (Guaranteed, assuming that proper operation of the underlying physical systems is eventually restored)
What I meant, though, was this: let's say we have two connections running on the listening process, using the algorithm described above for robust message delivery:
Of course connection B might not see the message if it was already in a transaction, so we need to add the assumption that we create a new transaction after the notification on A is received. Now, are we able to answer question 3 affirmatively, or is it still possible to have a scenario where connection B fails to see the new message for some period of time after the notification arrives on A?
(In this scenario, my algorithm would then go to sleep until it receives another notification, after which it would "usually" see both messages... thus adding potentially unbounded latency to the first message.)
Well... this scenario is still possible in some distributed postgresql networks: e.g. connection B is on a read-only asynchronous replica. I'm sure there are other postgresql configurations where this could happen.
So, then, assuming connection A and connection B are both to the same postgresql cluster running on a single computer, is this scenario guaranteed to be impossible? It seems like it should be, otherwise it'd be rather difficult to make anything all that reliable, but I'm not sure I actually know that it's impossible either.
Whereas, my understanding of postgres is that if the query for new messages is issued on the same connection that the notification is received, then it will definitely see the new message.
Alright, I have tracked it down to a single line which seems to deterministically toggle the deadlock. The problem is solved by a call to `threadWaitRead`.

Without the threaded runtime the test always fails when I don’t call `threadWaitRead`; with the threaded runtime it sometimes passes. I have never seen it fail with `threadWaitRead` inside.

I’m sadly a bit out of ideas here. I don’t see why the call to `threadWaitRead` should change how the other thread calling `threadWaitRead` behaves.

@lpsmith Sorry for bothering you, but you obviously know a lot more about this than I do. Any ideas what could cause this?