In pull request #126, DS::MsgChannel was changed to lazy-initialize its eventfd. This is sensible because, in many places, DS uses temporary clients that do not need all MsgChannels available. Unfortunately, it also introduces a potential race condition. The race is most easily observed in the dm_game_join handler, which pushes an e_UpdateAgeSrv message to the AuthDaemon. The auth server responds so quickly that the GameHost's call to m_channel.getMessage() can race with the AuthHost's call to m_channel.putMessage(), and the two end up operating on different eventfds, causing the GameHost to hang forever. Anyone linking to that age from then on is permanently stuck in the spinning book age. Most other situations are not particularly vulnerable to this race because they do something slow first, like querying Postgres, which gives the message pusher time to initialize the eventfd.
I fixed this by making DS::MsgChannel initialize the eventfd at construction by default, so the unsafe lazy behavior must be explicitly opted into. In the cases that do opt in, the eventfd is generally requested for a poll operation before it is used.
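To make the idea concrete, here is a minimal sketch of an eager-by-default channel with an opt-in lazy mode. It is not the actual DS::MsgChannel code: the class name SketchChannel, the Init enum, the int payload, and the use of EFD_SEMAPHORE are all assumptions made for illustration, and it is Linux-only.

```cpp
#include <sys/eventfd.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for DS::MsgChannel; names and layout are assumptions.
class SketchChannel {
public:
    enum class Init { Eager, Lazy };

    // Eager is the default, so the racy lazy path must be requested explicitly.
    explicit SketchChannel(Init mode = Init::Eager)
    {
        if (mode == Init::Eager)
            m_fd = makeEventFd();
    }

    ~SketchChannel()
    {
        if (m_fd >= 0)
            close(m_fd);
    }

    SketchChannel(const SketchChannel&) = delete;
    SketchChannel& operator=(const SketchChannel&) = delete;

    bool initialized() const { return m_fd >= 0; }

    // In Lazy mode the descriptor is created on first use, without locking.
    // If a getter and a putter both reach this at the same time, each may
    // create its own eventfd, and the getter can block on a descriptor that
    // is never written to. Lazy mode is therefore only safe when the owner
    // calls fd() (for example to register it with poll()) before any other
    // thread can push a message.
    int fd()
    {
        if (m_fd < 0)
            m_fd = makeEventFd();
        return m_fd;
    }

    void put(int msg)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push_back(msg);
        }
        // Signal one pending message.
        uint64_t one = 1;
        if (write(fd(), &one, sizeof(one)) != static_cast<ssize_t>(sizeof(one)))
            throw std::runtime_error("eventfd write failed");
    }

    int get()
    {
        // Blocks until a put() has signalled; EFD_SEMAPHORE makes each read
        // consume exactly one signal.
        uint64_t count = 0;
        if (read(fd(), &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count)))
            throw std::runtime_error("eventfd read failed");
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(!m_queue.empty());
        int msg = m_queue.front();
        m_queue.pop_front();
        return msg;
    }

private:
    static int makeEventFd()
    {
        int fd = eventfd(0, EFD_SEMAPHORE);
        if (fd < 0)
            throw std::runtime_error("eventfd creation failed");
        return fd;
    }

    int m_fd = -1;
    std::mutex m_mutex;
    std::deque<int> m_queue;
};
```

With this shape, a default-constructed SketchChannel already owns its descriptor before any other thread can see it, while a temporary client that never needs the channel can pass Init::Lazy and skip the eventfd entirely.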