adinapoli / threads-supervisor

Simple, IO-based Haskell library for Erlang-inspired thread supervisors
MIT License
29 stars 4 forks source link

proof of concept: shutdown supervisor via mailbox command #26

Open srijs opened 8 years ago

srijs commented 8 years ago

Not intending to merge this yet, purely to illustrate a possible solution to the problem outlined in #5.

coveralls commented 8 years ago

Coverage Status

Coverage increased (+0.6%) to 86.25% when pulling ec58921f0702929e68182e3ca84cccefe582fb4d on proof-of-concept/shutdown-messages into 79e3e77d69a12d293eaad5a1573996544536aac6 on master.

adinapoli commented 8 years ago

Ops! This PR seems to have triggered a regression in the test suite (or simply our test suite is not very good and this is a transient failure). In any case, these are my sparse comments (typed at 8:26AM, where I'm still half-asleep, so ignore the nonsense, if any):

So, in a nutshell:

  1. Can you walk me through the benefits of the current PR? (I'm totally accepting "it's still a poc, so no added benefit is presented" as an aswer)
  2. Any clue on why the test suite fails?
  3. Do you agree with my analysis on #5? Are we on the same page or am I struggling to see some subtle detail?

Thanks!

srijs commented 8 years ago

Hi @adinapoli!

  1. Let me attempt a proof by construction to show that the concrete #5 case is still an issue with epochs. You obviously know the internals better than me, so you might be able to disprove me here:

    1. Epoch 0: spawn a supervised child, with epoch=0;
    2. Epoch 1: child thread throws exception;
    3. Epoch 2: supervised sends DeadLetter, with epoch=2;
    4. Epoch 3: on other thread shutdownSupervisor kills the child (note: the child is already dead, so nothing happens);
    5. Epoch 4: now monitoring thread (in handleEvents) receives the DeadLetter with epoch=2. Since the child has not been respawned (but only killed) the child's epoch hasn't been updated, and is still epoch=0. Since the childs epoch is lower than the dead letter's, the monitor respawns the child, with epoch=4;
    6. Epoch 5: shutdownSupervisor kills the monitoring thread (doesn't consider epochs at all for this task)
    7. Epoch 6: we have an orphan child

    It may be possible to fix the issue by making the shutdownSupervisor aware of epochs as well, but since we already defined a more general solution to data race conditions, I figured we might just as well solve it directly like that.

    How I imagine it working with the queued Shutdown instruction:

    1. Epoch 0: spawn a supervised child, with epoch=0;
    2. Epoch 1: child thread throws exception;
    3. Epoch 2: supervised sends DeadLetter, with epoch=2; queue looks like: [DeadLetter]
    4. Epoch 3: on other thread shutdownSupervisor queues a Shutdown instruction; queue looks like: [DeadLetter, Shutdown]
    5. Epoch 4: now monitoring thread (in handleEvents) receives the DeadLetter with epoch=2. Since the childs epoch is lower than the dead letter's, the monitor respawns the child, with epoch=4; queue looks like: [Shutdown]
    6. Epoch 5: the monitoring thread receives the Shutdown instruction and kills all children and the monitoring thread; queue looks like: []
    7. Epoch 6: all children and the supervisor are dead

    What would happen with a queued Shutdown instruction, if the order of steps 3 and 4 is reversed:

    1. Epoch 0: spawn a supervised child, with epoch=0;
    2. Epoch 1: child thread throws exception;
    3. Epoch 2: on other thread shutdownSupervisor queues a Shutdown instruction; queue looks like: [ Shutdown]
    4. Epoch 3: supervised sends DeadLetter, with epoch=3; queue looks like: [Shutdown, DeadLetter]
    5. Epoch 4: the monitoring thread receives the Shutdown instruction and kills all children and the monitoring thread; queue looks like: [DeadLetter]
    6. Epoch 5: the monitoring thread stops the handleEvents loop
    7. Epoch 6: all children and the supervisor are dead
  2. I suspect we now might need to route more things through the channel, not just the shutdown, since it is no longer instantaneous. I'm hoping to have some time on the weekend to play with that.
adinapoli commented 8 years ago

Hey @srijs , thank you very much for this! It was exactly the kind of proof of construction I needed to see things clearly. Sorry if I gave the impression of not play it constructively here, but I think such "rubber duck debugging" are super useful and I feel we are really on the same page. I now see the value of your proposal!