Gabriella439 / pipes-concurrency

Concurrency for the pipes ecosystem
BSD 3-Clause "New" or "Revised" License
43 stars 12 forks source link

Problems with the weak references / GC trick on GHC 7.8 #29

Open jberryman opened 9 years ago

jberryman commented 9 years ago

I thought I would open this ticket for the known issue mentioned in #28 , mostly because I'm selfishly interested in figuring out what the actual issue is.

ibotty commented 9 years ago

According to tekmo's mail from the 26th of January it's only safe for now to use the (not yet released) withSpawn. Can you (tekmo) please release it? I figure that the tutorial also needs updating. It would also be nice to statically prohibit the GC trick.

I might have time to update the tutorial and look through the API. But I'm not a native speaker and only vaguely familiar with pipes-concurrent.

Gabriella439 commented 9 years ago

Yeah, I will go ahead and remove the GC trick from the API, then. Even if this were fixed today, it would be in at least GHC-7.12, and it would take a long time for that to propagate to a large enough number of people. I just need to bite the bullet and fix the documentation.

Gabriella439 commented 9 years ago

Oops, I forgot to mention that I will release the withSpawn trick soon. I usually buffer updates a month apart by default unless I get a specific request to cut a release earlier. Since you requested I will try to upload withSpawn to Hackage soon (probably tonight).

k0001 commented 9 years ago

I actually have a request before the new withSpawn is released.

Its current type is:

withSpawn :: Buffer a -> (Output a -> Input a -> IO r) -> IO r

I think it would be best to change its type to the follwing, so that it can be used together with Control.Monad.Managed.managed:

withSpawn :: Buffer a -> ((Output a, Input a) -> IO r) -> IO r

Then one can do:

(output, input) <- managed $ withSpawn myBuffer

Using withSpawn without managed won't change much with this new type. Compare this current version:

withSpawn myBuffer $ \output input -> ...

With this new version:

withSpawn myBuffer $ \(output, input) -> ...

If you agree that this is a good change to do, I can send a pull-request.

Gabriella439 commented 9 years ago

Actually, I'm planning on changing it to use Managed anyway before the release.

k0001 commented 9 years ago

@Gabriel439 oops, sorry, I sent the pull-request before reading your comment here. You are quick to respond! Thanks!

If I might suggest otherwise, I think forcing the use of Managed is perhaps not a great idea given that converting the version of withSpawn I propsed to a Managed (Output a, Input a) is so easy using managed. As a user, in simple cases such as this one, I like to think as Managed as an opt-in feature. That is, I would like to be able to use managed if I had a compelling reason to do it, but otherwise I'd prefer to use the traditional withXXX style and not worry about mixing different monads and additional package dependencies.

Having said that, I wouldn't mind if this was Managed by default—though I would prefer it wasn't.

Gabriella439 commented 9 years ago

Alright, then I'll go back to the original withSpawn, except with a tuple argument for the callback

michaelt commented 9 years ago

I went through the tutorial examples trying to see if spawn' (@k0001's spawn) worked for all of them, and also trying @k0001 's withSpawn . All but one of the examples work, I think, replacing performGC with atomically seal. The bad case is main3 and main3' at line 337 in this mess https://gist.github.com/michaelt/7f7b07cf7db222a5207d Here, 'the workers are on strike' and each worker proposes to take two tasks before the buffer closes down. But if we mechanically use atomically seal of course the whole thing closes when any of them accepts a second task, rather than waiting for each to take a second task. I guess this makes the attraction of performGC a little clearer to me. (atomically seal is much clearer, indeed it is transparent why the program main3 is acting 'correctly')

I couldn't figure out how withSpawn is supposed to be used. The dim-witted employment in cases where there are two asynchronous pipelines around inevitably leads to one pipeline continuing after the other has stopped or else to an stm exception.

PierreR commented 9 years ago

If I might suggest otherwise, I think forcing the use of Managed is perhaps not a great idea given that converting the version of withSpawn I propsed to a Managed (Output a, Input a) is so easy using managed.

Personally as an outsider I don't see it ... When and why is it better to use managed withSpawn ?

Gabriella439 commented 9 years ago

Actually, I was considering going with Managed after all. I think it's proven itself stable enough that it's worth adding as a dependency.

The route I'm actually considering is to first fork off the pipes-independent stuff in a separate concur library that uses Managed and then switch pipes-concurrency to reuse that library.

michaelt commented 9 years ago

More use of managed sounds great in principle. Is there an illustration of the correct use of 'withSpawn' around somewhere, though? In the typical tutorial examples, performGC and atomically seal are necessarily used more than once, so that different pipelines can close down the buffer, emulating the pipes convention that any pipe can close down a pipeline. It is hard to see how withSpawn could rationally have more than one pipeline in its scope, but I think I mustn't understand it.

Gabriella439 commented 9 years ago

Basically, it would look like this (defined in terms of the existing API to simplify this discussion):

spawn2 :: Buffer a -> Managed (Output a, Input a, STM a)
spawn2 buffer = managed (\k ->
    bracket (spawn' buffer) (\(_, _, s) -> atomically s) (\(o, i, _) -> k (o, i) )

In other words, it would automatically call seal for you. You could also provide an alternative formulation that lets the user seal the channel themselves.

michaelt commented 9 years ago

Right so

main = runManaged $ do
   (output, input, seal) <- spawn_managed unbounded
   liftIO $ do 
     as <- forM [1..3] $ \i ->
           async $ do runEffect $ fromInput input  >-> worker i
                      atomically seal
     a  <- async $ do runEffect $ each [1..10] >-> toOutput output
                      atomically seal
     mapM_ wait (a:as)
 where
   worker :: (Show a) => Int -> Consumer a IO r
   worker i = forever $ do
        a <- await
        lift $ threadDelay 1000000  -- 1 second
        lift $ putStrLn $ "Worker #" ++ show i ++ ": Processed " ++ show a

spawn_managed :: Buffer a -> Managed (Output a, Input a, STM ())
spawn_managed buffer = managed $ \k -> 
      bracket (spawn buffer)
              (\(_, _, s) -> atomically s)
              (\(o, i, s) -> k (o, i, s) )

evades the trouble withSpawn seems to have in main1' on line 244 of the damaged version of the tutorial I linked. (Of course above managed isn't showing much of its merit.) The trouble with the protesting workers on line 337 would still be present, but maybe it's is just a matter of reconceiving the program, which was made very simple by the performGC business.

Gabriella439 commented 9 years ago

Yeah, old performGC approach was more powerful, so not all the old use cases will be convertible to thew new approach, unfortunately.