Closed k0001 closed 11 years ago
Actually, the GHC RTS was behaving as expected, but I was testing my program in the wrong way and that's why I couldn't reproduce what I was saying in my above comment.
The following example exits with status code 30
when a downstream program stops consuming input from standard output:
module Main (main) where
import qualified Control.Exception as Ex
import Control.Monad (forever)
import Foreign.C.Error (Errno(Errno), ePIPE)
import qualified GHC.IO.Exception as G
import System.Exit (exitWith, ExitCode(ExitFailure))
import qualified System.Posix as Posix
main = do
-- If I uncomment the following line and a downstream program stops
-- consuming from standard output, the program exits with status code 30.
-- This is the default behavior in GHC.
--
Posix.installHandler Posix.sigPIPE Posix.Ignore Nothing
-- If I uncomment the following line and a downstream program stops
-- consuming from standard output, the program is terminated by SIGPIPE.
--
-- Posix.installHandler Posix.sigPIPE Posix.Default Nothing
forever $ do
Ex.catch (putStrLn "Hello")
(\e@(G.IOError{ G.ioe_type = G.ResourceVanished
, G.ioe_errno = Just ioe }) -> do
if Errno ioe == ePIPE
then exitWith (ExitFailure 30)
else Ex.throwIO e)
What I propose is that only in this case we should gracefully exit a consumer writing to standard output. Otherwise, if we are writing to a different output Handle
, we should let the exception propagate.
I would like to keep stdin
and stdout
because they make excellent teaching examples for what are non-trivial use cases for Producer
and Consumer
(i.e. inputs and outputs that can fail and terminate). The tutorial does not flow well without them.
Also, you can't replace stdin
with lift getLine >~ p
because getLine
throws an unwanted exception on end of input.
Remember that stdin
and stdout
exist purely for demonstration purposes and will never be used in production code. I'd rather just include an advisory alongside the Pipes.ByteString
code saying that the bytestring version of stdin
and stdout
preserves newlines, unlike the Pipes.Prelude
versions. I'll also include a similar advisory in Pipes.Prelude
.
The weird behavior you encountered is because GHC ignores EPIPE exceptions in its top-level exception handler. The exception will still show up if you install your own exception handler, though.
Even for production code I still think it's a good idea to gracefully terminate on EPIPE
. It makes for a good default behavior (and GHC's top-level exception handler agrees) and if people want to diagnose and handle it, then it's as simple as instrumenting the return values to indicate which pipe failed, the same way that the tutorial does. If you need to include descriptive information about the nature of the failure, just include it in the return value.
Just because exceptions are the current state of the art for handling stream failures doesn't mean that they are necessarily the best way. I think that in the context of pipes
the return value is a more appropriate way to indicate and handle stream failures.
Also, I think special-casing stdout
is a recipe for quirks.
I would like to keep stdin and stdout because they make excellent teaching examples for what are non-trivial use cases for Producer and Consumer (i.e. inputs and outputs that can fail and terminate). The tutorial does not flow well without them.
Also, you can't replace stdin with lift getLine >~ p because getLine throws an unwanted exception on end of input.
I agree that these are useful for demonstration purposes, it's OK to keep these or similar around.
Remember that stdin and stdout exist purely for demonstration purposes and will never be used in production code. I'd rather just include an advisory alongside the Pipes.ByteString code saying that the bytestring version of stdin and stdout preserves newlines, unlike the Pipes.Prelude versions. I'll also include a similar advisory in Pipes.Prelude.
Would you consider renaming them to stdinLn
and stdoutLn
?
The weird behavior you encountered is because GHC ignores EPIPE exceptions in its top-level exception handler. The exception will still show up if you install your own exception handler, though.
I didn't encounter a “weird” behavior; I encountered precisely the behavior I was expecting to get. At first I failed to execute my example correctly, but that's all.
Even for production code I still think it's a good idea to gracefully terminate on EPIPE. It makes for a good default behavior (and GHC's top-level exception handler agrees) and if people want to diagnose and handle it, then it's as simple as instrumenting the return values to indicate which pipe failed, the same way that the tutorial does. If you need to include descriptive information about the nature of the failure, just include it in the return value.
Just because exceptions are the current state of the art for handling stream failures doesn't mean that they are necessarily the best way. I think that in the context of pipes the return value is a more appropriate way to indicate and handle stream failures.
Also, I think special-casing stdout is a recipe for quirks.
The problem is that standard output is already special-cased by the operating system: getting EPIPE when writing to standard output doesn't mean the same than getting EPIPE when writing to another output Handle
, and I don't think we should simply ignore that. The former is an expected scenario in which you can gracefully stop executing your program, but the latter means you had a serious connectivity issue with your output Handle
.
And yes, it is a recipe for quirks, but it is the recipe the UNIX/POSIX guys left us :-(
I would still like to keep the stdin
and stdout
names just to keep Pipes.Prelude
as appealing as possible. I'd prefer to just put in an advisory in the haddocks that these drop and add newlines, respectively.
Regarding EPIPE
on Handle
s, think of it from the perspective of a user (like me) who prefers the termination behavior. If you don't provide a Consumer
that terminates on EPIPE
you force that user to write it themselves, and it is a non-trivial piece of code. In contrast, its very easy for anybody to get the behavior that treats EPIPE
as an exception just by using for
+ the underlying write function.
So for now I will keep the current behavior of stdin
and stdout
to be line-based and I added a note to the documentation explaining how their behavior differs from the upcoming ByteString
and Text
utilities. Also, if people don't like this behavior, it incentivizes them to switch to using ByteString
or Text
.
I'm also keeping the EPIPE
behavior because the behavior that ignores EPIPE
is very easy to implement using for
(and is already taught in the tutorial, too).
I still don't think that the current behavior in case of EPIPE
is right. It is OK in the case of writing to standard output, but if the Handle
you pass to Pipes.Prelude.toHandle
is, say, the Handle
that underlies a network Socket
, then just gracefully stopping is not OK. It's probably not super important in Pipes.Prelude
because I don't expect these functions to be used much beyond simple examples; but I don't think the current approach should be reproduced in the upcoming pipes-bytestring
and pipes-text
.
Oh, and the documentation bits about added newlines are quite welcomed! Thanks!
Right, but if you don't want to gracefully stop all you have to do is use a for
loop instead:
for mySource (lift . sendAll socket)
However, if you want I can provide both behaviors as Consumer
s and maybe mark one as stdout'
to differentiate them. In other words:
-- Ignores EPIPE
stdout :: (Monad m) => Consumer ByteString IO ()
-- stdout' = for cat (lift . B.putStr)
stdout' :: (Monad m) => Consumer ByteString IO r
I'm not as opposed to Consumer
s that trivially looped as before.
Yes, I know that you can use the for
loop anytime. But I'm concerned about encouraging users to use the gracefully-stopping approach by providing a readily available Consumer
that covers that case. However, your suggestion about providing both Consumer
s is quite appealing, since it forces users to think about the trade-offs between the two alternatives and willingly pick one of them. I think I like this :)
Alright, then I'll make sure that the ByteString
and Text
versions have both Consumer
s and make sure to document the tradeoffs of using either one so that they don't feel compelled to use the EPIPE
-ignoring version.
Awesome! Thanks for this discussion Gabriel. I was getting worried about this default.
No problem! :) I agree that having sensible defaults is important and that the exception-aware version is probably the best default for production code. I just wanted to have the EPIPE
-ignoring version in there somewhere because I have several programs in mind where I specifically want that behavior.
So I've been doing more thinking about this and I think that I will go with your original suggestion to treat Pipes.Prelude.stdout
and Pipes.Prelude.toHandle System.IO.stdout
differently, where Pipes.Prelude.stdout
will ignore EPIPE
and Pipes.Prelude.toHandle
will still propagate it as an exception. Would you be okay with using that convention for both the String
versions in Pipes.Prelude
and the ByteString
versions in Pipes.ByteString
?
I think it's good to treat them differently and just mention that difference in the documentation. So yes, I agree with this change.
What do you plan to do with the newlines? I assume you'll want to keep the current newline behavior in Pipes.Prelude.stdout
but not in Pipes.Prelude.toHandle
(which could potentially be used beyond simple GHCi examples). Now that you'll be making this change, perhaps renaming Pipes.Prelude.stdout
to Pipes.Prelude.putStrLn
or Pipes.Prelude.stdoutLn
doesn't sound that bad.
I was reviewing my past comments and I see that I made a mistake that I want clarify now. When you suggested the previous proposal of having both stdout
(ignoring EPIPE
) and stdout'
(not ignoring EPIPE
) and I said that it was a good decision, I was wrong. Having EPIPE
ignored when writing to standard output is a good thing, so stdout'
wouldn't make much sense. However, I kind of assumed that such stdout'
would be backed by a toHandle
that itself wouldn't ignore EPIPE
, which is just what you suggested in your last proposal, and with that I do agree. However, by now we both might have lost track of the details of this conversation, so in my next comment I'll just propose what I'd like to see in Pipes.Prelude
and we can continue the discussion from there.
Here are the String
I/O functions I'd ideally want to see in Pipes.Prelude
:
-- | Read 'String's from 'IO.stdin' using 'IO.getLine' from "System.IO".
--
-- Silently terminates on end of input
getLine :: Producer' String IO ()
-- | Read 'String's from a 'IO.Handle' using 'IO.hGetLine' from "System.IO".
--
-- Silently terminates on end of input
hGetLine :: IO.Handle -> Producer' String IO ()
-- | Write 'String's to /standard output/ using 'IO.putStr' from "System.IO".
--
-- Silently terminates on a broken output pipe.
putStr :: Consumer' String IO ()
-- | Write 'String's to /standard output/ using 'IO.putStrLn' from "System.IO".
--
-- Silently terminates on a broken output pipe.
putStrLn :: Consumer' String IO ()
-- | Write 'String's to a 'IO.Handle' using 'IO.hPutStr' from "System.IO".
hPutStr :: IO.Handle -> Consumer' String IO r
-- | Write 'String's to a 'IO.Handle' using 'IO.hPutStrLn' from "System.IO".
hPutStrLn :: IO.Handle -> Consumer' String IO r
I think I've correctly implemented them in the prop-71
branch, though I didn't try to execute them.
I'd probably also go ahead and add a hGetChunk
function that behaves similarly to Data.Text.IO.hGetChunk
, but I'm deliberately leaving that out for now since it's not straightforward to implement.
Don't worry. I understood what you meant. You wanted stdout
to ignore EPIPE
and toHandle
to not ignore EPIPE
and this would require that stdout
was not implemented in terms of toHandle
.
My plan is to have the Pipes.Prelude
versions still automatically remove and insert newlines like they do already, but the Pipes.ByteString
and Pipes.Text
versions will not (and this will be documented).
However, I still would like to keep the current names for Pipes.Prelude.{stdin,stdout}
. My opinion is that stdin
and stdout
should always represent the most appropriate default in the context of their surrounding library, and I think that in the context of Pipes.Prelude
they are the best default. The other reason I really like those names is that I want people to think of pipes as Handle
replacements, and naming them after the equivalent handles really drives that concept home.
Pipes.Prelude.{stdin,stdout}
are OK, I wasn't truly expecting you to like getLine
and putStrLn
=P
Anyway, what do you think about the other names like hPutStr
?
I want to keep Pipes.Prelude
simple. Also, the reason I don't want to include multiple variations that are line-based or line insensitive in Pipes.Prelude
is that it encourages downstream libraries to do the same, even though something like the FreeT
-based lines
and unlines
functions are the true solution.
Well, the toHandle
you just committed in Gabriel439/Haskell-Pipes-Library@0b6457e makes me happy enough. Thank you! :)
I'm not sure if this is really needed, but in the examples I've found around I see that the ioe_type
field in the IOError
exception is matched against ePIPE
as seen here: https://github.com/k0001/Haskell-Pipes-Library/commit/b3835a36dcfdeb247f6fcf2fdabb57f3152e348e#L0R175 I think it's a good idea to add that check so that we don't accidentally catch other exceptions we shouldn't be catching. I can send you a small pull request with that change if you want.
Yeah, I would like to change it to specifically match ePIPE
. Can you submit that pull request?
I'll close this now. I think it's settled. Thanks!
You're welcome!
I think there are some issues with the default behavior of
stdin
,stdout
,readHandle
andtoHandle
, from thePipes.Prelude
module. I'll mention them here so that we can discuss them.readHandle
andstdin
read an entire line at once, but the produced line doesn't keep the'\n'
byte. I don't think this is a default sensible behavior given the names of these functions, and the fact thatreadHandle
andstdio
fromPipes.ByteString
behave differently. Perhaps it should be better to renamePipes.Prelude.readHandle
toPipes.Prelude.hGetLine
andPipes.Prelude.stdin
toPipes.Prelude.getLine
, since they behave exactly as the equally named functions inSystem.IO
.If we perform such renaming, then perhaps we don't need to provide functions that behave similarly to
Pipes.ByteString.stdin
andPipes.ByteString.readHandle
, since those would be quite inefficient anyway, andPipes.Prelude.getLine
andPipes.Prelude.hGetLine
should be enough for simple purposes.In any case, even if I suggest renaming this functions, I would prefer if we just removed these two functions from
Pipes.Prelude
, sincelift
tingSystem.IO.getLine
andSystem.IO.hGetLine
asEffect
s should be more than enough.toHandle
andstdout
share similar problems to the ones mentioned above with regards to line endings. Perhaps renaming them tohPutStrLn
andputStrLn
, or just dropping them all together, is better. However, …There is a subtle difference between the expected behavior when writing to standard output and when writing to other
Handle
s. Currently,toHandle
swallowsIOError{ioe_type=ResourceVanished}
exceptions that might happen when trying to write to aHandle
, gracefully stopping theConsumer
by returning()
. However, I think such behavior is appropriate only when theHandle
is standard output. Consider a command-line program invocation like the following:foo | head -3
The
head
program will consume 3 lines of input fromfoo
's output and die, and aSIGPIPE
signal is sent to thefoo
program. The default behavior in this scenario would be forfoo
to terminate. However, if one decides to ignore theSIGPIPE
signal (which I think is the case for GHC, but I'm not sure), then the next timefoo
tries to write to standard output it will fail with an exception that matches the following pattern, meaning that we got an EPIPE error:I think it is perfectly fine to swallow such exception and gracefully end a
Consumer
that writes to standard output in that scenario, because:SIGPIPE
, so we are supposed to handle EPIPE errors ourselves.SIGPIPE
, which is to terminate the program).In any case, I think that the GHC RTS might have already solved this issue for us, since in some example code I wrote just now I couldn't manage to reproduce this scenario. I think this issue might be related: http://ghc.haskell.org/trac/ghc/ticket/2699
However, EPIPE errors might also happen when writing to other kinds of
Handle
s that don't behave like standard output. For example, when writing to theHandle
that underlies a TCP connection, you would receive the same kind of exception if the remote end closes its side of the connection, and it is OK to treat that scenario as extraordinary by letting the exception happen, instead of gracefully ending the consumer.So, in summary: I think
Pipes.Prelude.toHandle
should propagate all exceptions, andPipes.Prelude.stdout
should ignore the EPIPE errors. This would give different behaviors betweenPipes.Prelude.toHandle System.IO.stdout
andPipes.Prelude.stdout
, but I think that's OK and can be communicated in the documentation. However, I still think we should remove these functions altogether if possible, or at least rename them tohPutStrLn
andputStrLn
as I mentioned before.Notice all that I've mentioned here regarding EPIPE will also apply to the use of output
Handle
s inPipes.ByteString
.