Behavior of stdin, stdout, readHandle and toHandle

k0001 commented 11 years ago

I think there are some issues with the default behavior of stdin, stdout, readHandle and toHandle, from the Pipes.Prelude module. I'll mention them here so that we can discuss them.

readHandle and stdin read an entire line at once, but the produced line doesn't keep the '\n' byte. I don't think this is a default sensible behavior given the names of these functions, and the fact that readHandle and stdio from Pipes.ByteString behave differently. Perhaps it should be better to rename Pipes.Prelude.readHandle to Pipes.Prelude.hGetLine and Pipes.Prelude.stdin to Pipes.Prelude.getLine, since they behave exactly as the equally named functions in System.IO.

If we perform such renaming, then perhaps we don't need to provide functions that behave similarly to Pipes.ByteString.stdin and Pipes.ByteString.readHandle, since those would be quite inefficient anyway, and Pipes.Prelude.getLine and Pipes.Prelude.hGetLine should be enough for simple purposes.

In any case, even if I suggest renaming this functions, I would prefer if we just removed these two functions from Pipes.Prelude, since liftting System.IO.getLine and System.IO.hGetLine as Effects should be more than enough.
toHandle and stdout share similar problems to the ones mentioned above with regards to line endings. Perhaps renaming them to hPutStrLn and putStrLn, or just dropping them all together, is better. However, …
There is a subtle difference between the expected behavior when writing to standard output and when writing to other Handles. Currently, toHandle swallows IOError{ioe_type=ResourceVanished} exceptions that might happen when trying to write to a Handle, gracefully stopping the Consumer by returning (). However, I think such behavior is appropriate only when the Handle is standard output. Consider a command-line program invocation like the following:

foo | head -3

The head program will consume 3 lines of input from foo's output and die, and a SIGPIPE signal is sent to the foo program. The default behavior in this scenario would be for foo to terminate. However, if one decides to ignore the SIGPIPE signal (which I think is the case for GHC, but I'm not sure), then the next time foo tries to write to standard output it will fail with an exception that matches the following pattern, meaning that we got an EPIPE error:
```
IOError { ioe_type = ResourceVanished
      , ioe_errno = Just ioe }
| Errno ioe == ePIPE -> ...
```
I think it is perfectly fine to swallow such exception and gracefully end a Consumer that writes to standard output in that scenario, because:
- If we are getting that exception it means that the whole program is ignoring SIGPIPE, so we are supposed to handle EPIPE errors ourselves.
- Since getting an EPIPE error when writing to standard output means that the “downstream” program has stopped consuming, and that we should terminate our program as soon as possible (as witnessed by the default behavior of SIGPIPE, which is to terminate the program).
- If we don't swallow such exception, the user executing such program invocation in the command line will see an extraneous “resource vanished” message printed in the screen.
In any case, I think that the GHC RTS might have already solved this issue for us, since in some example code I wrote just now I couldn't manage to reproduce this scenario. I think this issue might be related: http://ghc.haskell.org/trac/ghc/ticket/2699

However, EPIPE errors might also happen when writing to other kinds of Handles that don't behave like standard output. For example, when writing to the Handle that underlies a TCP connection, you would receive the same kind of exception if the remote end closes its side of the connection, and it is OK to treat that scenario as extraordinary by letting the exception happen, instead of gracefully ending the consumer.

So, in summary: I think Pipes.Prelude.toHandle should propagate all exceptions, and Pipes.Prelude.stdout should ignore the EPIPE errors. This would give different behaviors between Pipes.Prelude.toHandle System.IO.stdout and Pipes.Prelude.stdout, but I think that's OK and can be communicated in the documentation. However, I still think we should remove these functions altogether if possible, or at least rename them to hPutStrLn and putStrLn as I mentioned before.

Notice all that I've mentioned here regarding EPIPE will also apply to the use of output Handles in Pipes.ByteString.

k0001 commented 11 years ago

Actually, the GHC RTS was behaving as expected, but I was testing my program in the wrong way and that's why I couldn't reproduce what I was saying in my above comment.

The following example exits with status code 30 when a downstream program stops consuming input from standard output:

module Main (main) where

import qualified Control.Exception as Ex
import           Control.Monad     (forever)
import           Foreign.C.Error   (Errno(Errno), ePIPE)
import qualified GHC.IO.Exception  as G
import           System.Exit       (exitWith, ExitCode(ExitFailure))
import qualified System.Posix      as Posix

main = do
    -- If I uncomment the following line and a downstream program stops
    -- consuming from standard output, the program exits with status code 30.
    -- This is the default behavior in GHC.
    --
    Posix.installHandler Posix.sigPIPE Posix.Ignore Nothing

    -- If I uncomment the following line and a downstream program stops
    -- consuming from standard output, the program is terminated by SIGPIPE.
    --
    -- Posix.installHandler Posix.sigPIPE Posix.Default Nothing

    forever $ do
        Ex.catch (putStrLn "Hello")
                 (\e@(G.IOError{ G.ioe_type  = G.ResourceVanished
                               , G.ioe_errno = Just ioe }) -> do
                      if Errno ioe == ePIPE
                         then exitWith (ExitFailure 30)
                         else Ex.throwIO e)

What I propose is that only in this case we should gracefully exit a consumer writing to standard output. Otherwise, if we are writing to a different output Handle, we should let the exception propagate.

Gabriella439 commented 11 years ago

I would like to keep stdin and stdout because they make excellent teaching examples for what are non-trivial use cases for Producer and Consumer (i.e. inputs and outputs that can fail and terminate). The tutorial does not flow well without them.

Also, you can't replace stdin with lift getLine >~ p because getLine throws an unwanted exception on end of input.

Remember that stdin and stdout exist purely for demonstration purposes and will never be used in production code. I'd rather just include an advisory alongside the Pipes.ByteString code saying that the bytestring version of stdin and stdout preserves newlines, unlike the Pipes.Prelude versions. I'll also include a similar advisory in Pipes.Prelude.

The weird behavior you encountered is because GHC ignores EPIPE exceptions in its top-level exception handler. The exception will still show up if you install your own exception handler, though.

Even for production code I still think it's a good idea to gracefully terminate on EPIPE. It makes for a good default behavior (and GHC's top-level exception handler agrees) and if people want to diagnose and handle it, then it's as simple as instrumenting the return values to indicate which pipe failed, the same way that the tutorial does. If you need to include descriptive information about the nature of the failure, just include it in the return value.

Just because exceptions are the current state of the art for handling stream failures doesn't mean that they are necessarily the best way. I think that in the context of pipes the return value is a more appropriate way to indicate and handle stream failures.

Also, I think special-casing stdout is a recipe for quirks.

k0001 commented 11 years ago

I would like to keep stdin and stdout because they make excellent teaching examples for what are non-trivial use cases for Producer and Consumer (i.e. inputs and outputs that can fail and terminate). The tutorial does not flow well without them.

Also, you can't replace stdin with lift getLine >~ p because getLine throws an unwanted exception on end of input.

I agree that these are useful for demonstration purposes, it's OK to keep these or similar around.

Remember that stdin and stdout exist purely for demonstration purposes and will never be used in production code. I'd rather just include an advisory alongside the Pipes.ByteString code saying that the bytestring version of stdin and stdout preserves newlines, unlike the Pipes.Prelude versions. I'll also include a similar advisory in Pipes.Prelude.

Would you consider renaming them to stdinLn and stdoutLn?

The weird behavior you encountered is because GHC ignores EPIPE exceptions in its top-level exception handler. The exception will still show up if you install your own exception handler, though.

I didn't encounter a “weird” behavior; I encountered precisely the behavior I was expecting to get. At first I failed to execute my example correctly, but that's all.

Even for production code I still think it's a good idea to gracefully terminate on EPIPE. It makes for a good default behavior (and GHC's top-level exception handler agrees) and if people want to diagnose and handle it, then it's as simple as instrumenting the return values to indicate which pipe failed, the same way that the tutorial does. If you need to include descriptive information about the nature of the failure, just include it in the return value.

Just because exceptions are the current state of the art for handling stream failures doesn't mean that they are necessarily the best way. I think that in the context of pipes the return value is a more appropriate way to indicate and handle stream failures.

Also, I think special-casing stdout is a recipe for quirks.

The problem is that standard output is already special-cased by the operating system: getting EPIPE when writing to standard output doesn't mean the same than getting EPIPE when writing to another output Handle, and I don't think we should simply ignore that. The former is an expected scenario in which you can gracefully stop executing your program, but the latter means you had a serious connectivity issue with your output Handle.

And yes, it is a recipe for quirks, but it is the recipe the UNIX/POSIX guys left us :-(

Gabriella439 commented 11 years ago

I would still like to keep the stdin and stdout names just to keep Pipes.Prelude as appealing as possible. I'd prefer to just put in an advisory in the haddocks that these drop and add newlines, respectively.

Regarding EPIPE on Handles, think of it from the perspective of a user (like me) who prefers the termination behavior. If you don't provide a Consumer that terminates on EPIPE you force that user to write it themselves, and it is a non-trivial piece of code. In contrast, its very easy for anybody to get the behavior that treats EPIPE as an exception just by using for + the underlying write function.

Gabriella439 commented 11 years ago

So for now I will keep the current behavior of stdin and stdout to be line-based and I added a note to the documentation explaining how their behavior differs from the upcoming ByteString and Text utilities. Also, if people don't like this behavior, it incentivizes them to switch to using ByteString or Text.

I'm also keeping the EPIPE behavior because the behavior that ignores EPIPE is very easy to implement using for (and is already taught in the tutorial, too).

k0001 commented 11 years ago

I still don't think that the current behavior in case of EPIPE is right. It is OK in the case of writing to standard output, but if the Handle you pass to Pipes.Prelude.toHandle is, say, the Handle that underlies a network Socket, then just gracefully stopping is not OK. It's probably not super important in Pipes.Prelude because I don't expect these functions to be used much beyond simple examples; but I don't think the current approach should be reproduced in the upcoming pipes-bytestring and pipes-text.

k0001 commented 11 years ago

Oh, and the documentation bits about added newlines are quite welcomed! Thanks!

Gabriella439 commented 11 years ago

Right, but if you don't want to gracefully stop all you have to do is use a for loop instead:

for mySource (lift . sendAll socket)

However, if you want I can provide both behaviors as Consumers and maybe mark one as stdout' to differentiate them. In other words:

-- Ignores EPIPE
stdout :: (Monad m) => Consumer ByteString IO ()

-- stdout' = for cat (lift . B.putStr)
stdout' :: (Monad m) => Consumer ByteString IO r

I'm not as opposed to Consumers that trivially looped as before.

k0001 commented 11 years ago

Yes, I know that you can use the for loop anytime. But I'm concerned about encouraging users to use the gracefully-stopping approach by providing a readily available Consumer that covers that case. However, your suggestion about providing both Consumers is quite appealing, since it forces users to think about the trade-offs between the two alternatives and willingly pick one of them. I think I like this :)

Gabriella439 commented 11 years ago

Alright, then I'll make sure that the ByteString and Text versions have both Consumers and make sure to document the tradeoffs of using either one so that they don't feel compelled to use the EPIPE-ignoring version.

k0001 commented 11 years ago

Awesome! Thanks for this discussion Gabriel. I was getting worried about this default.

Gabriella439 commented 11 years ago

No problem! :) I agree that having sensible defaults is important and that the exception-aware version is probably the best default for production code. I just wanted to have the EPIPE-ignoring version in there somewhere because I have several programs in mind where I specifically want that behavior.

Gabriella439 commented 11 years ago

So I've been doing more thinking about this and I think that I will go with your original suggestion to treat Pipes.Prelude.stdout and Pipes.Prelude.toHandle System.IO.stdout differently, where Pipes.Prelude.stdout will ignore EPIPE and Pipes.Prelude.toHandle will still propagate it as an exception. Would you be okay with using that convention for both the String versions in Pipes.Prelude and the ByteString versions in Pipes.ByteString?

k0001 commented 11 years ago

I think it's good to treat them differently and just mention that difference in the documentation. So yes, I agree with this change.

What do you plan to do with the newlines? I assume you'll want to keep the current newline behavior in Pipes.Prelude.stdout but not in Pipes.Prelude.toHandle (which could potentially be used beyond simple GHCi examples). Now that you'll be making this change, perhaps renaming Pipes.Prelude.stdout to Pipes.Prelude.putStrLn or Pipes.Prelude.stdoutLn doesn't sound that bad.

k0001 commented 11 years ago

I was reviewing my past comments and I see that I made a mistake that I want clarify now. When you suggested the previous proposal of having both stdout (ignoring EPIPE) and stdout' (not ignoring EPIPE) and I said that it was a good decision, I was wrong. Having EPIPE ignored when writing to standard output is a good thing, so stdout' wouldn't make much sense. However, I kind of assumed that such stdout' would be backed by a toHandle that itself wouldn't ignore EPIPE, which is just what you suggested in your last proposal, and with that I do agree. However, by now we both might have lost track of the details of this conversation, so in my next comment I'll just propose what I'd like to see in Pipes.Prelude and we can continue the discussion from there.

k0001 commented 11 years ago

Here are the String I/O functions I'd ideally want to see in Pipes.Prelude:

-- | Read 'String's from 'IO.stdin' using 'IO.getLine' from "System.IO".
--
-- Silently terminates on end of input
getLine :: Producer' String IO ()

-- | Read 'String's from a 'IO.Handle' using 'IO.hGetLine' from "System.IO".
--
-- Silently terminates on end of input
hGetLine :: IO.Handle -> Producer' String IO ()

-- | Write 'String's to /standard output/ using 'IO.putStr' from "System.IO".
--
-- Silently terminates on a broken output pipe.
putStr :: Consumer' String IO ()

-- | Write 'String's to /standard output/ using 'IO.putStrLn' from "System.IO".
--
-- Silently terminates on a broken output pipe.
putStrLn :: Consumer' String IO ()

-- | Write 'String's to a 'IO.Handle' using 'IO.hPutStr' from "System.IO".
hPutStr :: IO.Handle -> Consumer' String IO r

-- | Write 'String's to a 'IO.Handle' using 'IO.hPutStrLn' from "System.IO".
hPutStrLn :: IO.Handle -> Consumer' String IO r

I think I've correctly implemented them in the prop-71 branch, though I didn't try to execute them.

I'd probably also go ahead and add a hGetChunk function that behaves similarly to Data.Text.IO.hGetChunk, but I'm deliberately leaving that out for now since it's not straightforward to implement.

Gabriella439 commented 11 years ago

Don't worry. I understood what you meant. You wanted stdout to ignore EPIPE and toHandle to not ignore EPIPE and this would require that stdout was not implemented in terms of toHandle.

My plan is to have the Pipes.Prelude versions still automatically remove and insert newlines like they do already, but the Pipes.ByteString and Pipes.Text versions will not (and this will be documented).

However, I still would like to keep the current names for Pipes.Prelude.{stdin,stdout}. My opinion is that stdin and stdout should always represent the most appropriate default in the context of their surrounding library, and I think that in the context of Pipes.Prelude they are the best default. The other reason I really like those names is that I want people to think of pipes as Handle replacements, and naming them after the equivalent handles really drives that concept home.

k0001 commented 11 years ago

Pipes.Prelude.{stdin,stdout} are OK, I wasn't truly expecting you to like getLine and putStrLn =P

Anyway, what do you think about the other names like hPutStr?

Gabriella439 commented 11 years ago

I want to keep Pipes.Prelude simple. Also, the reason I don't want to include multiple variations that are line-based or line insensitive in Pipes.Prelude is that it encourages downstream libraries to do the same, even though something like the FreeT-based lines and unlines functions are the true solution.

k0001 commented 11 years ago

Well, the toHandle you just committed in Gabriel439/Haskell-Pipes-Library@0b6457e makes me happy enough. Thank you! :)

I'm not sure if this is really needed, but in the examples I've found around I see that the ioe_type field in the IOError exception is matched against ePIPE as seen here: https://github.com/k0001/Haskell-Pipes-Library/commit/b3835a36dcfdeb247f6fcf2fdabb57f3152e348e#L0R175 I think it's a good idea to add that check so that we don't accidentally catch other exceptions we shouldn't be catching. I can send you a small pull request with that change if you want.

Gabriella439 commented 11 years ago

Yeah, I would like to change it to specifically match ePIPE. Can you submit that pull request?

k0001 commented 11 years ago

I'll close this now. I think it's settled. Thanks!

Gabriella439 commented 11 years ago

You're welcome!

Gabriella439 / pipes

Behavior of stdin, stdout, readHandle and toHandle #71