commercialhaskell / rio

A standard library for Haskell
Other
835 stars 54 forks source link

logSticky is broken for non-ASCII messages #178

Closed mcschroeder closed 5 years ago

mcschroeder commented 5 years ago

Take the following example:

{-# LANGUAGE OverloadedStrings #-}
import RIO
import qualified RIO.Text

main = do
    runSimpleApp $ do
        logInfo "one"
        logSticky $ display $ RIO.Text.replicate 41 "ö"
        logInfo "two"
        logSticky "sticky"

At the end of the run, you would expect your terminal to look like this:

one
two
sticky

Yet what actually happens on an 80-width terminal is this:

one
two                                    sticky

If you change the umlaut ö in the logSticky to an o everything works as expected. Likewise, if you replicate the ö a number of times that is less than or equal to half the width of your terminal.

The reason for this weirdness are the functions to clear the sticky line used in stickyImpl in RIO.Prelude.Logger:

  let backSpaceChar = '\8'
      repeating = mconcat . replicate (B.length sticky) . char7
      clear = logSend lo
        (repeating backSpaceChar <>
        repeating ' ' <>
        repeating backSpaceChar)

The variable sticky here is a ByteString that holds the current sticky line. By using B.length to determine how many backspaces are needed to clear it, we overcount if there are non-ASCII characters in the string that are encoded with multiple bytes (like ö under UTF-8 which encodes to \195\182).

I'm not sure what the best fix is here:

  1. We could calculate the number of characters (as opposed to the number of bytes) using something like Data.ByteString.UTF8.length from the utf8-string package. This is a potentially expensive operation I'm guessing, but if we store the count together with the sticky line, we would only need to do it once when the line is logged (as opposed to every time we need to clear it). Also, the sticky line is going to be maximum as long as the terminal, which is probably <100 characters in most cases.

  2. We could clear the sticky line in some other way, e.g. using the functions from ansi-terminal. I don't know how portable those are, but this might be the cleanest solution.

  3. We could leave it as-is and add a warning to the documentation of logSticky to caution against using it with non-ASCII strings.

snoyberg commented 5 years ago

This is a good catch, thank you. Did your last name play into its discovery by chance? 😄

Clearing the sticky line in some other way does sound really nice, but I also don't know about portability. I'd rather not add a dependency on utf8-string, but hand-rolling a function that does the same thing should be fairly easy using bitmasking, but it would introduce some performance overhead.

Are you interested in taking a stab at implementing a solution to this?

mcschroeder commented 5 years ago

I was actually trying to add a fancy progress bar to my sticky log line, but when I discovered the problems I was having were due to character encoding I really decided to dig my teeth in... I guess I am particularly sensitive to issues of mishandling non-ASCII characters… 😉

I’ll take a stab at a solution, sure.