fpco / say

Send textual messages to a Handle in a thread-friendly way
MIT License

Occasional jumbled text when using sayString with threaded runtime and -O #3

Open soupi opened 3 years ago

soupi commented 3 years ago

This uses LTS-17.10, i.e. ghc-8.10.4, say-0.1.0.1, and time-1.9.3.

I'm trying to use sayString to write 1000 values to stdout concurrently and I occasionally get jumbled text.

Note that I couldn't reproduce this issue without -O, but it does reproduce without -N.

Here's a short program to reproduce:

import Say (sayString)
import Control.Monad (replicateM_)
import Data.Time.Clock (getCurrentTime)
import Control.Concurrent (forkIO, threadDelay)

main = do
  -- fork 1000 threads that each print the current time with sayString
  replicateM_ 1000 (forkIO $ getCurrentTime >>= sayString . show)
  -- give the forked threads a second to finish
  threadDelay 1000000

Compile with:

stack ghc --package time --package say -- -threaded -with-rtsopts=-N -O time-say.hs

Most of the values here are displayed correctly, but a few are not, for example:

2021-06-07 07:03:46.818345816 UTC
2021-06-07 07:03:46.818353411 UTC
2021-06-07 07:03:46.818342099 UTC
2021-06-07 07:03:46.818341568 UTC
2021-06-07 07:03:46.818311742 UTC
2021-06-07 07:03:46.818339184 UTC
2021-06-07 07:03:46.818342872021-06-07 07:03:46.818359242 UTC
2021-06-07 07:03:46.818349734 UTC
2021-06-07 07:03:46.81835831 UTC
2021-06-07 07:03:46.818357999 UTC
...
2021-06-07 07:03:46.81814 UTC
2021-06-07 07:03:46.818147594 UTC
2021-06-07 07:03:46.814533975 UTC
2021-06-07 02021-06-07 07:03:46.818129731 UTC
2021-06-07 07:03:46.81813476 UTC
2021-06-07 07:03:46.818151682 UTC
2021-06-07 07:03:46.814553481 UTC
...
2021-06-07 07:03:46.818434954 UTC
2021-06-07 07:03:46.818417611 UTC
2021-06-07 07:03:46.818442608 UTC
7:03:46.81813435 UTC
1 UTC

I've written an equivalent program that uses a TQueue instead, in which all values are displayed correctly:

import Control.Monad (replicateM_, forever)
import Data.Time.Clock (getCurrentTime)
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (newTQueueIO, atomically, readTQueue, writeTQueue)

main = do
  queue <- newTQueueIO
  -- a single consumer thread owns stdout and prints everything from the queue
  forkIO $ forever $ print =<< atomically (readTQueue queue)
  -- the 1000 producer threads only enqueue values
  replicateM_ 1000 (forkIO $ getCurrentTime >>= atomically . writeTQueue queue)
  threadDelay 1000000

Compile with:

stack ghc --package time --package stm -- -threaded -with-rtsopts=-N -O time-stm.hs

snoyberg commented 3 years ago

I think the approach we take in unliftio's logging is probably superior. I wonder if it's even worth keeping say around, or whether we should instead point people to unliftio. Would you mind checking with unliftio and logInfo/runSimpleApp to see if you can reproduce the buggy behavior?

soupi commented 3 years ago

I've never used unliftio or rio before, so this port might not be correct:

import RIO
import UnliftIO.Concurrent
import Control.Monad (replicateM_)
import Data.Time.Clock (getCurrentTime)

main :: IO ()
main = runSimpleApp $ do
  replicateM_ 100000 (forkIO $ liftIO getCurrentTime >>= logInfo . fromString . show)
  threadDelay 1000000

And compiled with:

stack ghc --package time --package rio --package unliftio -- -threaded -with-rtsopts=-N -O time-rio.hs

I found 3 issues with this code compared to the stm version:

  1. There were still a few instances of jumbled text.
  2. It was slower than the stm version (also with 100000 lines). On my machine the stm version printed all 100000 lines in the allotted time, while the rio version only managed to print around 85000; say was still much worse than both, at about 30000 lines.
  3. The memory usage of the rio version was significantly higher than that of the stm version:

stm:

   1,136,781,928 bytes allocated in the heap
     158,843,744 bytes copied during GC
       7,951,888 bytes maximum residency (8 sample(s))
         518,640 bytes maximum slop
              34 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       974 colls,   974 par    1.069s   0.125s     0.0001s    0.0060s
  Gen  1         8 colls,     7 par    0.408s   0.038s     0.0047s    0.0224s

  Parallel GC work balance: 18.11% (serial 0%, perfect 100%)

  TASKS: 37 (1 bound, 36 peak workers (36 total), using -N12)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.001s elapsed)
  MUT     time    1.618s  (  0.990s elapsed)
  GC      time    1.476s  (  0.162s elapsed)
  EXIT    time    0.002s  (  0.006s elapsed)
  Total   time    3.100s  (  1.160s elapsed)

  Alloc rate    702,695,389 bytes per MUT second

  Productivity  52.2% of total user, 85.3% of total elapsed

rio:

   4,373,745,016 bytes allocated in the heap
   1,076,331,800 bytes copied during GC
   3,281,038,616 bytes maximum residency (11 sample(s))
      13,178,864 bytes maximum slop
            3294 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3135 colls,  3135 par    4.046s   0.442s     0.0001s    0.0181s
  Gen  1        11 colls,    10 par    6.508s   0.696s     0.0633s    0.1129s

  Parallel GC work balance: 77.41% (serial 0%, perfect 100%)

  TASKS: 103 (1 bound, 37 peak workers (102 total), using -N12)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.001s elapsed)
  MUT     time    2.801s  (  0.606s elapsed)
  GC      time   10.554s  (  1.138s elapsed)
  EXIT    time    2.744s  (  1.545s elapsed)

Please let me know if I made a mistake somewhere.

snoyberg commented 3 years ago

I'm less concerned about the performance comments. Both say and unliftio are optimized for working with non-String data, so some overhead for converting from a Show instance is IMO acceptable. If the same kind of performance degradation or memory usage problems popped up using a Builder directly in unliftio I'd be worried.
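
To be concrete about that last point, something like this untested sketch is what I mean by using a Builder directly — the payload here is a made-up constant Utf8Builder, so nothing goes through Show or String per message:

{-# LANGUAGE OverloadedStrings #-}
import RIO
import UnliftIO.Concurrent
import Control.Monad (replicateM_)

main :: IO ()
main = runSimpleApp $ do
  -- made-up constant payload, pre-built as a Utf8Builder so no per-message
  -- Show or String conversion happens
  let msg = "static builder payload" :: Utf8Builder
  replicateM_ 100000 (forkIO $ logInfo msg)
  threadDelay 1000000

Compiled the same way as the rio example above, that would show whether the overhead is mostly the String conversion.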

I'm more concerned about the jumbled text; that really shouldn't be happening unless there are massive lines of text present. Can you give me an example of how it's getting jumbled?

soupi commented 3 years ago

Sure. Most lines look fine. For example:

2021-06-09 06:07:46.812478073 UTC
2021-06-09 06:07:46.812478784 UTC
2021-06-09 06:07:46.812478363 UTC
2021-06-09 06:07:46.812479065 UTC
2021-06-09 06:07:46.812483272 UTC
2021-06-09 06:07:46.812479355 UTC
2021-06-09 06:07:46.812482832 UTC
2021-06-09 06:07:46.812473123 UTC
2021-06-09 06:07:46.81248201 UTC
2021-06-09 06:07:46.812480798 UTC
2021-06-09 06:07:46.812462353 UTC
...

But then, around line 2170 (in this particular run), one line looks off:

2021-06-09 06:07:46.850262935 UTC
2021-06-09 06:07:46.850264338 UTC
2021-06-09 06:07:46.850263536 UTC
2021-06-09 06:02021-06-09 06:07:46.812254854 UTC
2021-06-09 06:07:46.812253431 UTC
2021-06-09 06:07:46.812193348 UTC

and another around 2410:

2021-06-09 06:07:46.850457761 UTC
2021-06-09 06:07:46.850456348 UTC
2021-06-09 06:07:46.850461137 UTC
2021-06-09 06:07:46.852021-06-09 06:07:46.812349111 UTC
2021-06-09 06:07:46.850522061 UTC
2021-06-09 06:07:46.850523484 UTC

And around 2650:

2021-06-09 06:07:46.851392665 UTC
2021-06-09 06:07:46.851391452 UTC
2021-06-09 06:07:46.851398275 UTC
2021-06-09 06:07:46.851392021-06-09 06:07:46.812211052 UTC
2021-06-09 06:07:46.851460492 UTC
2021-06-09 06:07:46.851461834 UTC

And around 2890:

2021-06-09 06:07:46.851613329 UTC
2021-06-09 06:07:46.851616755 UTC
2021-06-09 06:07:46.851615793 UTC
2021-02021-06-09 06:07:46.812152372 UTC
2021-06-09 06:07:46.851644818 UTC
2021-06-09 06:07:46.851645559 UTC

and this goes on... around the end of the file (line 82400~) there are what I believe are the other halves of the lines that were previously cut off:

2021-06-09 06:07:46.885527509 UTC
9 06:07:46.885687599 UTC
2021-06-09 06:07:46.885687679 UTC
2021-06-09 06:07:46.885688261 UTC
2021-06-09 06:07:46.885688711 UTC
2021-06-09 06:07:46.885689142 UTC
2021-06-09 06:07:46.88568771 UTC
 06:07:46.885809288 UTC
2021-06-09 06:07:46.885809839 UTC
2021-06-09 06:07:46.885809698 UTC
2021-06-09 06:07:46.88581034 UTC
2021-06-09 06:07:46.885807114 UTC
2021-06-09 06:07:46.885811131 UTC
2021-06-09 06:07:46.88581051 UTC
2021-06-09 06:07:46.885811281 UTC
2021-06-09 06:07:46.885811372 UTC
2021-06-09 06:07:46.885812724 UTC
2021-06-09 06:07:46.885814227 UTC
2021-06-09 06:07:46.885815078 UTC
2021-06-09 06:07:46.885815068 UTC
2021-06-09 06:07:46.88581592 UTC
5957275 UTC
2021-06-09 06:07:46.885957506 UTC
2021-06-09 06:07:46.885958107 UTC
2021-06-09 06:07:46.885958127 UTC
2021-06-09 06:07:46.885958077 UTC
2021-06-09 06:07:46.885958838 UTC
2021-06-09 06:07:46.885958978 UTC
19375 UTC
2021-06-09 06:07:46.917419475 UTC

I hope this helps.

snoyberg commented 3 years ago

Thanks, it does. I won’t have a chance to look into this for at least the next few days, but I hope to create a reproducing test case next. It’s encouraging that you seem to be reporting good performance via STM. It leaves that as a potentially good solution if some of my underlying assumptions are wrong here.
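
To sketch what I mean by that direction (purely illustrative — nothing in say works this way today, and newQueuedSay is a made-up name): a single writer thread owns the Handle, and every other thread only enqueues.

{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (newTQueueIO, atomically, readTQueue, writeTQueue)
import Control.Monad (forever)
import Data.Text (Text)
import qualified Data.Text.IO as TIO
import System.IO (Handle, stdout)

-- hypothetical helper (not part of say): the returned action only enqueues,
-- and the single writer thread is the only thing that ever touches the
-- Handle, so complete lines cannot interleave
newQueuedSay :: Handle -> IO (Text -> IO ())
newQueuedSay h = do
  queue <- newTQueueIO
  _ <- forkIO $ forever $ atomically (readTQueue queue) >>= TIO.hPutStrLn h
  pure (atomically . writeTQueue queue)

main :: IO ()
main = do
  say' <- newQueuedSay stdout
  say' "hello from the queued writer"
  -- give the writer thread a moment to drain the queue before exiting
  threadDelay 100000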

soupi commented 3 years ago

btw this is not urgent for me at all, I reported it because I ran into it :)

snoyberg commented 3 years ago

Ah thanks for clarifying. I’ll probably hold off a bit longer then. Appreciate the report.