diku-dk / futhark

:boom::computer::boom: A data-parallel functional programming language
http://futhark-lang.org
ISC License

Fix lexer performance #152

Closed athas closed 8 years ago

athas commented 8 years ago

It turns out that the reason futhark-test has problems with large datasets (like the 50MiB kdd_cup.in file for kmeans) is that the lexer of all things consumes too much memory. There is one spot where we convert a Text to a String in order to call read that appears to be the major cost centre, but I'm not sure why. I've been looking for a space leak, but I can't find any...
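
For illustration only (this is not the actual lexer code), the kind of pattern being described, next to a dedicated Text reader that skips the intermediate String, might look like this:

import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- Converting a Text token to a String just so the standard 'read' can be used
-- materialises a lazy [Char] per token.
readViaString :: T.Text -> Double
readViaString tok = read (T.unpack tok)

-- A dedicated Text reader avoids the intermediate String entirely.
readDirect :: T.Text -> Either String (Double, T.Text)
readDirect = TR.double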

athas commented 8 years ago

It may be that the solution is to write a simpler value parser using Attoparsec.

mrakgr commented 8 years ago

Hello, again.

It has been 2 months since I last touched Futhark, and that has given me time to think about what I want to do with it. Last time I ran into that huge space leak in ghc-mod when loading Futhark files, so it occurs to me that rewriting the parser would be a good way to sidestep that issue. It would also familiarize me with Futhark's internals a bit, and compared to last time, I now understand Haskell well enough to give writing a parser in it a try.

Unless @nqpz has already done significant work on this, I'd volunteer to spend the next few weeks getting this done. Also I see you've done some work on shared memory tiling. It was faster than I expected. With that I'd have pretty good reason to put in some work getting Futhark to run with Fsharp and even look into making a Cuda backend, though I won't make any promises until the parser is done.

So @nqpz, how far did you get? Should I take a look at doing it instead?

athas commented 8 years ago

I'll let @nqpz answer for himself, but you should be aware that the kind of parser we need is not the one you find in a Haskell tutorial. We do not need a new parser for the full language, but just a parser that can read (a subset of) the Futhark value syntax (no tuples). This parser will need to handle quite large inputs (dozens of MiB) and produce fairly large arrays. I think the most reasonable way to do this is to write an imperative parser using the ST monad and an STArray (which is then turned into a Futhark array at the end). I doubt any parser combinator library could deal with values of the size we need. It will look more like the parser we use in generated C code, which is easily fast enough: https://github.com/HIPERFIT/futhark/blob/master/rts/c/reader.h

I'm not saying it's necessarily hard, I'm just saying that it's not just "Haskell", it's "high-performance Haskell", which is a different style than you might have seen before.
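
Roughly, such a parser might look like this toy sketch (a sketch only, not Futhark's actual reader, and it assumes the element count is known up front; the real parser would have to grow and copy the array as it goes):

import Data.Array.ST (STUArray, newArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray)
import Data.Char (isDigit, isSpace)

-- Walk the input, write each element into an STUArray, and freeze the result.
parseInts :: Int -> String -> UArray Int Int
parseInts n input = runSTUArray $ do
  arr <- newArray (0, n - 1) 0
  let go i s
        | i >= n    = return arr
        | otherwise =
            case span isDigit (dropWhile isSpace s) of
              ("",  _)    -> return arr
              (tok, rest) -> do writeArray arr i (read tok)
                                go (i + 1) rest
  go 0 input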

We did however implement loop tiling, yes. It's still a fairly rushed implementation (paper deadline today!), probably brittle, and doesn't yet give as much speedup as we would like. It is, however, a start, and we have a decent idea of where to go from here.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Ok, I can do the above along the way. I do not think writing a fast parser for large arrays would be a particularly hard task in Haskell. I did take care to figure out how to do imperative programming in it.

I've only peeked at the source code of Futhark for a bit today, but I am more worried about monad transformers and those functions that use higher-kinded types. I never got a chance to get that far while I was doing HackerRank exercises, so I can only guess what those segments of the code are doing. Also, it is my first time seeing FlexibleInstances and FlexibleContexts. It seems that I'll have to learn a lot to get through this challenge.

To start with, my plan is to see if I can separate the lexing and parsing stages in the code from everything else. That will let me verify whether the memory leak really is there and will make it easier to test the new parser side by side. My goal with this rewrite would be to make it so that Futhark's code can be written while using ghc-mod, which is really a programming necessity for me. I do love my IDEs.

Two months ago you mentioned that the REPL prints out the AST. Maybe I should start there?

Also are the lexing/parsing stages contained in the Languages folder? I am guessing they are. I'll let you know how it goes.

athas commented 8 years ago

Yes, lexing and parsing are contained in the Languages folder. It should not be hard to write a small test program that uses the parser, so you can isolate the inefficiency.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Well, maybe to you. :)

I've always found it much easier to write programs than to read them. Right now I am looking at Attributes.hs and it does seem hard, and I bet it would be like that even without the help of Atom's ghc-mod plugin (which I disabled to prevent it from crashing). Well, I expected as much.

When you build a program by hand there is a feeling that you have about the code that is simply missing when you read someone else's. And I am still not that familiar with Haskell's style...

I'll read all of the source files in the Languages folder from top to bottom, and if I still can't figure out where the parser begins and ends, I'll ask you, if that is alright?

athas commented 8 years ago

Sure.

athas commented 8 years ago

Just to make everything clear. There are two parser issues:

There is one subtle detail when it comes to resolving the latter issue: the parser should produce core language values, not source language values. That is, these: https://github.com/HIPERFIT/futhark/blob/master/src/Futhark/Representation/AST/Syntax/Core.hs#L183 I don't think this is much of an issue, though. A good strategy is probably for the parser to work directly on an STArray, then convert it to a Futhark ArrayVal at the end.

mrakgr commented 8 years ago

> A better approach is a hand-written imperative parser that parses only the subset of the value syntax (no tuples or arrays of tuples) needed by futhark-test and futhark-bench.

To be honest, just out of hand I am not really convinced that parser combinator libraries are a bad choice. Maybe if they use lists, but surely a custom parser could be written for a library that uses resizable arrays? This is what I would do if I were doing this in F#. There is the FParsec library for F#, based on Haskell's Parsec; it lets users write custom parsers, and its documentation actually has an example of how to make a parser which returns ResizeArrays.

I am betting there is something similar for Haskell.

athas commented 8 years ago

You may be right. I am not aware of any such library, but I think it can be written. I'm not sure it's worth going that way unless such a library already exists, however. The parser we need is very simple and not likely to change much over time.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

To clarify my position a bit more: even though you are a compiler author and (I think) have a PhD in this field, my impression is that you aren't too familiar with parser libraries, otherwise you would not have chosen Happy, which in your own words produces bad code for the Futhark project. So I'll defer my decision until I can investigate Attoparsec and perhaps some other parser libraries and see what they offer.

As an aside, over a month ago while learning Haskell by doing HackerRank exercises, I did stumble on a problem with MB-long inputs that took way too long to process. As it turned out, Haskell's native String = [Char] type was the culprit, more precisely its use for the input.

...Though Behzad Nouri's example uses lists to store the results of parsing, it could be modified to use Vector's unfoldr to make the resulting structure faster and more memory-efficient. I'll have to look around before deciding, but I am sure I'll find something.
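
A minimal sketch of that idea (the file name and the whitespace-separated integer format are my own assumptions for the example):

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Read as TR
import qualified Data.Vector.Unboxed as VU
import Data.Char (isSpace)

-- Build the result with Vector's unfoldr rather than a list:
-- each step reads one integer and returns the remaining input as the new seed.
parseInts :: T.Text -> VU.Vector Int
parseInts = VU.unfoldr step
  where
    step t = case TR.decimal (T.dropWhile isSpace t) of
               Right (n, rest) -> Just (n, rest)
               Left _          -> Nothing

main :: IO ()
main = do
  input <- TIO.readFile "numbers.txt"
  print (VU.sum (parseInts input))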

athas commented 8 years ago

Oh sure, Happy is pretty bad. I mostly use it because, as a parser generator, it will tell me about ambiguities in the grammar, which is pretty handy while it is being developed. The code it produces is fine and efficient for compilation, but seems to cause some pretty bad behaviour in GHCi. You're right that parsing is not my specialty, though - I usually just use Parsec.

And yeah, lists (and String by extension) are performance poison for large data sets. That's one reason why I'm skeptical of parser combinators: the ones I've used have been somewhat list-oriented (even if they supported a better string type than String).

I'd be interested if it turns out that it is possible to write performant and composable parsers for large inputs.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Ok, I have no idea. As you've said, when just the parser is loaded in GHCi it indeed takes up a lot of memory - 2.6Gb on my machine - but that is not related to the ghc-mod space leak. When type checking the example program, it stands steadily at 300-400Mb, which is standard for it.

Now that I see this, I've checked, and it seems the Atom editor works just fine on files in the Languages folder. The only exception seems to be the Parser.hs file, where no type information shows up at all for some reason - but that bug too is not related to the space leak. The type information simply does not show up; I can see no signs of the memory being drained in the Task Manager.

I am thinking of what I should do next... Now that I see there are differences depending on which file is being looked at, what I will do is go file by file to see if I can narrow the leak down further somehow.

mrakgr commented 8 years ago

Ok, I got it to crash once 3 days ago (Edit: because I did not upgrade the plugins), but now I can only see its memory use going up and down. Inside futhark-pyopencl.hs it goes up to 1Gb before falling back down, and then ghc-mod just churns away forever. But for files that are not at the bottom of the hierarchy, it actually does work without crashing, with the time to respond being proportional to how deep they are.

It is a definite improvement compared to two months ago, though the responsiveness even for the files in the Languages folder is below acceptable standards. It seems the ghc-mod authors did some work to keep a lid on the memory usage.

At any rate, I think I understand why this is. I read a while ago on the /r/programming sub about Swift's type checker choking on some really simple cases due to exponential edge cases in type inference. Futhark code probably has such surprises for the type checker inside it as well. If I were reporting a bug today, I'd just complain that ghc-mod is taking too long.

This is the other side of a language having a very sophisticated type system.

You know, I think I'll just make that fast value syntax parser as you've suggested, so I do not waste everyone's time, and call that my contribution for the time being. I'll skip rewriting the whole thing now that I see it is not the source of the leak. And now that I am looking at Futhark's code more closely, I do think that making that Cuda and Fsharp backend is way beyond my ability. It would have been difficult even if I were an expert at Haskell. My pride will have to be content with this. My estimate of how long it would take me to figure out how all this works is at >3 months now.

For the parser, I'll see whether my idea of building on an existing parser combinator library has any merit. I'll do a benchmark to establish the lower bound on speed and then see how close I can make Attoparsec or something else approach that limit.

Once that is done, we'll talk about how to integrate that parser.

I'll keep you posted.

mrakgr commented 8 years ago

I am throwing in the towel on this. I tried various things, the most successful of which was to make a new many function that returns a persistent_vector. I'll go from top to bottom over the various things that I did, starting with that one.

1) After going through Attoparsec's source, I asked on SO regarding this issue to solicit various suggestions. At first I was not impressed with their quality, as I was bent on figuring out how to do a dynamic array elegantly - I thought I could find a way since it is such a basic data structure, but it turns out there is a good reason I could not find an easy-to-download library anywhere.

The way to make persistent_vectors work with Attoparsec is simple. Here is the relevant many function from Attoparsec's Data/Attoparsec/Internal/Types.hs (which manyTill, sepBy and other combinators make use of internally):

    many v = many_v
        where many_v = some_v <|> pure []
              some_v = (:) <$> v <*> many_v

Furthermore, Attoparsec has a variant of many that evaluates its first argument strictly, which I have found to work better than the standard function:

liftM2' :: (Monad m) => (a -> b -> c) -> m a -> m b -> m c
liftM2' f a b = do
  !x <- a
  y <- b
  return (f x y)

Here is what the many parser for VP looks like:

manyVP :: Parser a -> Parser (VP.Vector a)
manyVP v = many_v
  where
    many_v = some_v <|> pure VP.empty
    some_v = liftM2' (flip VP.snoc) v many_v

parseManyNumbers :: Parser (VP.Vector Int)
parseManyNumbers = manyVP (decimal <* skipSpace)

I've generated 10M random integers from 0 to 999 and found that I could parse them in 9.8s, vs the 1.2s it took F# using a vanilla ResizeArray. The program choked badly on 100M integers, compared to the F# one which scaled linearly.

For context on persistent_vectors, the documentation states that they are similar to Scala's and Clojure's vectors, which are 32-ary tries if Martin Odersky's slides from the Scala Coursera course can be believed.

2) I was not really happy with the 10x gap in performance on the 10M-integer sum problem, so I did more research and tried making a many that returns a Vector. In fact, Attoparsec uses mutable arrays internally that it unsafely freezes and thaws. I tried the same approach, but it did not work out. Also, it uses continuations for control flow, which makes understanding how it works significantly more complicated.

Perhaps somebody more knowledgeable could fix the program linked above, but the infinite loop I am getting reminds me a little of when I tried using unboxed Vectors instead of lazy ones to do memoization and got infinite loops. That was easy to fix then, but I am not sure what to do here. In structure, the parser is similar to the first one.

3) For the next attempt I had the good idea of trying to fit a parser into Vector's unfoldr function, so as to avoid having to resize it manually and write a bunch of incredibly ugly C-like parsers in Haskell. The problem with that idea can be seen in the type signature for unfoldr:

unfoldr :: (b -> Maybe (a, b)) -> b -> Vector a

There is no way to get the state b out of the unfold. Ugh! The solution to that is to use unsafePerformIO. Here is what it looks like:

takeWhile :: (Char -> Bool) -> State -> (Text,State)
takeWhile f state =
    let mut = unsafePerformIO (newIORef state)
        {-# NOINLINE mut #-}
        unfolder state =
          let (c,state') = readChar state in
            case f c of
              True ->
                unsafePerformIO $ do
                  writeIORef mut state'
                  return $ Just(c,state')
              False -> Nothing
    in
      (T.unfoldr unfolder state, unsafePerformIO (readIORef mut))

I do not understand why, but the program on the repo is slow. It literally takes 1.2s to parse just 10 numbers.

4) I tried other things such as removing unsafe operations from the above by returning the state inside the Vector:

takeWhile :: (Char -> Bool) -> State -> (Text,State)
takeWhile f state@(pos,text) =
    let unfolder state =
          let (c,state'@(pos',_)) = readChar state
          in
            case f c of
              True -> Just((c,pos'),state')
              False -> Nothing
        result = L.unfoldr unfolder state
        last_list =
          case map snd result of
            [] -> pos
            x -> last x
    in
      (T.pack $ map fst result, (last_list, text))

It made no difference to performance.

5) I decided to use my last resort and go with your original idea, which is to use mutable arrays inside the ST monad. As I have learned recently, when Haskell is good at something it is really good, but when it is bad at something, it is really, really bad. Here is what the takeWhile function looks like inside the ST monad:

takeWhile :: (Char -> Bool) -> State -> (Text,State)
takeWhile f state@(pos,text) = runST $ do
  vec <- VM.new 1
  vec_ref <- newSTRef vec
  (capacity_ref :: STRef s Int) <- newSTRef 1
  (size_ref :: STRef s Int) <- newSTRef 0
  pos_ref <- newSTRef pos
  char_state_ref <- newSTRef (readChar state)

  let cond = do
        char_state <- readSTRef char_state_ref
        return $ f (fst char_state)

  whileM_ cond $ do
    vec <- readSTRef vec_ref
    char_state <- readSTRef char_state_ref
    --traceShow (fst char_state) $
    modifySTRef' pos_ref (+1)
    size <- readSTRef size_ref
    capacity <- readSTRef capacity_ref
    if size < capacity then do
      VM.write vec size (fst char_state)
      modifySTRef' size_ref (+1)
    else do
      vec <- VM.grow vec capacity
      writeSTRef vec_ref vec
      modifySTRef' capacity_ref (*2)
      VM.write vec size (fst char_state)
      modifySTRef' size_ref (+1)
    writeSTRef char_state_ref (readChar $ snd char_state)
  size <- readSTRef size_ref
  pos <- readSTRef pos_ref
  vec <- readSTRef vec_ref
  vec <- V.unsafeFreeze vec
  let txt = T.pack $ V.toList $ V.generate size (\i -> vec V.! i)
  return (txt, (pos,text))

Looking at the above, the persistent_vector stops looking like a bad idea. And for some reason, like with the previous 2 attempts, the performance is horrible. It takes 1.2s to parse 10 numbers, 2.4s for 20 numbers and 3.6s for 30 numbers when loaded from the 10M integers file.

Profiling shows that it spends 97% of the time inside the readChar function.

type State = (Int, Text)

readChar :: State -> (Char, State)
readChar (pos, text) =
  if pos >= T.length text then
    error $ "Error at pos " ++ show pos
  else
    (text `T.index` pos,(pos+1,text))

Looking at the above, nothing really jumps out at me as being wrong.

At this point, if I were doing the parser for myself, I'd just switch to using the immutable persistent_vector, use a different language for the compiler project, or write it in C++ with the Boost::Spirit parser combinator library and use it via FFI.

I cannot believe that I am writing this, but it seems I really can't write a fast parser in Haskell. If this were in F#, I'd be done with it in an hour, but I've already worked on it for well over 20 hours in Haskell. I think it is best to leave this to the experts after all.

Originally, I intended to dedicate some of my time to Futhark, but apart from the stuff I did to get it to work on Windows, it does not seem I can contribute much apart from testing it when the Cuda and F# backends for it come out.

A few years down the road, when I am done with my studies of reinforcement learning, I'll dedicate a large chunk of time to making my own language, and I'll use Futhark as the base. I'll rebuild it in F# (or maybe F*, which has been getting general programming features lately) and add automatic differentiation capabilities to it. For an autodiff language I think Futhark would be an ideal target, since apart from bit-twiddling operations, everything it has is differentiable.

That having been said, if somebody wants to take a look at fixing some of my broken examples, I would not mind answering questions and putting in some extra effort to make this work.

Phew. I think I'll take a break.

athas commented 8 years ago

First, thank you for your efforts. It is motivating to see that someone can potentially find my work interesting! I think I will see whether I can tune the persistent_vector solution to perform acceptably - my guess is that there's some hidden laziness somewhere.

While I agree that Haskell is not the nicest imperative language, it's possible to do things a little nicer. Here is a reimplementation of your program using STUArrays: https://gist.github.com/Athas/d4e77e187328853fb56529d55dec8bd8

It still uses a bizarrely large amount of memory for large inputs, but I have a hard time seeing where.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Wow, it looks quite nice. Actually before today, I was not aware it was possible to get any sort of composability when something is inside a monad. I only figured it out by accident today when I did the cond function for the while loops inside my imperative solution.

Despite the ugliness of them, the main problem with all of my examples is how slow they are. I can't even imagine what must be going on under the hood for it to need 1.2s for 10 numbers.

athas commented 8 years ago

After asking around on #haskell, I got the hint that the vector package is where the action's at. I wrote this, which is more than an order of magnitude faster than my previous attempt:

https://gist.github.com/Athas/bb38562801e8417ff0a1c5b4ad614a01

In fact, it's fast enough to be practical, and the code doesn't look all that bad, really. Kind of low-level, but that's a fair trade-off for predictable performance. (Remember to use -O3 when compiling.)

\ Troels /\ Henriksen

mrakgr commented 8 years ago

I'll definitely remember this style of coding in the future. I tried compiling with -O3 and the program ran in 3.4s compared to 6.4s before. It is still 6 orders of magnitude too slow, but I'll have to remember to compile with optimizations in the future as well, since I was not doing that.

athas commented 8 years ago

My program (the last one that uses vectors) can process 10 million lines in 0.91s, and 100 million lines in 7.94s. That's fast enough for me.

henglein commented 8 years ago

There is no such thing as "fast enough". F

mrakgr commented 8 years ago

Any speculation on why my examples 3, 4 and 5 were so slow? It is great that I now know how to use control flow in monads using functions - it seems stupid that I could not figure this out earlier - but despite the elegance of your code, functionally the only significant difference that I can see is that it uses an STVector instead of an MVector. Is that it? It seems unlikely...

mrakgr commented 8 years ago

No, actually, now that I look more closely, the way you use Text is by dropping off processed items, while I am indexing into it. That has to be it, but why exactly...

index :: Text -> Int -> Char
O(n) Text index (subscript) operator, starting from 0.

Oh, it is O(n). Well that explains it. This is unbelievable.
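
In other words, the state can carry the remaining Text and be consumed with uncons instead of indexing by a position; a minimal sketch of such a readChar:

import qualified Data.Text as T

-- Consume the Text as we go: uncons peels off the first character in O(1),
-- instead of re-indexing into the original Text on every character.
readChar :: T.Text -> Maybe (Char, T.Text)
readChar = T.uncons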

athas commented 8 years ago

I think Text uses a variable-width encoding (maybe UTF-16) internally, so it has to be O(n).

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Text's head and last are O(1), while appending to the start and the end is O(n). Strangely enough, length is also O(n). But if you think about it, in the above examples I was indexing into the first 30 elements out of 10M. So why did each of those indexings have to be an O(10000000) operation?

It doesn't matter. Actually, thanks to this I do feel I will never forget this lesson on monads that you gave me. I spent most of the day again learning about this and even considered giving writing a parser another try, before realizing that I can't think of a particularly elegant way to combine the ST and State monads. I'll be interested in taking a look at how you (or @nqpz) will organize the code for the full parser.

At any rate, I think I finally understand why monads are so difficult - it is because they are continuations. This is something I studied for quite a bit, but I've never been able to generalize that knowledge into understanding, regardless of how much I thought about it.

athas commented 8 years ago

This is what I have right now: https://gist.github.com/Athas/bd0882077455247d3bb0fe5979b0555f

I'm not trying to be clever. Maybe someone can figure out a combinator library that can be used here, but I'm not really interested in writing parsers in the most elegant style - I just need something soon so I can improve our test coverage. The missing piece is properly handling the primitive values. I'll see if I can re-use the Futhark lexer itself. I don't fancy implementing matching for things like decimal literals by hand.

mrakgr commented 8 years ago

I figured it out. It all came to me last night.

Unfoldr can be generalized from (b -> Maybe (a, b)) -> b -> [a] to (b -> Either b (a, b)) -> b -> ([a], b). And if you take note, you will notice that b -> (a, b) is essentially the state part of the state monad. Once you have the above function you can make any of the parser combinators like many, sepBy and manyTill with it and write everything in a monadic style. Maybe such a function could be converted to continuation passing style so it works with Attoparsec, but I am not sure how yet.

I'll write up a demonstration.
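
For reference, a minimal list-based rendering of that signature (the names here are illustrative, separate from the Vector version in the next comment):

-- The Left case ends the loop and returns the final state next to the results.
unfoldrWithState :: (b -> Either b (a, b)) -> b -> ([a], b)
unfoldrWithState step = go
  where
    go s = case step s of
             Right (x, s') -> let (xs, final) = go s' in (x : xs, final)
             Left final    -> ([], final)

-- Example: peel leading digits off a string, keeping the remainder as the state.
takeDigits :: String -> (String, String)
takeDigits = unfoldrWithState step
  where
    step (c:rest) | c >= '0' && c <= '9' = Right (c, rest)
    step s                               = Left s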

mrakgr commented 8 years ago

I am nearly done with the function, but I do not know which language extension I need to turn on to make the following work. Any ideas?

unfoldrVec' :: (b -> Either b (a, b)) -> b -> (Vec.Vector a,b)
unfoldrVec' cond init_state = runST $ do
    empty <- MVec.new 1024
    (arr, state) <- runUnfoldr (0, empty) init_state
    arr <- freeze arr
    return (arr, state)
    where
      growIfFilled :: Int -> MVec.STVector s a -> ST s (MVec.STVector s a)
      growIfFilled i arr =
        if i >= capacity
        then MVec.grow arr capacity
        else return arr
        where capacity = MVec.length arr
      runUnfoldr :: (Int, MVec.STVector s a) -> b -> ST s (MVec.STVector s a, b)
      runUnfoldr (i, arr) state =
        case cond state of
          Right (x, state) -> do
            arr <- growIfFilled i arr
            MVec.write arr i x
            runUnfoldr (i+1, arr) state
          Left state ->
            return (MVec.slice 0 i arr, state)

The above gives me the following error:

Couldn't match expected type `b' with actual type `b1'
  `b1' is a rigid type variable bound by
       the type signature for
         runUnfoldr :: (Int, MVec.STVector s a1)
                       -> b1 -> ST s (MVec.STVector s a1, b1)
       at unfoldr_v6.hs:65:21
  `b' is a rigid type variable bound by
      the type signature for
        unfoldrVec' :: (b -> Either b (a, b)) -> b -> (Vec.Vector a, b)
      at unfoldr_v6.hs:52:16
Relevant bindings include
  state :: b1 (bound at unfoldr_v6.hs:66:27)
  runUnfoldr :: (Int, MVec.STVector s a1)
                -> b1 -> ST s (MVec.STVector s a1, b1)
    (bound at unfoldr_v6.hs:66:7)
  init_state :: b (bound at unfoldr_v6.hs:53:18)
  cond :: b -> Either b (a, b) (bound at unfoldr_v6.hs:53:13)
  unfoldrVec' :: (b -> Either b (a, b)) -> b -> (Vec.Vector a, b)
    (bound at unfoldr_v6.hs:53:1)
In the first argument of `cond', namely `state'
In the expression: cond state
E:\kdd_cup (erase this when done)\unfoldr_v6.hs: 70, 30
    Couldn't match expected type `a1' with actual type `a'
      `a' is a rigid type variable bound by
          the type signature for
            unfoldrVec' :: (b -> Either b (a, b)) -> b -> (Vec.Vector a, b)
          at unfoldr_v6.hs:52:16
      `a1' is a rigid type variable bound by
           the type signature for
             runUnfoldr :: (Int, MVec.STVector s a1)
                           -> b1 -> ST s (MVec.STVector s a1, b1)
           at unfoldr_v6.hs:65:21
    Relevant bindings include
      arr :: MVec.STVector s a1 (bound at unfoldr_v6.hs:69:13)
      x :: a (bound at unfoldr_v6.hs:68:18)
      arr :: MVec.STVector s a1 (bound at unfoldr_v6.hs:66:22)
      runUnfoldr :: (Int, MVec.STVector s a1)
                    -> b1 -> ST s (MVec.STVector s a1, b1)
        (bound at unfoldr_v6.hs:66:7)
      cond :: b -> Either b (a, b) (bound at unfoldr_v6.hs:53:13)
      unfoldrVec' :: (b -> Either b (a, b)) -> b -> (Vec.Vector a, b)
        (bound at unfoldr_v6.hs:53:1)
    In the third argument of `MVec.write', namely `x'
    In a stmt of a 'do' block: MVec.write arr i x

mrakgr commented 8 years ago

Nevermind, I just had to do it like this: unfoldrVec' :: forall a b. (b -> Either b (a, b)) -> b -> (Vec.Vector a,b)

athas commented 8 years ago

You need ScopedTypeVariables (otherwise the inner function type signatures each have a fresh 's'). I have no idea why this is not the default.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Done with the prototype for the value parser. It looks quite nice now. It finishes summing 10M integers in 2s on my machine. I consider this performance quite good.

Now that the main difficulty of figuring out the right design pattern is out of the way, the rest won't be hard. I threw in the towel on this two days ago, but let me take it back. I'll get this done.

athas commented 8 years ago

Sure. I've managed to hack up this thing: https://gist.github.com/Athas/5c2dd0fdb669eecce892adaa99958bd9

It is a module that defines an (opaque) value representation type, along with some comparison functions and (of course) the parser. It performs acceptably, but I'm not particularly pleased with how the parser is written - particularly the fact that it calls out to the default Alex-generated lexer for parsing constants.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

I must admit you really work quickly. I'll see whether I can touch up what you have written.

By the way, does Futhark have special syntax to differentiate 8 vs 16 vs 32 vs 64-bit values like 1.2 and 1.2f for floats for example? I see that you have separate parsers, but I can't quite see how you are telling them apart.

Edit: Ah, nevermind, I see that you are calling the lexer. I'll just look there.

athas commented 8 years ago

Yes, we use a design taken from Rust: every constant can be suffixed with its desired type. 42i8 is an 8-bit integer, and 4.2f32 is a 32-bit float. If no suffix is provided, a plain integer is interpreted as 32-bit, and a plain decimal constant as a double-precision number (like in C).

\ Troels /\ Henriksen

athas commented 8 years ago

I pushed something that works for our immediate needs, but I'm not really happy with the way the parser itself is written. Room for improvement here. More generally, now I'm curious how one could design a parser combinator library tailored for producing very large values.

mrakgr commented 8 years ago

Three days ago, while I was working on the first version of the library, it hit me how to use continuation passing style (as well as monads and higher-order functions), so I dropped what I was doing and went back to my old plan of trying to make a Vector parser for Attoparsec. This time, I was successful.

That having been said, the performance is worse than the Either (a,b) unfoldr parser's, and it chokes on 100M integers, though so does the Either (a,b) unfoldr parser. It might be satisfactory, however, if the inputs are not extremely large.

I am not really happy with this, to be honest. I cannot seem to find a sweet spot with Haskell in terms of control, and all my successes in this language, such as they are, feel partial because of that. This is the price of laziness, it seems. Even after I figured out CPS, that simple function took an absurd amount of time to write due to misleading error messages, and I had to resort to unsafe IO as I could not figure out how to do it inside the ST monad.

As disappointing as that is, unlike the time I did those HackerRank exercises, I feel that I now have a much fuller understanding of Haskell, and I will actually take advantage of the things I learned when I start programming in F# again. The state monad in particular is the answer to how to get rid of the global state in my deep learning library, and I never realized before that closures can be stacked naturally - I just used a stack of closures for the backwards pass before. Also, I never realized how beneficial it could be for functions to have uniform inputs and return values until I tried making that first parser and ran into the problem of getting state out of unfold.

I tried adding bang patterns to the CPS parser, but that did not help much, so if you have any ideas on how to make it run better, be my guest. I'll take a break from this toil for a bit and then write a value parser on top of the CPS Vector Attoparsec parser, since it seems like a more maintainable solution. I am not happy with it, but hopefully a straight-line parse of 100M integers is just an extreme edge case for it. Maybe it does better if the inputs have a tree-like structure?

mrakgr commented 8 years ago

No, actually forget that. I cannot bear doing such a shoddy job of this.

I have to ask at this point: just how open are you to making that parser in a different language such as F#? I mean, you are already using C. If I did it in F#, it would take me half the time to do the entire thing that it took me to make the CPS parser in Haskell, and there would be none of the inexplicable space leaks that I've so far run into everywhere, including Haskell's compiler services. The performance would also be on par with C, and the thing would be easy to read and maintain. So how about it?

athas commented 8 years ago

First of all, it's good that these struggles are giving you new insights! That's the reason any of us work on this stuff, really.

Working with these huge homogenous structures (arrays) is definitely an edge case for both parsing and functional programming in general. Haskell is very well suited for complicated but relatively small structures (like, say, syntax trees or streams), but falls down when you just want simple data flow at very high throughput. It's the same when you try to do high performance array computing. Futhark is partially an attempt to show that these limitations are not inherent to functional programming as such, but merely to extant languages (or their implementations).

I'm not really open to adding more languages to Futhark, except as backends or ancillary tooling (we use some Python in the latter capacity, everything else is Haskell). Controlling the complexity on a program like this is already a tremendous struggle, and adding more languages would open a rat's nest of new issues.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

Well, I guess that I'll stop my Haskell adventure here then. If I could have figured out a way to parse 100M numbers without blowing the heap I would have done this, believe me. Just now I did some more research on Haskell performance, but it boils down to things I already know and do in F# regularly - do not use lists, do not clutter the heap with closures, use value types, avoid laziness.

That last one is really the killer. One thing I've realized in the past two weeks is that types are really poor at describing the control flow of a function, and programming is mostly about that. Adding another layer of abstraction makes it even harder. It is a huge mistake that laziness and strictness are not a part of Haskell's type system.

Long ago I read online that reasoning about Haskell performance is really difficult, but it is not until you spend a week and a half trying and failing to parse 100M integers that you realize what that actually means.

The error messages are really poor as well - they are close to useless. In retrospect, I really should have realized that and stopped programming the same way as in F#, where I would write a large chunk of code and rely on the type checker to guide me back to the correct path. There is a very real benefit to F# being a less abstract language than Haskell.

It pains me to say this, but thanks to this experience I will never use Haskell for any serious real-world work. My image of it as an academic experiment is now pretty solidified.

I read online that contributors to open source projects create more work than they save and that is obviously correct here. The net effect as far as I can tell is that I just ended up writing a bunch of rants while making no real contribution, so I'll cut that short here. I'll get back to studying Fstar.

Thank you for your patience.

mrakgr commented 8 years ago

After giving up on doing this two times, going crazy more than that, spitting on the language and telling myself I would never use it again... after seeing a guy on Reddit manage to get it to run just by reading from standard input, I decided to try ByteString to test whether Text was the reason it is so inefficient, and it turns out that it is. All this pain and suffering could have been avoided by ditching Text at the start. It should have occurred to me to do this, given that I had already gotten a hint when I found out it had O(n) indexing.

There is something in Haskell that puts me on tilt much more easily than other languages do. It seems I am going to have to be very careful about the data structures used in the underlying libraries if I ever use it again... which I won't.

I am tsundere for it at this point.

I've also tested switching the Attoparsec parser to ByteString.Char8, but that did not help it. Here laziness is in fact the culprit.

But performance-wise, now that I know where the hole is, that leaves the door open to integrating it with some library that does not operate using continuation passing.

I'll do this at some later date just to bring this thing to a close... assuming you don't tell me something like you are using Text everywhere and ByteStrings are out of the question. In that case it really would be hopeless.

By the way, with Unboxed Vectors the thing goes through 100M integers quite quickly now, on par with the F# imperative solution. Here it is.

athas commented 8 years ago

Actually I have considered switching from Text to Bytestring for the value parser, to avoid paying the UTF-8 decoding premium. The value syntax itself does not contain any non-ASCII characters, so it wouldn't be much of a problem. The only issue would be comments containing crazy multibyte characters that happen to contain ASCII linebreaks, but I vaguely recall something about UTF-8 being designed so that this doesn't happen.

Anyway, the value parser is fast enough for my purposes for the moment (that is, I can run our benchmark suite). I would mostly be interested in making the code less ugly.

\ Troels /\ Henriksen

mrakgr commented 8 years ago

By my measure, the premium is quite a bit more than that. For illustration, when I boot into Windows, I have only 1.5Gb occupied at the start. If I run the Text-based parser on the 100M example, it goes all the way up to 8Gb (my total memory) and then aborts. Starting from 6.5Gb free, if we subtract the 800Mb needed for the vector array, that leaves 5.7Gb for the rest of the program, meaning the Text data structure uses it up in its entirety loading the 430Mb file - which would mean Text uses more than 13 bytes per character. How absurd.

It should have occurred to me to suspect it when I realized it had O(n) indexing, but considering Text is supposed to be a high-performance replacement for standard Strings, I have to say this blindsided me, as it undercut one of my fundamental assumptions about code behavior. I should not have blindly trusted it just because it is a part of the standard package. I am guessing you did not tell me to stop using Text because you had no idea yourself.

Well, if the value parser works for you then we might as well keep it at that. I am just glad to bring this mystery to a close.

mrakgr commented 8 years ago

I did some more research. I figured out how to add strictness annotations using seq (the compiler refused bang patterns for some reason) and added them to the Text unfoldr example. Now it takes about 6Gb and 38 seconds (compared to ~1.2Gb and 3s for the Bytestring example), which is a success, even if a partial one.
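
For illustration, forcing with seq where a bang pattern is not available might look like this (the step function here is made up, not the one from the example):

import qualified Data.List as L

-- seq forces n' before the step returns, without needing the BangPatterns extension.
countdown :: Int -> [Int]
countdown = L.unfoldr step
  where
    step n =
      let n' = n - 1
      in n' `seq` if n <= 0 then Nothing else Just (n, n')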

Thinking about it in more depth, it occurred to me that there is no way Text could possibly take 13 bytes per character, so I've tried this. I've also tried reading just the last character and found that it uses the expected amount of memory to load the 430Mb file. Under the hood it probably uses some tree-like data structure, but the optimizations for it are not doing a good job of cleaning up after themselves.

That is about it. Using Bytestring for any text related work is still the way to go.

athas commented 8 years ago

I agree. At some point the parser should be changed to use ByteString. Text is only worthwhile if you really want to do text processing.

mrakgr commented 8 years ago

Hello. I promise this will be the last post in this long closed issue page.

At first I decided to give making the value parser another shot in Haskell, but I could not gather the motivation, and after considering doing it in Racket for a bit, I decided to just do it in F#. I understand this won't be helpful as a contribution to the Futhark project, but when I started doing it I got a few visions of the future.

I am not sure whether my Haskell experiences were responsible for triggering it, but for the past month or so I was obsessed with polymorphism. I would go to bed and I would dream of interfaces. The thing I gained from that is that I now see how I could replace the backend of my ML library with an AST generator for Futhark. It would not be that difficult, assuming I had such a generator.

And making such a generator would be much easier if I had something like a full F# Futhark parser, so it would make sense to start out that way. If I had a parser I would already have about half the job done and it would definitely make testing the generator easier. So with regard to Futhark, my plan is something like this:

1) Make a full parser.
2) Make a generator.
3) Make a backend for that generator.

I won't start it right away, more likely in a few months, but this value parser is a promising start. I thought it would take me only a few hours to make, but adding real-time dimensionality checking to it made it quite a bit harder. Still, it only took me today and yesterday, so it was not too bad. I have not tested it on 100M-element arrays yet, but it uses resizable arrays internally, so I do not expect any space leaks or anything like that, unlike in Haskell.

On that note, I'd like to apologize for my behavior on this issue page. I mean it. Before trying to make the parser in Haskell, I studied the language for over a month to make sure I would be ready just for this, but no matter how much I tried, I simply cannot get myself to like the language. I've tried quite a bit.

Quite frankly, programming in it puts me on tilt in no time at all, which is not something that happens to me in any other language, and I do not want to touch it again. This does not affect my willingness to study the various functional programming methods originating from Haskell, or change my view of Futhark, but I'll be happy if I never have to see another list for as long as I live. From now on, I'll be content to look at the language from afar.

With that, I am quite glad to see that you are already planning on making that F# backend. It seems that a few months ago you did not have shared memory optimization, and you are already working on this. That is quite good progress! With shared memory and an F# backend, I'll really have no excuse not to make a Futhark backend for Spiral, so keep it up.

I'll see you in a few months, hopefully.

athas commented 8 years ago

Look, I don't really mind your opinion on Haskell; there's no reason to apologise. I happen to find it convenient for my own use, but I don't feel any kind of responsibility for making sure it's liked by people on the Internet. And if I really liked Haskell that much, I probably wouldn't be trying to make my own language.

Why do you need a Futhark parser to generate Futhark ASTs? The Futhark syntax recently changed significantly to become more F#-like, but I don't think it will change again for the foreseeable future.

If you want to speed up development of an F# backend, you could post in that issue. I'm still in the early brainstorming phase, trying to figure out whether to do a C# backend instead (since interop is easy, right?), etc.

mrakgr commented 8 years ago

> Why do you need a Futhark parser to generate Futhark ASTs?

It is definitely more work, but the individual steps would be smaller. It would be easier for me to proceed forward when I have a decent grasp of the preceding steps. Having a parser would allow me to make an isomorphism between the generator and the parser, which would make testing easier. Or at least those thoughts were going through my head when I had the idea of doing them both. My sense is telling me that it might be worth approaching it from that angle.

It would be different if I was you and knew everything there is to know about Futhark. In that case I'd just skip to the AST generator stage. We'll see...

In terms of interop, it is true that C#, F# and all the .NET languages have very good interop, but it is not at the scale of Scala and Java. By that I mean that you can literally have Java and Scala files inside the same project and they will compile without a hitch.

With .NET, to use C# code in F# you first have to compile the project into an assembly (a .dll of .NET IL code) and then call the methods from that assembly. It works flawlessly, and the Cuda library wrappers I am using in my ML library are written in C#, but there is an extra build step. Unless you have something pushing you towards making a C# backend, I suggest you make an F# one. It would make it easy for me to integrate Futhark-produced files into my own library inside the same project.

F# also has something like 3 open-source OpenCL compilers written as PhD theses by various graduate students, so that might be worth something there.

I took a look at the F# backend issue. Now, this is probably unrelated to questions 1 and 2, but since you are talking about memory allocation, just a few hours ago I found this via the /r/programming sub.

I am definitely in this for the long haul. In fact, your progress on Futhark is actually much faster than I expected, so I am still scrambling to set up the environment so as to actually make use of Futhark when it all gets here.