Open subbyte opened 6 years ago
Hi @subbyte, I am happy that you found it useful and are using it successfully. I think I can add a state management section in the tutorial.
There is one more way to manage state and that is using an IORef. I think that may be easier for you. In case you are accumulating too much state this may be more efficient way because you can back the state by whatever kind of structure you want e.g. a hashmap or array or a compact region allocation that does not overload GC.
scanx
is pretty efficient too, what you see in the implementation is a continuation passing style (CPS) programming, there is no other stream inside. You will see similar code in all other streamly APIs. If you compare the performance of the scan operation with other streaming libraries (see https://github.com/composewell/streaming-benchmarks) streamly scan is the fastest excluding vector.
I am not sure if the order of StateT or SerialT actually matters for perf. You can measure it and see, you can just try using the GHC RTS options e.g. -s
or -S
, my guess it won't be significant. In general, I would advise that you worry about performance only if you see a noticeable slowdown or if the performance of your app is really critical, otherwise just write your program using whatever is the most convenient and readable way.
Thanks for the explanation! Before performance, I find I even need to understand the order of monad transformers better. Some articles say the order only affect the evaluation order, which I find not true. My early test shows that the transformer order affects the business logic (functionality of the program).
Today I wrote two test programs, one is SerialT (StateT s IO) ()
, the other is StateT s (SerialT IO) ()
. The first works well, while the second one does not work -- not keeping states through the processing the of the stream. When I evaluated StateT
in the second program, the state logic finished; the next item in the stream will start a new state procedure, which is not related to the previously evaluated state.
Am I right? I will try to understand the order better and do more tests.
You are right, the first one is the right order. SerialT
always has to be the top level Monad otherwise the monadic composition, for example, will not work as expected, assuming that the expected composition is the monadic composition of streamly. The order may not matter among the standard transformers (when using mtl) because each one implements the interface of the other. In this case SerialT
implements StateT
but vice-versa is not true.
For me it is still not so clear which alternative is the best to use in which scenario, but I have found that:
So, if you need to use some complex intermediate data structure it would be better to use State, right?
StateT introduces a global state threaded in the monad, whereas scan's state is confined within the combinator. You can use map
and filter
on the result of the scan to change the type or to filter out unwanted intermediate states/results.
You may also be interested in the liftInner
, evalStateT
etc. available in Streamly.Internal.Prelude
to lift the inner monad in parts of the stream to introduce a local state instead of using a global state by having a StateT throughout the stream.
As I understand StateT
threads pass the state recursively inside a certain scope, so the state is contained there (not 100% global), just like scan
. The difference would be how much bigger is the StateT
scope compared to the scan
one, right?
Then, using evalStateT
or runStateT
the StateT
scope could be reduced, but scan
scope is already the minimum. Is it my understanding correct?
Maybe some of this information could be added to the documentation to make things clearer.
We have not been able to focus on design level guides yet, we will surely add this along with documentation on state management in one of the guides.
Thanks for the beautiful library that is easy to learn! I am starting to learn composing streaming programs in Haskell, and you also have compared data-based and process-based streaming here, great!
When I need to accumulate information from the beginning of the stream, there are two ways of doing it as I know:
StateT
to form a monad stack (an example: AcidRain)scanx
directly provided bystreamly
(an example: my DNS monitor project)Is there any pros and cons between the two in terms of performance? I read the code of
scanx
and find another stream inside. I am wondering why not to implement it with aStateT
. I also get lost when trying to reason about performance of the monad stack. The first question I have is whether the orderStateT SerialT
orSerialT StateT
have any performance (CPU or memory) impact.