Closed Gabriella439 closed 10 years ago
I think there're a lot of thing to be explored. So it's way too early to say that design is frozen. Here is most important in my opinion:
Foldable
wont work because some data types are monomorphic (text,bytestring) and some not polymorphic enough (unboxed vectors etc.)With regard to profunctors I'm not sure that instance is useful enough to warrant a dependency. And if we add monads to fold it becomes impossible to define comonad instances. We lose extend
but not duplicate
. So no comonad instance for FoldM
.
If we to add dependency on text
and bytestring
we should add vector
as well
The question of dependencies was somewhat impeding my own thinking what it might be useful to do. A lot seemed to turn on whether or how you were going to introduce the Fold/M
type into the pipes universe. It is clear you want to keep pipes
itself low on dependencies. The 'lenses' branch of pipes-parse
that was linked on the list imported folds
when I looked, but it has been taken out again, if I follow.
It is slightly annoying, but maybe one should concede that there will be a couple of packages; the trouble I had thinking about this in the past was how to divide them.
One of them might be something like foldl-instances
with fold functions for things like bytestring, text, vector, maybe Producer and FreeT (Producer ..), and various base types. instances
is the wrong word for those of course, but this might be a place for things like a profunctor instance, if that seems reasonable.
Another foldl-extras
-ish idea would be to include more interesting and perhaps dangerous Folds like the (pointlessly hyperspecialized) ones in the statistics package -- an optimized average, the diverse sorts of mean, histogram folds etc. It's plain Shimuuar has seen the connection (and has actual experience with the material.) Perhaps such a package would be the place for text/bytestring/vector fold
functions?
I do think that the pre-packaged folds are what makes this library distinctive and potentially widely used. The present library has a rather austere collection of folds but these are enough to impart the fundamental idea. One pedagogically important point is missing -- namely the obvious one that you apply the same Folds to diverse sequential types. It might make sense to add fold
functions for a few other base types, and maybe also the general Foldable
application, with remarks on its mixed merits? (I cant remember where it had trouble but it did)
I havent thought about it but there will obviously be fold functions for things like conduit
and so forth. But I guess micro-packages like foldl-conduit
or conduit-foldl
are what one does. So perhaps also foldl-pipes
? (Maybe one should brazenly add a pipes-dependency for foldl
since pipes
itself has so few and then add Producer and FreeT Producer folds?) Maybe I will make a couple such little packages to see what they look like. Note that there will need to be a specialized Char/Word8 fold functions for Text and ByteString producers.
Actually I consider foldl as possible foundation for calculation of mean/variance etc in statistics package. Current versions could only work with vectors which isn't satisfactory.
Yes, one of the things on my TODO list is to create numerically stable statistics like those found in the statistics
package
I've created experiment
branch in my fork to play with possible designs. It seems that rabbit hole is quite deep.
By the way, I'm not sure a comonad instance for FoldM
is out of the question, not that I'm sure what use it is. Is there an obvious objection to this type? :
data FoldM i m o = forall s . FoldM !s (s -> o) (s -> i -> m s)
It has an obvious extract
; I'm not sure the natural duplicate
or extend
follows all the laws. It seems to me that this type is somewhat recommended by a couple of simple analogies. If we start with the functor
data Request i o s = Request o (i -> s)
i.e., (o, i -> s)
, then we can redescribe Fold i o
as
data Fold i o = forall s . Fold s (s -> Request i o s)
which is the 'co-church' encoding corresponding to the 'normal' recursive type:
data Fold i o = Fold {runFold :: Request i o (Fold i o)}
How do we insert a monadic aspect into a standard recursive type like that? Most respectably, I guess:
data FoldM i m o = FoldM {runFoldM :: m (Request i o (FoldM i m o)} -- call this YYYYYYYYY
and less respectably:
data FoldM i m o = FoldM {runFoldM :: Request i o (m (FoldM i m o))} -- XXXXXXXXX
I am trying to arrange an obvious similarity between FoldM
, thus explained, and something like Server a b m Void
or Client a b m Void
(or maybe I mean forall r . Server a b m r
?)
Anyway, the non-recursive encodings corresponding to those two choices are
data FoldM i m o = forall s . FoldM s (s -> m (Request i o s))
and
data FoldM i m o = forall s . FoldM s (s -> Request i o (m s))
respectively. Working back now toward more familiar types, these in turn are equivalent to
data FoldM i m o = forall s . FoldM s (s -> m (o, i -> s)) --- WWWWWWW
and
data FoldM i m o = forall s . FoldM s (s -> (o, i -> m s))
respectively. The latter of these is then basically
data FoldM i m o = forall s . FoldM s (s -> o) (s -> i -> m s)
Reasons favoring something like WWWWW
over this last one will be like reasons for preferring the one I marked YYYYYY
over the one marked XXXXX
. It looks like there may be delicate points like those we find the current Proxy
definition, which avoids monadicizing everything as it used to and as it would if it were defined with FreeT
on some base functor.
Or anyway, I was wondering about this.
By the way, I'm not sure a comonad instance for FoldM is out of the question, not that I'm sure what use it is. Is there an obvious objection to this type? :
data FoldM i m o = forall s . FoldM !s (s -> o) (s -> i -> m s)
There is. It doesn't work nicely with stateful accumulators (e.g histograms). One need monadic action to convert mutable vector to immutable one.
On the other hand it's possible to have something comonad-like with extractM :: FoldM m a b → m b. I think it will obey comonadic laws (maybe those will need small adjustment)
@Shimuuar Perhaps a comonad in the category of Kleisli arrows over m
?
No idea. I'm not well versed in generalized abstract nonsence (-:
I see; I don't know how these histogram folds work. I take it the problem with a FoldM m a (m b)
is that the applicative combination with a FoldM m a c
is unpleasant? My main thought had to do with finding the 'right type' in this business, and making things easy for the compiler to optimize by de-monadicizing (?) as much as possible. Admittedly the s -> m o
function isn't particularly problematic. The analysis above occurred to me trying to figure out https://github.com/Gabriel439/Haskell-Foldl-Library/pull/3 ; one solution is a Skip-constructor-like solution. But such things are likely to complicate the types too much.
So, the comonad
instance is pretty low on my list of priorities. I can wait until somebody actually needs it.
@michaelt On second thought it would be more problem of style. Consider following code:
data FoldM i m o = forall s . FoldM !s (s -> o) (s -> i -> m s)
someFold = IO (FoldM Int IO (IO Int))
someFold = do
ref <- newIORef 0
return $ FoldM ref readIORef (\_ n -> modifyIORef ref (+n) >> return ref)
Here we're forced to wrap return type into monad. It's more of incovenience but I don't think comonad instance worth it.
I think this data type does form a comonad as you porposed. Problem arise with cokleisli arrows. They have type: FoldM a m b → b
but now we have only single definition which polymorphic in m
— extract
. Any attempt to fold something into FoldM
will yield m (FoldM a m b)
Yeah, I agree that the Comonad
instance is probably not worth the trouble for now.
I also wanted to mention that I just setup the pipes-foldl
library on Github. You can find it here:
https://github.com/Gabriel439/Haskell-Pipes-Foldl-Library
Here's some example code for how you would create a wc
-like program to count lines (using the lenses
branches of pipes-parse
and pipes-bytestring
):
import Control.Applicative (pure)
import Lens.Family ((^.))
import Control.Foldl (sum)
import Pipes.Foldl (fold, folds)
import Pipes.ByteString (stdin, lines)
import Prelude hiding (lines, sum)
main = do
numLines <- fold sum (folds (pure 1) (stdin ^. lines))
print numLines
Note that if you added a Num
instance for Fold
then you could replace the pure 1
with just 1
. Also, note that you can use Pipes.Prelude.sum
instead of fold sum
, but I just wanted to show off the use of the library.
@Shimuuar, my remark above wasn't really aimed at a comonad instance but at possible simplification, perhaps in aid of assimilating Fold
and FoldM
. It was ill-considered though, since I wasn't taking account of the use of FoldM
with mutable and ref types. Your argument that these are among the best uses of FoldM
is completely convincing.
@Gabriel439 I made a similar package and a few others, in order to think out how dependencies were supposed to work. Here are a couple of the others, for what they're worth https://github.com/michaelt/foldl-conduit https://github.com/michaelt/foldl-instances The second suggests a type class like the one Shimuuar envisaged somewhere in the discussion. The conduit one I had only started thinking about; it is just holding the place of something outside the system, for purposes of thinking about dependencies.
Well it's always possible to convert Fold
toFoldM
:
toFoldM :: Monad m ⇒ Fold a b → FoldM m a b
toFoldM (Fold f x0 out) = FoldM (\x a → return (f x a)) x0 (return . out)
It means that we can easily replace Fold
with FoldM
. The only question is whether we lose anything except possibly performance. I think no. Fold have no interesting structure. It's special case of FoldM
where monad doesn't do anything of interest. If we add monoidal structure to the accumulator (see #11 for details) and I'm not sure whether it's possible to generalize monoidal fold to use monads.
Recently I found rather nice way to encode any kind of data sample:
newtype Sample a = Sample (∀ r. Fold a r → r)
It encapsulates both data and folding function. It also form monoid so it's possible to concatentate any container whatever data representation it uses.
instance Monoid (Sample a) where
mempty = Sample extract
mappend (Sample f) (Sample g) = Sample (f =<= g) -- Cokleisli arrow
Now it's possible to define type class for arbitrary data sample:
class IsSample s where
type Elem s :: *
toSample :: s → Sample (Elem s)
So far I ignored monadic folds but it's quite simple to accomodate them too. Since we require code to work with any monad we can specialize to Identity and work with pure folds just as well.
newtype Sample a = Sample (∀ r m. Monad m => Fold m a r → r)
So I just added purely
and impurely
which mean we don't actually need pipes-foldl
any longer. The basic idea is that they take any function written to accept a step function, initial accumulator, and extraction function, and convert it to a function that accepts Fold
s or FoldM
s. This means that both pipes
and conduit
can add foldl
-compatible functions without incurring a foldl
dependency. As a bonus, it downgrades gracefully in the scenario where the user does not want to use foldl
since the un-upgraded functions are still usable on their own.
This also inadvertently solved another problem. The main reason I had two separate types was that I thought there was no convenient way to unpack a FoldM
as if it were a pure Fold
, but now if we merge the two types into a single type I can just rewrite the purely
function to automate all the Identity
wrapping and unwrapping so that it is transparent for the user.
I rememember @michaelt mentioned previously that he had profiled Fold
implemented in terms of FoldM
and said that it ran the same and did not leak space. Do you still have those tests? If it really is equally efficient then I'd be happy merging the two types at this point, and probably adding the following two type synonyms:
type Fold = FoldM Identity
type Fold' a b = forall m . (Monad m) => FoldM m a b
I'm writing a separate comment to address some points brought up by @Shimuuar in his first comment that I didn't have time to address until now.
Many fold accumulators admit efficient merge (element count, sum, max/min, mean etc.). For example if we have sums of two different samples we can get sum of their union by adding two sums. It's not clear for me how to integrate it to foldl library.
Can you elaborate on this? Wouldn't liftA2 (+)
be good enough for this purpose or did you have something else in mind?
Some algorithms require several passes over data. For example numerically stable algorithms for variance and higher central moments require estimate of mean as input.
Just have those folds take the mean estimate as the parameter and let the user decide how to derive that estimate (either by folding the data in a separate pass themselves or by obtaining the estimate through some other means).
Transformations of input stream. As soon as we start working with data which more complicated than bunch of numbers mapping and filtering becomes indispensable.
Alright. I still have an issue for this open because I will probably add this.
Common interface for data samples. There's wide range of different data types over which we want to fold. It would be quite nice to unify them under same API. And Foldable wont work because some data types are monomorphic (text,bytestring) and some not polymorphic enough (unboxed vectors etc.)
I've already addressed this by adding monomorphic folds for text
and bytestring
. I'd prefer to just add monomorphic type-specific folds rather than unify them under the same API. Even if I were to unify them under the same API the problem of unifying the Text
and ByteString
interfaces is outside the scope of the foldl
library.
With regard to profunctors I'm not sure that instance is useful enough to warrant a dependency. And if we add monads to fold it becomes impossible to define comonad instances. We lose extend but not duplicate. So no comonad instance for FoldM.
Right now the balance I'm trying to strike dependency-wise for foldl
is to only depend on packages in the Haskell platform, so that means that I will wait to add a profunctors
dependency until Edward gets profunctors
into the platform.
If we to add dependency on text and bytestring we should add vector as well
Well, what I think I will do is just upgrade fold
and foldM
to use a Foldable
constraint even if there is a slight performance penalty. It feels like the right thing to do in the long run and it's a backwards-compatible change. That would cover boxed vectors. Then I can add a foldVector
function for unboxed vectors. I don't mind the dependency on vector
at this point.
Regarding Sample
, I think it doesn't buy you anything that the Comonad
instance does not already give you, since the Comonad
operators are strictly more general. I would prefer to wait on comonad
to get into the Haskell platform rather than add a Sample
type.
Never mind, I will provide separate Foldable
functions because I just remembered how slow the Foldable
versions are.
On 12/29/2013 12:02 AM, Gabriel Gonzalez wrote:
Can you elaborate on this? Wouldn't |liftA2 (+)| be good enough for this purpose or did you have something else in mind?
Something else. If you counted number of elements in two lists: xs and ys then you can calculate nubmer of elements in xs++ys by summing two answers. Same goes for sum, maximum, etc. I hope it's clear enough
It's actually quite powerful thing. It allows to process different chunks of data independently whether to gain parallelism or because parts of data are of interest themselves. For example one performed N experiments and interested not only in aggregate statistics but whether experimets give compatible results. He could calculate answers for each experiment and then merge results without traversing data second time.
It however makes thing much more complicated.
Just have those folds take the mean estimate as the parameter and let the user decide how to derive that estimate (either by folding the data in a separate pass themselves or by obtaining the estimate through some other means).
If we're speaking about statistical library it's completely unacceptable from usability PoV. Such simple and basic operation must be simple one-liner.
On 12/29/2013 12:04 AM, Gabriel Gonzalez wrote:
Regarding |Sample|, I think it doesn't buy you anything that the |Comonad| instance does not already give you, since the |Comonad| operators are strictly more general. I would prefer to wait on |comonad| to get into the Haskell platform rather than add a |Sample| type.
I disagree it allows to write function which will work with different types of containers. It becomes posible to write function which calculate mean for list/vector/histogram
Monomorphic folds (for bytestring/text/etc.) allows to apply different folds to single type of data. This type class allows opposite thing take single fold and use with different types of data containers.
Something else. If you counted number of elements in two lists: xs and ys then you can calculate nubmer of elements in xs++ys by summing two answers. Same goes for sum, maximum, etc. I hope it's clear enough
I understand now. This is only possible if we change the internal implementation of Fold
back to the original formulation that I had in the first draft of this library:
data Fold a b = forall x . (Monoid x) => Fold (a -> x) (x -> b)
If the accumulator is a Monoid
then it's trivial to do what you are asking. However, there were three reasons that I went with the non-Monoid
version at the time:
Monoid
representation without leaking space.If we're speaking about statistical library it's completely unacceptable from usability PoV. Such simple and basic operation must be simple one-liner.
Can you give me an idea of what API you had in mind?
I disagree it allows to write function which will work with different types of containers. It becomes posible to write function which calculate mean for list/vector/histogram
What I was asking is what the purpose of the newtype was. If you supplied a Comonad
instance for Fold
then you could use the CoKleisli operators instead of the Monoid
operators.
Monomorphic folds (for bytestring/text/etc.) allows to apply different folds to single type of data. This type class allows opposite thing take single fold and use with different types of data containers.
I'm very reluctant to define new type classes unless there is a very compelling reason because type classes require a lot of buy-in from the rest of the Haskell ecosystem. The other problem is that you are asking for what is essentially the ListLike
type class and the solution to that problem is out of scope for the foldl
library.
On 12/29/2013 04:54 AM, Gabriel Gonzalez wrote:
I understand now. This is only possible if we change the internal implementation of |Fold| back to the original formulation that I had in the first draft of this library:
data Fold a b = forall x . (Monoid x) => Fold (a -> x) (x -> b) If the accumulator is a |Monoid| then it's trivial to do what you are asking. However, there were three reasons that I went with the non-|Monoid| version at the time:
- I couldn't figure out how to convert left folds (i.e. the current representation) to the |Monoid| representation without leaking space.
- The current representation has better constant factors
- You can't (easily) parametrize type class instances, and there were some folds that I wanted to parametrize by a weight argument. An example is an exponentially weighted moving average where you want to parametrize the moving average by the weight.
What about storing monoid dictionaries and current state explicitly:
data Fold a b = ∀x. Fold (a → x) (x→x→x,x) x (x → b)
Since we are storing current state there's no space leak and it's possible to parametrize explicit dictionaries.
If we're speaking about statistical library it's completely unacceptable from usability PoV. Such simple and basic operation must be simple one-liner.
Can you give me an idea of what API you had in mind?
Best idea I came up so far is something along the lines:
newtype MFold a b = MFold (Sample a → b)
It's possilbe to chain such folds. It's actually a reader monad.
chain :: MFold x a → (a → MFold x b) → MFold x b chain = (>>=)
Sample arose during experimentation with such N-pass folds. I tried to hide fold state and make fold non-resumable.
But most severe problem is not N-pass folds. It's entirely possible to use ad-hoc solutions. Problem is that single fold accumulator can be used to estimate several statistic at once. For example online variance algorithm could be used to estimate number of elements, mean, max. likelyhood/unbiased variance/std. deviation or any combination of them.
So one problem I just realized last night is that if you try to make folds parallelizable you can't make them effectful any longer since you have to preserve the ordering of effects.
I also found it difficult to formulate meaningful law for effectful parallelizable folds. There's another concern. I think it's not reasonable to require user to define merge function if all he want some aad hoc fold. I think only way is to create several fold types.
Well, I would like to keep foldl
as simple as possible, though, mainly because Edward already covered the opposite extreme in his folds
library. I wanted something that is very approachable for beginners who don't know how to write complex folds that are efficient and don't leak space.
Can't foldl
be built on top of folds
to provide a simpler API ? Is there any hope to see a single ´fold´ lib ?
To end users, the duplication of libs with common goals as exemplified by mtl
vs transformers
is quite confusing.
At least at first and second glance ;-)
That said, folds
does not look like an easy one to digest.
So the first problem is that folds
has a ton of dependencies and I would like foldl
to get into the Haskell platform eventually.
However, the nice thing about foldl
is that it has two functions which greatly encourage dependency-free interop with other libraries. Check out the latest addition of purely
and impurely
, which allows other libraries to reuse foldl
without actually depending on it. I use this to mix foldl
with pipes
libraries without incurring a foldl
dependency.
The other thing is that foldl
is designed to be as conceptually simple as possible because it is supposed to be a teaching tool. folds
is very beginner-unfriendly, and building foldl
on top of folds
basically makes its implementation inscrutable and useless as a teaching tool.
I do agree about the teaching aspect. Reading your source is quite approchable in general.
Are you planning to add a profunctor
or even a comonad
instance as suggested by Tom and Edward ?
So my plan is to fake profunctor
for now by adding mapped
and then add it once it gets into the platform. Same thing for comonad
, I will wait for it to get into the platform.
I will go ahead and close this issue. What I concluded from this discussion is that:
foldl
can safely depend on the Haskell Platformfoldl
will also provide premap
until profunctors
can get into the Haskell Platform,pipes
any longer now that we have the purely
and impurely
combinatorsAnything more complicated than that should probably go into a derived library. The exceptions are functions like filtered
and other input transformations. I'm still open to the idea of including them but I need to spend more time reviewing this.
This is partially an update and partially a solicitation for opinions on certain design choices for this library.
So I've designed
pipes
libraries to not directlyl depend onfoldl
so that I can be much freer about adding dependencies tofoldl
. The reason why is that these other issues have indicated thatfoldl
may require many other dependencies including, but not limited to:profunctors
comonad
text
andbytestring
(to provide convenience folds for these types)I wanted to see if people would be fine with the above dependencies or if they had other dependencies that they thought might be worth adding. I'd like to develop this into a more fully featured library.
Also, I've decided to freeze the core type (except perhaps to unify
FoldM
andFold
into a single type as @michaelt suggested), so there won't be any new changes on that front. I'm pretty happy with its current functionality.