Open meooow25 opened 1 month ago
Thinking about it more, I realized that there is a more fundamental limitation than GHC optimations: the structure itself. A Set
cannot be folded over without requiring space proportional to the depth of the tree, which is $O(\log n)$. The above benchmarks only show that foldl'
does not allocate on the heap, but it still uses $O(\log n)$ stack space.
To reiterate, I want to draw attention to the facts that:
foldl
library cannot make accurate claims about the space required to apply a Fold
. It depends on the structure to be folded over, the implementation of foldr
/foldl'
for that structure, and the GHC optimizations at play. Even at their most efficient, the same fold will have different requirements: Fold.fold Fold.sum (xs :: [Int])
is $O(1)$ space and Fold.fold Fold.sum (xs :: Set Int)
is $O(\log n)$ space.Set
as demonstrated), the implementation of foldl'
is practically more efficient than obtaining a left fold via foldr
. At the same time, I cannot think of any situation where the opposite would be true. So I believe foldl'
would be a better delegate for Fold.fold
rather than foldr
.Yeah, it should probably clarify that it's only guaranteed to be constant space for lists.
Also, if I'm being really pedantic, it's only guaranteed to be constant space if all folds involved (including user-defined Fold
s) use an accumulator that is internally strict. This property holds for the existing Fold
s provided by the foldl
package but might not necessarily hold for user-defined Fold
s.
fold
andfoldM
are defined ashttps://github.com/Gabriella439/foldl/blob/473a5f002db29d78d3100930ce043158c5569a50/src/Control/Foldl.hs#L528-L544
But the description also claims that
Seems to be a little misleading to me. Whether the fold is in constant memory depends greatly on whether GHC is able to optimize
foldr
forf
into a constant-space fold, and it is not a guarantee arising out of this library. If GHC fails here, for whatever reason, it is likely to require linear memory.For
fold
, I think this could be avoided by usingfoldl'
. It is a safer bet that the implementor ofFoldable f
has written a constant-spacefoldl'
. Is there a reason whyfoldr
is used instead offoldl'
?For
foldM
, I don't know what can be done.And to demonstrate the problem, we can benchmark on
Data.Set
. Tree-like structures often definefoldr
in a way that that GHC cannot optimize.Benchmark source
```hs -- Using GHC 9.6.3, -O import qualified Control.Foldl as Fold import Data.Foldable (foldl') import Data.Set (Set) import qualified Data.Set as Set import Test.Tasty.Bench main :: IO () main = defaultMain [ env (pure xs_) $ \xs -> bgroup "sum 100000" [ bench "Control.Foldl.fold" $ nf (Fold.fold Fold.sum) xs , bench "fold but using foldl'" $ nf (fold' Fold.sum) xs ] ] where xs_ = Set.fromList [1..100000] :: Set Int fold' :: Foldable f => Fold.Fold a b -> f a -> b fold' (Fold.Fold step start done) xs = done (foldl' step start xs) {-# INLINE fold' #-} ```