dmwit / universe

Classes for types where we know all the values
BSD 3-Clause "New" or "Revised" License
37 stars 17 forks source link

Prevent sharing #23

Open phadej opened 8 years ago

phadej commented 8 years ago

For some usage patterns it's beneficial to produce universe stream on the fly, without memoising it. As it's a CAF it's get memoised by default:

For example with:

consumerExample :: Property
consumerExample = once $ a .&&. b
  where
    a = (universe !! 10000001) === (5000001 :: Integer)
    b = (universe !! 10000001) === (5000001 :: Integer)

maximal residency get out of hands:

     272,054,848 bytes maximum residency (13 sample(s))

One solution would be to change Universe class to be

class Universe a where
    {-# DEPRECATED universe "Consider using churchUniverse" #-}
    universe = build churchEncoding

    churchUniverse :: forall b. (a -> b -> b) -> b -> b
    churchUniverse = churchUiverseDef

{-# DEPRECATED universeDef "Consider using churchUniverseDef" #-}
universeDef :: (Enum a, Bounded a) => [a]
universeDef = [minBound..maxBound]

churchUniverseDef :: (Enum a, Bounded a) => forall b. (a -> b -> b) -> b -> b
churchUniverseDef c n = foldr c n [minBound..maxBound]

I could try to make this change, if it's a problem you think is worth solving. My gut feeling is that using church encoding is probably the best way to prevent sharing.

phadej commented 8 years ago

With those changes and Universe Integer defined as

instance Universe Integer  where
  churchUniverse c n = foldr c n (0 : d [1..])
    where d (x:xs) = x : negate x : d xs

The residency size drops to

63,880 bytes maximum residency (5 sample(s))
dmwit commented 8 years ago

Your Integer instance is a bit funny; e.g. this one seems more natural if we're church encoding:

instance Universe Integer where
    church c _ = 0 `c` go 1 where go v = v `c` negate v `c` go (v+1)

But if we're going to change the interface, we might consider a strongly improved one rather than an incremental improvement. I've been pondering for a while whether instances should expose the "tree-like" structure available from (+++) and friends. This would give users significantly more control over enumeration ordering and starting point -- and we could provide a standard one that made appropriate calls to interleave, diagonal, or choices. Here's a half-baked initial example of how that might look:

data Size = Finite | Infinite
class Universe a where
    root :: Maybe a
    parent :: a -> Maybe a
    children :: a -> (Size, [a])

instance Universe Integer where
    root = Just 0
    parent n = case compare n 0 of
        LT -> Just (n+1)
        EQ -> Nothing
        GT -> Just (n-1)
    children n = (Finite, [n-1 | n <= 0] ++ [n+1 | n >= 0])
phadej commented 8 years ago

It is funny. I tried yours version, but it allocates go closures as churchUniverse is evaluated further. So it's GHC specific in a sense.

I'm not sure whether parent/children approach is practical. It feels it will be tricky to write definitions for non trivial data. I'd rather church encode the tree then, so users could traverse it in different order but it's not a memoisable data-structure CAF. I.e universeC :: (a -> [b] -> b) -> b -> b or something like that. Yet then we lose deforestation stuff from/in GHC (maybe we will anyway as the tree-like stuff exists there anyway).

phadej commented 8 years ago

Yet your proposal resembles http://hackage.haskell.org/package/countable-0.2/docs/Data-Countable.html definitions

phadej commented 8 years ago

For example instance for Either is quite unelegant IMHO, compared to fair interleaving in http://hackage.haskell.org/package/countable-0.2/docs/src/Data-Countable.html Sunni how hard would be to count rationals for example

dmwit commented 8 years ago

You're right, we should have root :: [a], not root :: Maybe a. Then the Either instance should not be so bad (untested):

instance (Universe a, Universe b) => Universe (Either a b) where
    root = [Left r | r <- root] ++ [Right r | r <- root]
    parent (Left a) = Left <$> parent l
    parent (Right a) = Right <$> parent a
    children (Left a) = (Left <$>) <$> children a
    children (Right a) = (Right <$>) <$> children a
phadej commented 8 years ago

how about using Forest a, or actually its church encoding.

The root :: [a] and children :: a -> [a] kind of encode the Tree, but data is somehow more concrete. OTOH. My gut feeling is that one can make similar deforestration trick as with build / foldr but for Tree/Forest, and by using church encoding (i.e. something to pass to buildTree) we can prevent sharing and provide actual Tree.

IMHO then we can have possibly-infinite-height finite-branching tree, so we can encode interleave as forest merge.

I could try to implement / benchmark this idea, if it sounds feasible?

dmwit commented 8 years ago

I'm fully on board.

phadej commented 6 years ago

I'm coming back to this. And I have to rethink. Tree is good for +++, but it doesn't really work for (+*+) = product.

treeowl commented 5 years ago

Something must be done to let us avoid sharing, but I haven't really looked deeply into the options yet. In some cases, it may prove a bit tricky to convince GHC to avoid sharing without obfuscating the code. For example, the Rational instance really wants a snake to eat its tail, but we want to make a new snake every time someone demands the universe. sigh.