Open harendra-kumar opened 2 years ago
Operations affected by this:
Benchmark                                               Prelude.Serial(0)(μs) Data.Stream(1) - Prelude.Serial(0)(%)
------------------------------------------------------- --------------------- -------------------------------------
o-1-space.multi-stream.isSubsequenceOf                  111.84                +1284.86
o-1-space.multi-stream.stripPrefix                      111.91                +1277.46
o-1-space.multi-stream.isPrefixOf                       111.83                +1270.55
o-1-space.exceptions/serial.retryUnknown                1195.69               +193.27
o-1-space.exceptions/serial.retryNoneSimple             1717.68               +126.67
o-1-space.exceptions/serial.retryNone                   1605.25               +40.85
o-1-space.mapping.foldrS                                3085.84               +64.95
o-1-space.mapping.foldrSMap                             3207.70               +56.39
o-1-space.mapping.foldrT                                3851.66               +48.55
o-1-space.mapping.foldrTMap                             3971.81               +47.23
o-1-space.elimination.build.Identity.foldrMToListLength 810.75                +29.70
o-1-space.elimination.uncons                            1057.74               +31.26
The problem is not really an issue with unfolds but a general fusion issue. Unfolds just happen to use a Skip constructor at the beginning of the stream, which triggers the issue. If we use a "drop" operation at the start of the stream, we can reproduce the same problem with plain streams as well. For example, the following code, which uses a stream instead of an unfold, runs into a similar fusion issue:
main = do
    r <- isSubsequenceOf (Stream.drop 1 stream)
    print r
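To make the leading Skip visible, here is a minimal sketch that does not use the library. The Step and Stream types are simplified stand-ins (the real step function also takes a state argument, as the defState calls elsewhere in this thread show), and enumFromTo', dropS and trace' are hypothetical helpers written only for this illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

import Data.Functor.Identity (Identity (..))

-- Simplified stand-ins for the library types (assumption: no State argument).
data Step s a = Yield a s | Skip s | Stop

data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- Hypothetical generator: count from lo to hi inclusive.
enumFromTo' :: Monad m => Int -> Int -> Stream m Int
enumFromTo' lo hi = Stream step lo
  where
    step i
      | i > hi = return Stop
      | otherwise = return (Yield i (i + 1))

-- Hypothetical drop: each dropped element is turned into a Skip, so the
-- resulting stream starts with Skip steps.
dropS :: Monad m => Int -> Stream m a -> Stream m a
dropS n (Stream step st0) = Stream step' (st0, n)
  where
    step' (st, i) = do
      r <- step st
      return $ case r of
        Yield x s
          | i > 0 -> Skip (s, i - 1)
          | otherwise -> Yield x (s, i)
        Skip s -> Skip (s, i)
        Stop -> Stop

-- Record which constructor each step returns.
trace' :: Monad m => Stream m a -> m [String]
trace' (Stream step st0) = go st0
  where
    go st = do
      r <- step st
      case r of
        Yield _ s -> ("Yield" :) <$> go s
        Skip s -> ("Skip" :) <$> go s
        Stop -> return ["Stop"]

main :: IO ()
main = print (runIdentity (trace' (dropS 1 (enumFromTo' 1 3))))
-- prints ["Skip","Yield","Yield","Stop"]
```

The first step of the dropped stream is a Skip, which is the same shape a stream built from an unfold has at its start.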
Since these are fold operations with recursive functions, using SPEC and strict arguments seems to do the trick:
isSubsequenceOf :: (Eq a, Monad m) => Stream m a -> Stream m a -> m Bool
isSubsequenceOf (Stream stepa ta) (Stream stepb tb) = go SPEC Nothing' ta tb
    where
    go !_ Nothing' sa sb = do
        r <- stepa defState sa
        case r of
            Yield x sa' -> go SPEC (Just' x) sa' sb
            Skip sa' -> go SPEC Nothing' sa' sb
            Stop -> return True
    go !_ (Just' x) sa sb = do
        r <- stepb defState sb
        case r of
            Yield y sb' ->
                if x == y
                then go SPEC Nothing' sa sb'
                else go SPEC (Just' x) sa sb'
            Skip sb' -> go SPEC (Just' x) sa sb'
            Stop -> return False
Generates the following core:
main_$sgo
  = \ sc_s3b3 sc1_s3b2 sc2_s3b1 eta_B0 ->
      case ># sc_s3b3 100000# of {
        __DEFAULT ->
          case ==# sc2_s3b1 sc_s3b3 of {
            __DEFAULT -> main_$sgo (+# sc_s3b3 1#) sc1_s3b2 sc2_s3b1 eta_B0;
            1# ->
              case ># sc1_s3b2 100000# of {
                __DEFAULT ->
                  main_$sgo (+# sc_s3b3 1#) (+# sc1_s3b2 1#) sc1_s3b2 eta_B0;
                1# -> hPutStr2 stdout $fShowBool4 True eta_B0
              }
          };
        1# -> hPutStr2 stdout $fShowBool5 True eta_B0
      }
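For anyone who wants to try the SPEC version outside the library, the following is a self-contained sketch. It assumes simplified Step and Stream types without the state argument, so the defState parameter is dropped, and fromList' is a hypothetical helper. SPEC itself is the marker type exported from GHC.Exts; passing it as a strict argument asks the SpecConstr pass to specialise the loop regardless of its size or the number of specialisations.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

import Data.Functor.Identity (Identity (..))
import GHC.Exts (SPEC (..))

-- Simplified stand-ins for the library types (assumption: no State argument).
data Step s a = Yield a s | Skip s | Stop

data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- Strict Maybe, used as the loop's "element being searched for" state.
data Maybe' a = Nothing' | Just' !a

-- Hypothetical helper: stream the elements of a list.
fromList' :: Monad m => [a] -> Stream m a
fromList' = Stream step
  where
    step [] = return Stop
    step (x : xs) = return (Yield x xs)

-- Same loop as in the issue, minus defState. The first stream is the
-- candidate subsequence, the second is the stream searched in.
isSubsequenceOf' :: (Eq a, Monad m) => Stream m a -> Stream m a -> m Bool
isSubsequenceOf' (Stream stepa ta) (Stream stepb tb) = go SPEC Nothing' ta tb
  where
    -- Nothing': pull the next element to look for from the first stream.
    go !_ Nothing' sa sb = do
      r <- stepa sa
      case r of
        Yield x sa' -> go SPEC (Just' x) sa' sb
        Skip sa' -> go SPEC Nothing' sa' sb
        Stop -> return True
    -- Just' x: advance the second stream until x is found or it ends.
    go !_ (Just' x) sa sb = do
      r <- stepb sb
      case r of
        Yield y sb' ->
          if x == y
            then go SPEC Nothing' sa sb'
            else go SPEC (Just' x) sa sb'
        Skip sb' -> go SPEC (Just' x) sa sb'
        Stop -> return False

main :: IO ()
main = do
  print (runIdentity (isSubsequenceOf' (fromList' [1, 3]) (fromList' [1, 2, 3 :: Int])))
  print (runIdentity (isSubsequenceOf' (fromList' [3, 1]) (fromList' [1, 2, 3 :: Int])))
```

Compiling this with -O2 -ddump-simpl is one way to check whether a specialised worker similar to main_$sgo appears; the exact Core will depend on the GHC version and on the generators used.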
If we generate the streams using unfolds and use those streams in multi-stream operations, e.g. isSubsequenceOf, the operation does not fuse. However, if we use the direct implementations of the stream generators, without unfolds, it fuses.
The difference in unfolds is that we have an additional state to inject the seed before we start generating the stream.
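The extra state can be sketched as follows. This is a simplified model, not the library source: Step, Stream and Unfold omit the state argument, and UnfoldState, unfoldrM' and trace' are hypothetical names chosen for the illustration. The point it shows is that the first step of a stream built from an unfold only runs the inject action and returns Skip.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

import Data.Functor.Identity (Identity (..))

-- Simplified stand-ins for the library types (assumption: no State argument).
data Step s a = Yield a s | Skip s | Stop

data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- An unfold pairs a step function with an inject action that turns a seed
-- into the initial state.
data Unfold m a b = forall s. Unfold (s -> m (Step s b)) (a -> m s)

-- The additional state: has the seed been injected yet?
data UnfoldState s = NotInjected | Injected s

-- Turn an unfold plus a seed into a stream. The first step injects the
-- seed and yields nothing, so it has to return Skip.
unfold :: Monad m => Unfold m a b -> a -> Stream m b
unfold (Unfold ustep inject) seed = Stream step NotInjected
  where
    step NotInjected = Skip . Injected <$> inject seed
    step (Injected st) = do
      r <- ustep st
      return $ case r of
        Yield x s -> Yield x (Injected s)
        Skip s -> Skip (Injected s)
        Stop -> Stop

-- Hypothetical unfoldrM-style unfold: the seed is used directly as the state.
unfoldrM' :: Monad m => (s -> m (Maybe (b, s))) -> Unfold m s b
unfoldrM' next = Unfold step return
  where
    step st = do
      r <- next st
      return $ case r of
        Just (x, s) -> Yield x s
        Nothing -> Stop

-- Record which constructor each step returns.
trace' :: Monad m => Stream m a -> m [String]
trace' (Stream step st0) = go st0
  where
    go st = do
      r <- step st
      case r of
        Yield _ s -> ("Yield" :) <$> go s
        Skip s -> ("Skip" :) <$> go s
        Stop -> return ["Stop"]

-- Generate 1 and 2, then stop.
gen :: Int -> Identity (Maybe (Int, Int))
gen i = return (if i > 2 then Nothing else Just (i, i + 1))

main :: IO ()
main = print (runIdentity (trace' (unfold (unfoldrM' gen) 1)))
-- prints ["Skip","Yield","Yield","Stop"]
```

A direct generator has no NotInjected case, so its first step can already be a Yield; that is the structural difference the consumer's loop sees.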
A stream generated by unfoldrM simplifies as follows:
isSubsequenceOf looks like:
On the other hand, if we use the stream unfoldrM operation, the core looks like this:
We do have direct stream implementations for generation operations, but we would prefer to generate everything using unfolds. That would be possible if GHC could fuse it properly. We need to investigate further what is going on here.