composewell / streamly

High performance, concurrent functional programming abstractions
https://streamly.composewell.com
Other
866 stars 66 forks source link

WSerialT's Semigroup instance is not lawful #513

Closed kozross closed 2 years ago

kozross commented 4 years ago

Due to the enforced definition of a <> b <> c always being a <> (b <> c), it means that <> on WAsyncT isn't associative, as required by the Semigroup laws. This is definitely not a good idea, as this can cause surprising behaviour from anything which relies on the Semigroup laws. Additionally, even though this is documented, it's not nearly clear enough in my opinion: if a non-lawful instance is deemed acceptable, this should be yelled from the rooftops.

harendra-kumar commented 4 years ago

This is a duplicate of #492 .

The Semigroup law states:

Instances should satisfy the associativity law:

    x <> (y <> z) = (x <> y) <> z

WSerialT/AsyncT/WAsyncT/ParallelT are not defined as ordered streams, WAheadT is not ordered wrt effects. If we have to compare them for equivalence we should disregard the order. The semigroup law does not state clearly what is = in this law. If we define = using multiset equivalence, then they clearly satisfy the law.

It may be fair to say that the Semigroup law should hold wrt Eq instance. But these streams do not have an Eq instance in general. The type t Identity has an Eq instance though for which I agree that it is incorrect wrt to the Semigroup law.

Therefore, I propose to:

1) remove the Eq instance for t Identity except for SerialT. We can possibly fix those to use multiset equivalence but that will require accumulating the streams, it may require memory proportional to the stream size and cannot work on infinite streams. I think the best thing is to remove Eq completely and use SerialT Eq if you want ordered equivalence.

2) In the documentation make it sufficiently bold and clear that the associativity is true wrt unordered equivalence.

See also #144 . It is possible to use a commutative semigroup for unordered comparison but I am not sure if it is worth it to introduce and use a new type class for this with no significant advantage.

I would also like to know any instances of practical issues arising from it. As long as we have the right idea of equivalence absolutely no issues should arise.

harendra-kumar commented 4 years ago

In fact only SerialT Identity and WSerialT Identity have an Eq instance. We can remove the one from WSerialT Identity. We can also write a note in docs explaining why they do not have an Eq instance so that we consider that before thinking of introducing an Eq instance in future.

kozross commented 4 years ago

Equality is strongly associated with substitutivity. Given that we can turn streams into lists, this makes the ordering distinction extremely observable in practice - this is, in fact, one of the reasons why I was concerned. While I can't provide specific examples of where this could be an issue, we have no control over what things people write in the future which rely on something being a Semigroup instance that lean on associativity (which is, in fact, its only useful property).

A more principled possible solution would be to create a new method in IsStream of the form (><) :: t m a -> t m a -> t m a, and then have that do stream concatenation as appropriate to the type. This would avoid the problem with Semigroup (no expectations that this is associative), and can also be made infix 6, so that bracketing is mandatory to indicate how it associates. You could then leave Semigroup instances only on cases where we can follow the laws with equality-as-substitutivity.

harendra-kumar commented 4 years ago

I agree that it may be non-intuitive, but I am not convinced that it is incorrect. The equivalence of one type (stream) does not mandate the equivalence of another type (list) it is converted to. We can add that streams are associative under multiset equivalence, ordering is not guaranteed.

There is a trade-off in introducing a new operation. The complexity of the API increases. Unless there is evidence that it is problematic I would not be in favor of that.

georgefst commented 4 years ago

The equivalence of one type (stream) does not mandate the equivalence of another type (list) it is converted to.

Well, to explicate what @kozross was saying, there's the small matter of the substitution property: with f = Streamly.toList for example, you don't have x == y => f x == f y. This contradicts any reasonable notion of equality.

Unless there is evidence that it is problematic I would not be in favor of that.

That amounts to 'it's wrong but as far as we know it hasn't broken anything yet' which IMO is a dangerous approach.

Removing the instances, and introducing a '<~>' operator is the kind of change I imagine people being happy with:

I love Streamly, but the status quo here is quite concerning to me.

harendra-kumar commented 4 years ago

you don't have x == y => f x == f y. This contradicts any reasonable notion of equality.

This is true only if f is a pure function. Streamly.toList is not a pure function.

That amounts to 'it's wrong but as far as we know it hasn't broken anything yet' which IMO is a dangerous approach.

Please don't get me wrong, we would be more than happy to change if something is broken and we are grateful to those who bring up such issues. I did not agree that it is wrong. I agreed that it may be non-intuitive and if there are instances where its really hurting we can change it.

Also see #535 which proposes to remove <> altogether but for a different reason.

georgefst commented 4 years ago

This is true only if f is a pure function. Streamly.toList is not a pure function.

But not all monads allow for non-determinism - would you not consider Streamly.toList @Identity a pure function?

harendra-kumar commented 4 years ago

This is an interesting question.

  1. If there is no equality defined for x and y then there is no question of equality holding for f x and f y.

Note that, in my earlier comments I wrote:

remove the Eq instance for t Identity except for SerialT.

  1. Let's say we have an equality defined for unordered streams and we consider toList @Identity a pure function.

The original premise that I stated earlier was that WSerialT and other streams are unordered types. So we need to compare them disregarding the ordering. If two streams consist of same elements but in different order they can be considered equivalent.

Now that we have the equality defined, we can come back to the original question. We have to preserve equality on toList if we consider toList @Identity a pure function.

The problem boils down to performing a transformation from an unordered domain to an ordered domain. There are many possible ways to represent the unordered set into an ordered sequence. But to preserve equality in the ordered domain we need a canonical ordering. One possible way is to define toList such that it always sorts the stream in ascending order (or descending order).

  1. If we consider toList @Identity impure than there is no issue. I am not sure if there are well defined arguments to consider it pure or impure.
harendra-kumar commented 4 years ago

BTW, I have always considered Identity as pure.

georgefst commented 4 years ago
x :: WSerialT Identity Char
x = S.yield 'x'
y = S.yield 'y'
z = S.yield 'z'

a1, a2 :: WSerialT Identity Char
a1 = x <> (y <> z)
a2 = (x <> y) <> z

f :: WSerialT Identity a -> [a]
f = runIdentity . S.toList . wSerially

Here, f a1 /= f a2. These are pure values (Strings), and this doesn't even depend on an Eq instance for any stream.

So, for example, if a user refactored a1 in to the form of a2 (they should be allowed to do this, by the very definition of a Semigroup), a pure value in their program would change.

harendra-kumar commented 4 years ago

Here we are implicitly using the notion of equality that I explained in point (2) earlier. We need to have toList to translate from unordered set to a canonical ordering. I agree that just removing Eq instance may not be enough because equality can be used implicitly as in this example.

Or we can disallow WSerialT to be instantiated for Identity because we consider Identity as pure but we are using it for a re-ordering effect. Other unordered stream types (concurrent ones) do not allow Identity automatically because of the MonadIO constraint.

We can also remove <> as suggested originally in this issue but that would be inconsistent with other unordered streams. Anyway we have #535 as well, so let's keep this issue open until that is resolved.

georgefst commented 4 years ago

Or we can disallow WSerialT to be instantiated for Identity because we consider Identity as pure but we are using it for a re-ordering effect. Other unordered stream types (concurrent ones) do not allow Identity automatically because of the MonadIO constraint.

An artificial MonadIO constraint on WSerialT's Semigroup instance seems a possible solution. I'm unsure how much code that would break, but one could argue that such code is incorrect anyway, due to the effectful nature of WSerialT.

adithyaov commented 2 years ago

There is no WSerialT in 0.9.0 (deprecated). This issue is now obselete.