Closed kozross closed 2 years ago
This is a duplicate of #492 .
The Semigroup law states:
Instances should satisfy the associativity law:
x <> (y <> z) = (x <> y) <> z
WSerialT/AsyncT/WAsyncT/ParallelT are not defined as ordered streams, WAheadT is not ordered wrt effects. If we have to compare them for equivalence we should disregard the order. The semigroup law does not state clearly what is =
in this law. If we define =
using multiset equivalence, then they clearly satisfy the law.
It may be fair to say that the Semigroup law should hold wrt Eq
instance. But these streams do not have an Eq
instance in general. The type t Identity
has an Eq
instance though for which I agree that it is incorrect wrt to the Semigroup law.
Therefore, I propose to:
1) remove the Eq
instance for t Identity
except for SerialT
. We can possibly fix those to use multiset equivalence but that will require accumulating the streams, it may require memory proportional to the stream size and cannot work on infinite streams. I think the best thing is to remove Eq
completely and use SerialT
Eq
if you want ordered equivalence.
2) In the documentation make it sufficiently bold and clear that the associativity is true wrt unordered equivalence.
See also #144 . It is possible to use a commutative semigroup for unordered comparison but I am not sure if it is worth it to introduce and use a new type class for this with no significant advantage.
I would also like to know any instances of practical issues arising from it. As long as we have the right idea of equivalence absolutely no issues should arise.
In fact only SerialT Identity
and WSerialT Identity
have an Eq
instance. We can remove the one from WSerialT Identity
. We can also write a note in docs explaining why they do not have an Eq instance so that we consider that before thinking of introducing an Eq instance in future.
Equality is strongly associated with substitutivity. Given that we can turn streams into lists, this makes the ordering distinction extremely observable in practice - this is, in fact, one of the reasons why I was concerned. While I can't provide specific examples of where this could be an issue, we have no control over what things people write in the future which rely on something being a Semigroup
instance that lean on associativity (which is, in fact, its only useful property).
A more principled possible solution would be to create a new method in IsStream
of the form (><) :: t m a -> t m a -> t m a
, and then have that do stream concatenation as appropriate to the type. This would avoid the problem with Semigroup
(no expectations that this is associative), and can also be made infix 6
, so that bracketing is mandatory to indicate how it associates. You could then leave Semigroup
instances only on cases where we can follow the laws with equality-as-substitutivity.
I agree that it may be non-intuitive, but I am not convinced that it is incorrect. The equivalence of one type (stream) does not mandate the equivalence of another type (list) it is converted to. We can add that streams are associative under multiset equivalence, ordering is not guaranteed.
There is a trade-off in introducing a new operation. The complexity of the API increases. Unless there is evidence that it is problematic I would not be in favor of that.
The equivalence of one type (stream) does not mandate the equivalence of another type (list) it is converted to.
Well, to explicate what @kozross was saying, there's the small matter of the substitution property: with f = Streamly.toList
for example, you don't have x == y => f x == f y
. This contradicts any reasonable notion of equality.
Unless there is evidence that it is problematic I would not be in favor of that.
That amounts to 'it's wrong but as far as we know it hasn't broken anything yet' which IMO is a dangerous approach.
Removing the instances, and introducing a '<~>' operator is the kind of change I imagine people being happy with:
I love Streamly, but the status quo here is quite concerning to me.
you don't have x == y => f x == f y. This contradicts any reasonable notion of equality.
This is true only if f
is a pure function. Streamly.toList
is not a pure function.
That amounts to 'it's wrong but as far as we know it hasn't broken anything yet' which IMO is a dangerous approach.
Please don't get me wrong, we would be more than happy to change if something is broken and we are grateful to those who bring up such issues. I did not agree that it is wrong. I agreed that it may be non-intuitive and if there are instances where its really hurting we can change it.
Also see #535 which proposes to remove <>
altogether but for a different reason.
This is true only if
f
is a pure function.Streamly.toList
is not a pure function.
But not all monads allow for non-determinism - would you not consider Streamly.toList @Identity
a pure function?
This is an interesting question.
x
and y
then there is no question of equality holding for f x
and f y
.Note that, in my earlier comments I wrote:
remove the
Eq
instance fort Identity
except forSerialT
.
toList @Identity
a pure function.The original premise that I stated earlier was that WSerialT
and other streams are unordered types. So we need to compare them disregarding the ordering. If two streams consist of same elements but in different order they can be considered equivalent.
Now that we have the equality defined, we can come back to the original question. We have to preserve equality on toList
if we consider toList @Identity
a pure function.
The problem boils down to performing a transformation from an unordered domain to an ordered domain. There are many possible ways to represent the unordered set into an ordered sequence. But to preserve equality in the ordered domain we need a canonical ordering. One possible way is to define toList
such that it always sorts the stream in ascending order (or descending order).
toList @Identity
impure than there is no issue. I am not sure if there are well defined arguments to consider it pure or impure.BTW, I have always considered Identity
as pure.
x :: WSerialT Identity Char
x = S.yield 'x'
y = S.yield 'y'
z = S.yield 'z'
a1, a2 :: WSerialT Identity Char
a1 = x <> (y <> z)
a2 = (x <> y) <> z
f :: WSerialT Identity a -> [a]
f = runIdentity . S.toList . wSerially
Here, f a1 /= f a2
. These are pure values (String
s), and this doesn't even depend on an Eq
instance for any stream.
So, for example, if a user refactored a1
in to the form of a2
(they should be allowed to do this, by the very definition of a Semigroup
), a pure value in their program would change.
Here we are implicitly using the notion of equality that I explained in point (2) earlier. We need to have toList
to translate from unordered set to a canonical ordering. I agree that just removing Eq instance may not be enough because equality can be used implicitly as in this example.
Or we can disallow WSerialT
to be instantiated for Identity
because we consider Identity as pure but we are using it for a re-ordering effect. Other unordered stream types (concurrent ones) do not allow Identity
automatically because of the MonadIO
constraint.
We can also remove <>
as suggested originally in this issue but that would be inconsistent with other unordered streams. Anyway we have #535 as well, so let's keep this issue open until that is resolved.
Or we can disallow WSerialT to be instantiated for Identity because we consider Identity as pure but we are using it for a re-ordering effect. Other unordered stream types (concurrent ones) do not allow Identity automatically because of the MonadIO constraint.
An artificial MonadIO
constraint on WSerialT
's Semigroup
instance seems a possible solution. I'm unsure how much code that would break, but one could argue that such code is incorrect anyway, due to the effectful nature of WSerialT
.
There is no WSerialT
in 0.9.0
(deprecated).
This issue is now obselete.
Due to the enforced definition of
a <> b <> c
always beinga <> (b <> c)
, it means that<>
on WAsyncT isn't associative, as required by theSemigroup
laws. This is definitely not a good idea, as this can cause surprising behaviour from anything which relies on theSemigroup
laws. Additionally, even though this is documented, it's not nearly clear enough in my opinion: if a non-lawful instance is deemed acceptable, this should be yelled from the rooftops.