dotnet / reactive

The Reactive Extensions for .NET
http://reactivex.io
MIT License
6.73k stars 751 forks source link

Do operator can perform unexpected type conversion on observable sequence #2147

Open glopesdev opened 4 months ago

glopesdev commented 4 months ago

Bug report

Which library version?

6.0.1

What are the platform(s), environment(s) and related component version(s)?

Tested in both .NET Framework and .NET 8.0

What is the use case or problem?

var rx = Observable.Return("hello");
IObserver<object> subject = new Subject<object>();

rx = rx.Do(subject);

What is the expected outcome?

The result of the Do operator should not change the type of the resulting observable sequence, i.e. it should behave as a pure side-effect, such that adding or removing Do should have no effect on the type of the sequence.

Specifically in this case, Do should return IObservable<string>.

What is the actual outcome?

The output of Do is of type IObservable<object>.

Proposed signature overload

The following is provided merely as an example of a function overload which can be used to workaround the current problem, and to inspire potential future revisions of the current implementation:

static IObservable<TSource> Do<TSource, TObserver>(this IObservable<TSource> source, IObserver<TObserver> subject)
    where TSource : class, TObserver
{
    return source.Do(subject.OnNext, subject.OnError, subject.OnCompleted);
}
idg10 commented 4 months ago

I think if we were to do this it would probably look more like this

public static IObservable<TSource> Do<TSource, TObserver>(
    this IObservable<TSource> source,
    IObserver<TObserver> observer)
    where TSource : class, TObserver
{
    return source.Do<TSource>(observer);
}

This also points to a simple change to your example that would mean you wouldn't have run into the problem:

var rx = Observable.Return("hello");
IObserver<object> subject = new Subject<object>();

rx = rx.Do<string>(subject);

This would also work:

var rx = Observable.Return("hello");
IObserver<string> subject = new Subject<object>();

rx = rx.Do(subject);

So I think this all shows that the following two statements are true:

  1. a fairly simple addition to System.Reactive would make your example work
  2. fairly simple changes to your example would make it work

Because of 2, this proposal doesn't look to me like it would enable anyone to do anything they can't already do today. So this would boil down to a question of ease of use: does this make Rx an easier library to use?

Clearly it makes it easier to use Do in this context: as Rx stands today you either need to use an explicit type argument, or you need to use the somewhat subtle technique of getting the contravariance of T in IObserver<T> to kick in earlier by changing the declared type of a variable (i.e., my "would also work" example above). Both of my suggested workarounds are the kind of code that might cause some headscratching if you come back to it a year after writing it. I think it might even be true to say that if Do is going to offer an overload that accepts an IObserver<T>, it probably should have been designed the way you're suggesting in the first place. (I've been trying to work out whether co/contravariance was available in the version of C# we had on Windows Phone when the very first public version of Rx.NET came out. It's possible that this predated C# support for variance, which would explain why it was designed this way: if contravariance is not available, you wouldn't have been able to write your example. But I've not been able to find out if that's the original reason.)

However, I'd ask you to consider this: I've come round to the position that using Do is in most cases a bad idea. Just like unsafe code in C#, it has its uses, but it's really easy to go wrong with it. As it happens I'm part way through producing a video explaining why that is, but the short version is that Do makes you operate at a much lower level of abstraction than the majority of Rx operators, and in particular it tends to require you to understand in much more detail exactly what happens when. (This has been highlighted by some recent issues reported in this repo where Do behaved exactly as designed but this produced behaviour people did not expect. That, unfortunately, is the nature of Do.)

So I'm not sure that adding features to make Do easier to use is a good idea. I think it is likely to be a net reduction in ease of use because it actively encourages people to use a feature that is harder to use than it looks. It feels similar to making it easier to open a bottle containing toxic chemicals: do we really want to make that easier?

So my current position is that an addition such as this would be sending out completely the wrong message: it would be an implicit encouragement to use an Rx feature (Do) that I think tends to increase the number of bugs in people's applications.

I would not remove Do, any more than I would argue to remove unsafe code from C#. But I consider these things to be last resorts. It would be possible to remove obstructions to use of unsafe code (e.g., currently you have to enable it in two different ways to make it work) but I would not be in favour of such a change. This proposed addition to Do is perhaps less extreme—I think it makes it just as easy to make mistakes as unsafe code does, but the consequences of the mistakes tend to be slightly less severe—but it still seems like a similar thing: we'd be making it easier to use a feature that makes your life harder.

The strongest argument in favour that I can see here is that if this overload of Do is to exist at all, it looks like it should probably have been designed in the way you propose. The current implementation might have been an oversight, or it might reflect language or runtime limitations that existed on platforms Rx supported at launch but which have long since fallen out of support. In a clean sheet design today, if we were to offer this feature at all, it would work as you propose.

The biggest downside is that we have a long-term goal of discouraging the use of Do. A possible additional downside is that adding this new overload might create overload ambiguities for some Rx users in code that compiles fine today. (I think it doesn't create that problem, but overload resolution is so complex in C# that it's quite hard to be certain.)

So, now that I've explained my reasons for reluctance, does that change your view at all? Or do you still believe we should add this despite my misgivings?

glopesdev commented 4 months ago

@idg10 this line blew me away rx = rx.Do<string>(subject)

It makes of course complete sense, somehow contravariance always gets me!

I agree with your reasoning, and to be honest I was also worried how someone else somewhere might be bitten by a change like this, so I am happy to close this issue, especially now we have a really simple and documented workaround.

However, I'd ask you to consider this: I've come round to the position that using Do is in most cases a bad idea. Just like unsafe code in C#, it has its uses, but it's really easy to go wrong with it.

I did get curious about your stance on the use of Do and how it should not be used. Specifically, I was curious about whether you are against the use of specifically the Do operator, or more generally against any operators which inject pure side-effects into observable sequences?

I've found the latter to be incredibly powerful for composition, as they allow enabling / disabling logging, monitoring and other side-effects in a chain without having to rewrite any code. I am with you that I very rarely use Do directly for this and rather end up designing specific operators to ensure guarantees on initialization, to control the impact of errors on the side-effects, throttling, etc, but I still find conceptually the power of "pure side-effect" operators to be tremendous.