WICG / observable

Observable API proposal
https://wicg.github.io/observable/
Other
596 stars 16 forks source link

Reconsider unpredictable hot/cold and async/sync behavior #170

Open esprehn opened 3 months ago

esprehn commented 3 months ago

The rxjs inspired API creates a situation where Observables sometimes have side effects upon subscription (hot vs cold) and sometimes emit synchronously and sometimes async. If you have a function that takes an Observable as an argument you have no predictability if it will have a side effect upon subscription (is it safe to call it twice?), and no predictability if the emit will happen immediately when subscribing or later in a microtask.

This is a manifestation of Releasing Zalgo: https://blog.izs.me/2013/08/designing-apis-for-asynchrony/ https://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous/

Promises are the opposite of this:

This means if someone passes a Promise to you, all operations on it are predictable and safe.

With Observables the situation is different:

In large systems built on Observables I see this manifest often as:

Other Web APIs are also consistent today (ex. MutationObserver, ResizeObserver, Promises), and never emit during subscription and have a split between observing the operation and executing it. The one place I'm aware where this was violated is MessagePort#onmessage starting the port, but it's generally been considered a mistake.

Context: I lead possibly the largest codebase built on rxjs in the world.

PowerKiKi commented 3 months ago

I share similar concerns, albeit on much smaller projects. Async vs sync can be annoying, but I feel hot vs cold is even more important.

I don't know what kind of API could solve those annoyances, but if there's a reasonable solution, it would be awesome to standardize on that instead.

I'd like to hear the thoughts of @benlesh about this, as I am sure that must have come up in the years of maintaining rxjs.

domfarolino commented 3 months ago

Thanks so much for filing this. Thoughts below.

Async vs sync callbacks

I understand and am sympathetic to the async vs sync concern here, but I'm not sure how to handle it without making things more complicated. From reading the supplied articles, it seems to me that the concern is not so much that subscriber.next() synchronously invokes the SubscriptionObserver#next() handler in the general case (this seems required if Observables are going to integrate with Events in any useful way). But the problem seems to be that it can do so before subscribe() actually returns. This is the synchronous firehose problem. Do I have the concern right?

I really think in the general case — long after subscribe() returns, for example — subscriber.next() needs to synchronously invoke the next handler, for Observables to be at all useful for events. So the only remedy I can imagine here is somehow deferring the effects of subscriber.next() iff subscribe() itself is on the stack. We could queue the invocation of the next() handler behind a microtask, as to avoid the sync firehose problem altogether. It's a little funny though, because then we end up in a situation where:

  1. With respect to subscribe(), Observables are "async": That is, no observer callback (like a next() handler) can run while subscribe() is on the stack. The API is effectively "async" in that it never behaves like list.forEach(callback); but...
  2. With respect to subscriber.next(), Observables can be async & sync: That is, most of the time when you call subscriber.next(), the relevant observer callback will run synchronously right then and there. Except if subscribe() is on the stack, then the effects of next() are deferred a tick

This seems to just move the "this API is both sync and async" concern from subscribe() ➡️ subscriber.next() (and kin). That's not great. I think this is an artifact of the fact that Observables are lazy and Promises are not. That is, for Observables, subscribe() actually does stuff, where for Promises, then() just listens to stuff that's already underway. I see two ways of fixing this:

  1. Never invoke observer callbacks synchronously. I think this is a dealbreaker for event integration.
  2. Make Observables eager; you run the "subscriber callback" upon Observable creation instead, but only once and for all. You don't get access to any values until you finally subscribe().
    • What do we do with values the Observable has encountered in between creation and subscription? We can't queue them for later replay (Promises do this kinda replay, but for a single value). If we did, we'd have to replay them synchronously to the observer during subscribe(), but that brings us back to the concern that observer/subscribe callbacks may or may not be fired synchronously during subscribe(). If we replayed the values asynchronously to the observer, then that brings us to the other problem above where sometimes your next() handler is fired synchronously with respect to subscriber.next(), and sometimes it isn't. But that's the same problem that led us to consider making Observables eager, so I don't think it actually solves anything.

I'd love some more thoughts on this. The simplest thing here is (1) above, but I think it's a dealbreaker for event integration. The other options I've thought of are just as tricky/complicated as the current state of affairs, but have the added downside of breaking the Observable contract that any users today expect, for unclear benefit.

Hot vs cold (side effects)

I don't think I understand this concern very much, but I could just be missing something big. I'm considering the following comments:

have no predictability if it will have a side effect upon subscription

With Observables the situation is different:

  • Calling subscribe might be expensive (API request) or cheap (replay)

Our proposal here has no "shared" state across subscriptions, so if any call to subscribe() has side effects, then all calls to subscribe() have side effects. So I don't think there's anything in our proposal that we can address regarding hot vs cold. All Observables feel hot, to me.

Maybe the concern is the user code inside the subscriber callback has some "shared" state, where it doesn't perform the side effects more than once (i.e., for the 2nd+ subscription to the same Observable). Is that the concern? If so, I feel like Promise-returning APIs have the same concern. You could imagine some API that (a) does something super expensive, and (b) vends a Promise representing the value. If that API is called many times, you might get a Promise that just replays the cached shared value, as to not re-kick-off the expensive operation. That "tricky"/optimized user code is the same as the user code that would be in the subscriber callback, that's deciding to share/replay some state across many subscriptions to the same Observable.

benlesh commented 3 months ago

It's funny, I actually spoke directly to @isaacs (the author of the Zalgo blog article) about this a long time ago (probably 2015) when I ran into him at a birthday party for a mutual friend in the Bay Area. "Zalgo" can be problematic for sure, but it's really not appropriate to worry about that for a reactive programming type like this. Observables are fairly low level.

More importantly though, this needs to work for EventTarget, and if it forces asynchrony (like a promise does) it doesn't work because of things like preventDefault() or stopPropagation():

form.when('submit').subscribe(e => {
  // If this is always async, it's too late.
  e.preventDefault();
});

element.when('click').subscribe(e => {
  // If this is always async, it's too late.
  e.preventDefault();
})

Further, EventTarget is capable of wiring up an event handler, dispatching it, then removing the event handler synchronously, even now. And we need to be able to model that:

const handler = () => console.log('event handled!');

console.log('adding listener');
eventTarget.addEventListener('test', handler);
console.log('dispatching');
eventTarget.dispatchEvent(new CustomEvent('test'));
console.log('removing listener');
eventTarget.removeEventListener('test', handler);

// logs
"adding listener"
"dispatching"
"event handled!"
"removing listener"

Where this should do the same:

const ac = new AbortController();
console.log('adding listener');
eventTarget.when('test').subscribe(
  () => console.log('event handled!'), 
  { signal: ac.signal },
);
console.log('dispatching');
eventTarget.dispatchEvent(new CustomEvent('test'));
console.log('removing listener');
ac.abort();
benlesh commented 3 months ago

On the front of "hot" and "cold": Really what you're talking about is forced multicast. That comes with complications and overhead.

EventTarget#when will always be "hot" and force multicast. That's because it doesn't create the source of the events, the EventTarget, because it existed prior to the creation of the observable.

It's plausible to make everything "hot" or "multicast" by default... however here are the complications:

  1. Every observable would need to do reference counting for its subscribers.
  2. What is the behavior at first subscription? Does the observable start then? Was it already started?
  3. On subscription 2+, if those subscribers are late, do we give them all of the previous values? The most recent value? or just make them wait for values?
  4. If subscription must be allowed to emit asynchronously (see my previous comment), then if multiple subscribers subscribe to the same observable, even synchronously in a loop, we'll have to be able to send all previous values to the new subscribers, because the second subscribers would have missed them.
  5. When the observable is complete or errored does it need to be recreated? Or does it "reset" to its original state?
  6. When all subscribers abort or otherwise end their subscriptions, do we keep the source running? Do we abort it? Does it retain its previous values? Does it revert to its original state?

On the other hand, the "cold" observable, which is the most basic, low-level design for observable, is almost a fancy function with some guarantees. I can be used to compose any of the behaviors above quite easily.

Another thought: Most of the desire around forced multicast for observables generally comes from people wanting to use observables for state management. While observables can be used for state management, really, they're best at composing events. Something like Signals are better used for state management.

esprehn commented 3 months ago

@domfarolino

I don't understand what you mean by With respect to subscribe(), Observables are "async":, the subscribe callback can emit synchronously (the firehose situation). That's what I'm pushing back on, the rest of the platform doesn't work that way and it has the Zalgo issue.

If someone passes a Promise into a function there's nothing I can do to that Promise to end up back in sync land. Bluebird did actually have that behavior back in the day and it was quite painful and thankfully native Promises didn't carry it over.

An example fix for this would be for the the subscribe callback to emit on microtask, which means even if the subscription is created synchronously the callback is always async which avoids Zalgo.

@benlesh Events emitted by the browser will always have a microtask checkpoint before going back into native code, which means even if there's a microtask resolve you could preventDefault(), so your examples do work.

const a = document.body.appendChild(document.createElement('a'))
a.textContent = "Click me!";
a.href = "https://www.google.com/";
a.onclick = (e) => {
  Promise.resolve().then(() => e.preventDefault());
};

The situation that doesn't work is manually dispatched events (calling dispatchEvent/click from inside JS). That's a platform issue where there's no way to manually trigger a microtask checkpoint and APIs behave differently when done from the JS vs native code. The fix for that is to either expose manual checkpoints or add a EventTarget#dispatchEventAsync(): Promise<boolean> API which queues a native task so there's a checkpoint before reading the result of the dispatch.

domfarolino commented 3 months ago

I don't understand what you mean by With respect to subscribe(), Observables are "async":

Please see the sentence immediately preceding the one you quoted:

We could queue the invocation of the next() handler behind a microtask, as to avoid the sync firehose problem altogether. It's a little funny though, because then we end up in a situation where [...]

In other words, I'm describing (with the sentence "With respect to subscribe(), Observables are "async"") what would be the outcome if we "queued the invocation of the next() handler behind a microtask, as to avoid the sync firehose problem altogether".

benlesh commented 3 months ago

Events emitted by the browser will always have a microtask checkpoint before going back into native code

TIL, @esprehn, thanks! That's good news. However there are still issues with scheduling for every emitted value.

  1. Unlike promises, which will only have one value to worry about, an observable can have any number of values. That means we'd have to hold onto every value that was scheduled to be emitted on that microtask until the microtask fired. This is effectively an unbounded buffer. Not a big deal, but not great as the observables are used across an app.
  2. It actually exacerbates the "sync firehose problem" by obscuring it from the developer: In the case of Observable.from(iterator), where the iterator is perhaps infinite, the user will be unable to unsubscribe from the iterator based off of what's been emitted, as the iterator will have to complete its synchronous iteration before the microtask to emit the values fires.
  3. Fundamentally, observable is the "dual" of iterable, and a primitive. The synchronous emission is completely by design and required. Problem 2 above is a symptom of us trying to deviate from the principles there. In more than a decade of this type, the lack of async emission has hardly ever been the issue. Usually, issues with observable arise from the fact that asynchrony is hard to reason about, and people tended to overuse the 100+ RxJS operators.

An example for problem number 2 above:

function* allIntegers() {
  let i = 0;
  while (true) yield i++;
}

const ac = new AbortController();

Observable.from(allIntegers()).subscribe(n => {
  console.log(n);
  if (n === 10) {
    ac.abort();
  }
}, { signal: ac.signal });

Which seems like nonsense, sure, it's just taken to the extreme... but someone could do something with it that runs through a potentially large set of data, but you only really want to get the first few that match something then run it.

window.when('message')
  .switchMap(e => {
    return Observable.from(e.data.rows)
      .filter(row => row.shouldStream)
      .take(3)
      .flatMap(row => streamRowData(row))
  })
  .subscribe(updateView, { signal })

Now... you could ABSOLUTELY just do the filtering and taking in the iteration part of this code. BUT observable is the dual of iterable... it should "just work" in roughly the same way no matter which type you choose. If observable schedules, however, it cannot.

The only other argument I've seen (and it's even an argument I've made myself) is that "observable should schedule because we don't get the cancellation mechanism back until after the subscribe call returns", which was (and is) problematic because the "synchronous firehose" can run before the consumer has the ability to cancel. RxJS has to do some dancing around that part of the design. But since with this design we're now only accepting the cancellation mechanism via AbortSignal as an argument to subscribe, that's no longer an issue.

And again... I'll go back to say that for more than a decade of very broad use, observable has not needed scheduled emissions. None of the implementations we cite in the README as examples of prior art add any additional scheduling.

I would expect to find new issues if we introduced this behavior, as it's a breaking change from what has been used for so long. Different unknowns and edge cases.

esprehn commented 3 months ago

In more than a decade of this type, the lack of async emission has hardly ever been the issue.

I disagree with this statement. It's not been my experience, which is why I filed this bug. I oversee a massive rxjs codebase maintained by thousands of engineers.

Taking a step back, I don't think "the sync firehose" makes sense on the web platform. Can you give a real example?

The unbounded buffer is blocking the thread which is not something you should be doing on the web platform since it prevents promises from resolving and the thread from being responsive to user input. If you need an actual unbounded firehose that sounds like you should use an async iterator inside the callback which would create microtasks interleaving consumption and emission creating those opportunities to unsubscribe.

Taking your example:

window.when('message')
  .switchMap(e => {
    return Observable.from(AsyncIterable.from(e.data.rows))
      .filter(row => row.shouldStream)
      .take(3)
      .flatMap(row => streamRowData(row))
  })
  .subscribe(updateView, { signal })

does exactly what you want.

See also: https://github.com/tc39/proposal-async-iterator-helpers

benlesh commented 3 months ago

Observable.from(AsyncIterable.from(e.data.rows))

This is a magic recipe though.

Forced asynchrony adds unnecessary complexity and overhead to the type. It introduces a lot of unknowns and questions and untested behaviors that will challenge the entire API, and I have yet to see why that would be beneficial beyond worries about "Zalgo" which aren't really appropriate for this type.

In a world with this brand new, never-been-tried forced async observable type, what happens here?

const source = new Observable((subscriber) => {
  let n = 0;
  while (subscriber.active) {
    subscriber.next(n++);
  }
});

source.take(10).subscribe(console.log);

In the untried new observable of forced async, it blocks the thread. With a synchronous observable, it does not.

And what happens here? Are the async task logs interleaved with the map calls and the subscribe log? Do we need to queue the value and schedule the processing of of every projected map value?

button.when('click')
   .map(() => {
     console.log('in map 1')

     queueMicrotask(() => {
       console.log('do something async');
     });

     return Date.now()
   })
   .map(() => {
     console.log('in map 2')

     queueMicrotask(() => {
       console.log('do something async');
     });

     return Date.now()
   })
   .subscribe(console.log)

There's just so many unknowns to this newly proposed type.

The scheduling allocations per turn alone would scale terribly, and I'm not even sure there's a mechanism for cancelling a micro task yet.

Even if we tried to be conservative about this change and said "Well, let's only schedule the subscription, not the emissions" so we prevent all of the overhead and other weird issues with cancellation, Then there's the question about subscriptions inside of methods. If you have source.map().filter().subscribe() The subscription to subscribe will schedule, then the subscription in filter() will schedule, then the subscription in map() will schedule, and finally, the subscription to source will schedule... So we've scheduled four times just to get the stream started.

Forcing asynchrony would be a huge mistake and will actually make things harder to reason about, not easier. And at the cost of additional memory footprint and processing time.

benlesh commented 3 months ago

I'm totally willing to die on this hill, honestly. Even the title of this thread unfairly characterizes the behavior of this very basic type as "unpredictable", when it's in fact completely predictable until you force it to schedule with other arbitrary microtasks.

benlesh commented 3 months ago

Thinking about this even more. If there was forced scheduling, testing observables becomes even more cumbersome.

it('should handle events via an observable', async () => {
  const target = new EventTarget();

  const results = [];

  target.when('test')
    .map(e => e.detail.value)
    .subscribe((value) => {
      results.push(value);
    });

  // wait for subscription to start.
  await Promise.resolve();

  target.dispatchEvent(new CustomEvent('test', { detail: 'hi' }));

  // wait for event to propagate
  await Promise.resolve();

  expect(results).toEqual(['hi']);
});

...which is even less ergonomic than addEventListener or really any other approach related to events, as far as testing goes.

Even if you use some harness to "click" a button, you'll have to wait a tick to know for sure the click happened. It's just more complicated.

benlesh commented 3 months ago

Now.. all of this said, if you really wanted to consume an observable with forced asynchrony, it implements Symbol.asyncIterator. So all you'd have to do is consume it in a for await loop. You could even have a simple lint rule that forced either that or await observable.forEach() to ensure that no one "released Zalgo", only in a way that doesn't force a complicated behavior on everyone.

benlesh commented 2 months ago

After the discussion the other day, I tossed together a "hot observable", for sake of helping me think through it, and it predictably raises questions:

  1. Should it be reference counted? Meaning does it "start" the subscription with the first subscriber, and terminate after the last subscriber aborts their subscription?
  2. If it's reference counted, should we require an AbortSignal be passed in? If we don't, leaks will be hard to track down in code (where multiple consumers, subscribe, but one of them never unsubscribes, keeping the resource running)
  3. If it's not reference counted, does it require two separate calls to start and stop the observable? If so, what does that look like? Even more so, how does that effect the EventTarget API?
  4. If it's not reference counted and it just starts immediately, prior to any new subscribers, how can we use that to model EventTarget?
domenic commented 2 months ago

I feel like promises give a precedent for all these questions?

  1. No. Subscriptions would be separate from the observable and there's no signal about the number of subscriptions.

  2. Stopping/starting the underlying resource would be separate from the observable, e.g. element.remove() to stop it from possibly creating events, or similar.

  3. It depends on the API. E.g. arguably for DOM elements, you "start" the corresponding observable by appending the element to the DOM, and "stop" it by removing the element from the DOM.

  4. I don't understand the problem. What I'm made to understand is that EventTarget-produced observables are hot already.

benlesh commented 2 months ago

Stopping/starting the underlying resource would be separate from the observable, e.g. element.remove() to stop it from possibly creating events, or similar.

This would mean that button.when('click') would, at time of call, add an event listener for click that was broadcasting to any new subscribers. There wouldn't be a way to remove that event listener without removing the button.

If the observable is cold, the developer can control when the underlying event listener is added or removed:

const clicks = button.when('click');

const ac = new AbortController();

// Add the event listener
clicks.subscribe(console.log, { signal: ac.signal });

// remove the event listener
ac.abort();

Otherwise, with "instantly hot" observables, we'd have to be able to pass an AbortSignal in at the moment it was created, meaning when, filter, et al, would need such an argument.

const ac = new AbortController();

button.when('click', { signal: ac.signal })
  .filter(e => e.clientX > 100, { signal: ac.signal })
  .subscribe(console.log, { signal: ac.signal });

ac.abort();

Because the result of filter is going to instantly subscribe to its source, internally adding a subscriber to the source, and we need to provide a way to remove it.

It depends on the API. E.g. arguably for DOM elements, you "start" the corresponding observable by appending the element to the DOM, and "stop" it by removing the element from the DOM.

When composing events with observables, it's often desirable to use methods like switchMap to start and stop listening to other events for specific reasons. Similarly takeUntil. If we're unable to provide a way to stop those listeners, it's likely to cause issues where developers create leaks:

const ac = new AbortController();

moveableElement.when('mousedown').switchMap(() => {
  return document.when('mousemove')
    .takeUntil(document.when('mouseup'))
    .map(e => {
      return { x: e.clientX, y: e.clientY }
    })
})
.subscribe(({ x, y }) => {
  moveableElement.style.left = `${x}px`;
  moveableElement.style.top = `${y}px`;
}, { signal: ac.signal })

In the above example, document.when('mousemove') and document.when('mouseup') would attach a new listener to the document every single time someone performed a mousedown on moveableElement. This would be very surprising to users of existing observables, and it's a non-obvious foot-gun.

It's fixable, of course, but only if you declare the other observables at the top level (which interestingly, still works with "cold" observables):

const ac = new AbortController();

const documentMouseMoves = document.when('mousemove');
const documentMouseUps = document.when('mouseup');

moveableElement.when('mousedown').switchMap(() => {
  return documentMouseMoves
    .takeUntil(documentMouseUps)
    .map(e => {
      return { x: e.clientX, y: e.clientY }
    })
})
.subscribe(({ x, y }) => {
  moveableElement.style.left = `${x}px`;
  moveableElement.style.top = `${y}px`;
}, { signal: ac.signal })

Which, of course, doesn't account for my previously mentioned issue with needing a way to pass an AbortSignal to the various methods that create new, hot observables. (So you'd see { signal: ac.signal }) sprinkled all over in that case.

noamr commented 2 months ago

Something to take into account with when is that addEventListener can have subtle and surprising side effects in itself, e.g. a touchstart or scroll event listener can affect scrolling behavior. I think this kind of effect makes it risky to introduce a hot-only behavior here. As without reading the docs you wouldn't be able to tell if when or subscribe calls addEventListener, it's perhaps better to err on the side of only creating them if the events they generate are going to be used in practice.

Also, I think we don't have to think of observables as promises, they're more like streams in a way. Attaching a reader/pipe etc to a stream might invoke the write method on the other side and create a chain reaction. I think that's a closer precedence.

benlesh commented 2 months ago

I've started a new discussion specifically about the idea of ref-counted observables here: https://github.com/WICG/observable/issues/178#issue-2559341348

noamr commented 1 month ago

I thought again about the scroll/touch issue (adding event listeners has performance implications).

What if we made it so that creating observers for these events is passive by default, and if you want the active behavior (preventing default behavior etc) you have to use regular event listeners?

This feels aligned with the overall purpose of observers where they're less tuned towards overriding platform behavior and more towards, well, observing it (passively as data).

If we had this behavior for event listeners, and as a guideline that in general the observer registration should not be expensive, perhaps registering a listener with each when is simpler than introducing ref-counting and implicit registration? (Sorry but I can't wrap my head around which one is called hot vs. cold). Passive-by-default is anyway a nice side effects performance-wise, which would give when added value.

This also leads me to think that this question is not at all about observables as a general construct, but about the observable shim with the platform, e.g. when. This might have a different behavior for each type of shim and I don't know how to generalize this to the observable constructor.

domfarolino commented 1 month ago

First off, I think setting https://wicg.github.io/observable/#ref-for-event-listener-passive to true by default is probably a good idea, assuming we'd want to do that with addEventListener() if we were designing it from scratch today. I'm not sure that's the case though — don't cancelable default-preventable events have a legitimate use case that's accepted? Or are these seen as legacy concepts that we've vowed to not add anymore, and that we'd do away with if we could do it all over again? If passive is a better default but we don't want to totally discourage people from active listening, we could make things passive by default but still let users override https://wicg.github.io/observable/#dom-observableeventlisteneroptions-passive to false, so those users can still use Observables.

Second, I think that's all separate from the issue of Observables having a ref-counted producer and being cold/lazy. We are not making Observable producers ref-counted for performance reasons a la passive listening nearly as much as we're doing it for predictability upon multiple subscriptions.

This also leads me to think that this question is not at all about observables as a general construct, but about the observable shim with the platform, e.g. when. This might have a different behavior for each type of shim and I don't know how to generalize this to the observable constructor.

The idea of passive event listening doesn't apply outside of Observables returned by EventTarget, just like AddEventListenerOptions#passive doesn't mean anything for "all callbacks" on the web — it only applies for callbacks insofar as they're event listeners. So I think we can change the default passive-ness for Observable-added event listeners, without messing with other Observable internals.

noamr commented 1 month ago

First off, I think setting https://wicg.github.io/observable/#ref-for-event-listener-passive to true by default is probably a good idea, assuming we'd want to do that with addEventListener() if we were designing it from scratch today. I'm not sure that's the case though — don't cancelable default-preventable events have a legitimate use case that's accepted? Or are these seen as legacy concepts that we've vowed to not add anymore, and that we'd do away with if we could do it all over again? If passive is a better default but we don't want to totally discourage people from active listening, we could make things passive by default but still let users override https://wicg.github.io/observable/#dom-observableeventlisteneroptions-passive to false, so those users can still use Observables.

Oh I wasn't aware of being able to pass passive to when.

Second, I think that's all separate from the issue of Observables having a ref-counted producer and being cold/lazy. We are not making Observable producers ref-counted for performance reasons a la passive listening nearly as much as we're doing it for predictability upon multiple subscriptions.

Alrighty.

benlesh commented 1 month ago

don't cancelable default-preventable events have a legitimate use case that's accepted?

Off the top of my head, I feel like the most common case for this is form submit events. Where the developer wants to handle the form submissions manually with fetch or over a socket or something. They need to prevent the default behavior. "Cancellable" in this case, would mean someone wants to unmount a component containing a form, but still have a subscription to the observable they got from when().

annevk commented 1 month ago

Yeah I'm not sure we should invert passive here.