WICG / close-watcher

A web API proposal for watching for close requests (e.g. Esc, Android back button, ...)
https://html.spec.whatwg.org/multipage/interaction.html#close-requests-and-close-watchers

Close Watchers

Read on for the full story, or check out the demo!

The problem

Various UI components have a "modal" or "popup" behavior. For example:

An important common feature of these components is that they are designed to be easy to close, with a uniform interaction mechanism for doing so. Typically, this is the Esc key on desktop platforms, and the back button on some mobile platforms (notably Android). Game consoles also tend to use a specific button as their "close/cancel/back" button. Finally, accessibility technology sometimes provides specific close requests for their users, e.g. iOS VoiceOver's "dismiss an alert or return to the previous screen" gesture.

We define a close request as a platform-mediated interaction that's intended to close an in-page component. This is distinct from page-mediated interactions, such as clicking on an "x" or "Done" button, or clicking on the backdrop outside of the modal. See our platform-specific notes for more details on what close requests look like on various platforms.

Currently, web developers have no good way to handle these close requests. This is especially problematic on Android devices, where the back button is the traditional close request. Imagine a user filling in a twenty-field form, with the last item being a custom date picker modal. The user might press the back button hoping to close the date picker, as they would in a native app. Instead, the back button navigates the web page's history tree, likely closing the whole form and losing the filled-in information. The problem is not limited to Android, however: implementing a good experience for users of different browsers, platforms, and accessibility technologies requires a lot of user agent sniffing today.

This explainer proposes a new API to enable web developers, especially component authors, to better handle these close requests.

Goals

Our primary goals are as follows:

The following is a goal we wish we could meet, but don't believe is possible to meet while also achieving our primary goals:

What developers are doing today

On desktop platforms, this problem is currently solved by listening for the keydown event, and closing the component when the Esc key is pressed. Built-in platform APIs, such as <dialog>, fullscreen, or <input type="date">, will do this automatically. Note that this can be easy to get wrong, e.g. by accidentally listening for keyup or keypress.
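As a minimal sketch of the desktop approach described above, the helper below closes a component on Esc. The name `closeOnEscape` and its parameters are made up for illustration; the `isComposing` check is an optional extra precaution against reacting to Esc during IME composition, not something this document requires.

```javascript
// Attach an Esc-key close handler to any EventTarget (e.g. `document`).
// Returns a function that removes the listener again.
function closeOnEscape(target, onClose) {
  const listener = (event) => {
    // Listen on keydown (not keyup or keypress) and skip IME composition.
    if (event.key === 'Escape' && !event.isComposing) {
      onClose();
    }
  };
  target.addEventListener('keydown', listener);
  return () => target.removeEventListener('keydown', listener);
}

// In a page, usage might look like:
//   const stop = closeOnEscape(document, () => myModal.close());
//   ... and later, when the modal closes some other way: stop();
```

Note that this only covers keyboards; it does nothing for the Android back button or other platform close requests.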

On mobile platforms, getting the right behavior is significantly harder. First, platform-specific code is required: do nothing on iOS, capture the back button or swipe-from-the-sides gesture on Android, and listen for gamepad inputs on the PlayStation browser.

Next, capturing the back button on Android requires manipulating the history list using the history API. This is a poor fit for several reasons:

Proposal

The proposal is to introduce a new API, the CloseWatcher class, which has the following basic API:

const watcher = new CloseWatcher();

// This fires when the user sends a close request, e.g. by pressing Esc on
// desktop or by pressing Android's back button.
watcher.onclose = () => {
  myModal.close();
};

// You should destroy watchers which are no longer needed, e.g. if the
// modal closes normally. This will prevent future events on this watcher.
myModalCloseButton.onclick = () => {
  watcher.destroy();
  myModal.close();
};

If more than one CloseWatcher is active at a given time, then only the most-recently-constructed one gets events delivered to it. (Usually; the exception is the user activation grouping described below.) A watcher becomes inactive after a close event is delivered, or when the watcher is explicitly destroy()ed.
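A component might wrap this pattern in a small helper, sketched below under some assumptions. The `openModal` function, the `modal` object with `show()` and `hide()` methods, and the `WatcherClass` parameter are all hypothetical names, not part of the proposal. The parameter defaults to the global CloseWatcher, so the helper still works (without close request support) where the API is missing.

```javascript
// `modal` is any object with show() and hide() methods (hypothetical interface).
// `WatcherClass` defaults to the platform's CloseWatcher, if there is one.
function openModal(modal, WatcherClass = globalThis.CloseWatcher) {
  modal.show();

  if (typeof WatcherClass !== 'function') {
    // API unavailable: the modal can only be closed by page-mediated means.
    return { close: () => modal.hide() };
  }

  const watcher = new WatcherClass();
  // Platform-mediated close: the watcher becomes inactive by itself afterwards.
  watcher.onclose = () => modal.hide();

  return {
    // Page-mediated close (e.g. a "Done" button): destroy the watcher first.
    close() {
      watcher.destroy();
      modal.hide();
    },
  };
}
```

Opening a second modal while the first is still open simply constructs a second watcher, which then receives the next close request.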

Read on for more details and realistic usage examples.

Requesting close yourself

The API has an additional convenience method, watcher.requestClose(), which acts as if a close request had been sent by the user. The intended use case is to allow centralizing close-handling code. So the above example of

watcher.onclose = () => myModal.close();

myModalCloseButton.onclick = () => {
  watcher.destroy();
  myModal.close();
};

could be replaced by

watcher.onclose = () => myModal.close();

myModalCloseButton.onclick = () => watcher.requestClose();

deduplicating the myModal.close() call by having the developer put all their close-handling logic into the watcher's close event handler.

As usual, reaching the close event will inactivate the CloseWatcher, meaning it receives no further events in the future and it no longer occupies the "free CloseWatcher slot", if it was previously doing so.

Asking for confirmation

There's a more advanced part of this API, which allows asking the user for confirmation if they really want to close the modal. This is useful in cases where, e.g., the modal contains a form with unsaved data. The usage looks like the following:

watcher.oncancel = async (e) => {
  if (hasUnsavedData) {
    e.preventDefault();

    const userReallyWantsToClose = await askForConfirmation("Are you sure you want to close this dialog?");
    if (userReallyWantsToClose) {
      hasUnsavedData = false;
      watcher.close();
    }
  }
};

Note: the name of this event, i.e. cancel instead of something like beforeclose, is chosen to match <dialog>, which has the same two-tier cancel + close event sequence.

For abuse prevention purposes, this event only fires if the page has received transient user activation. Furthermore, once it fires for one CloseWatcher instance, it will not fire again for any CloseWatcher instances until the page gets user activation again. This ensures that if the user sends a close request twice in a row without any intervening user activation, the request definitely goes through, destroying the CloseWatcher.

Note that the cancel event is not fired when the user navigates away from the page: i.e., it has no overlap with beforeunload. beforeunload remains the best way to confirm a page unload, with cancel only used for confirming a close request.

If called from within transient user activation, watcher.requestClose() also invokes cancel event handlers, which would trigger event listeners like the above example code. If called without user activation, then it skips straight to the close event.

User activation grouping

In addition to the user activation restrictions for cancel events, mentioned above, there is a more subtle form of user activation gating for CloseWatcher construction. The basic idea is that if you have created more than one CloseWatcher without user activation, then the newly-created one will get grouped together with the most-recently-created close watcher, so that a single close request will close them both. This is meant to prevent abuse.

So for example:

window.onload = () => {
  // This will work as normal: it is the first close watcher created without user activation.
  (new CloseWatcher()).onclose = () => { /* ... */ };
};

button1.onclick = () => {
  // This will work as normal: the button click counts as user activation.
  (new CloseWatcher()).onclose = () => { /* ... */ };
};

button2.onclick = () => {
  // These will be grouped together, and both will close in response to a single close request.
  (new CloseWatcher()).onclose = () => { /* ... */ };
  (new CloseWatcher()).onclose = () => { /* ... */ };
};

Note that for developers, this means that calling watcher.destroy() properly is important. Doing so is the only way to get back the "free" ungrouped close watcher slot, that allows you to create an ungrouped CloseWatcher even without user activation. Such CloseWatchers are useful for cases like session inactivity timeout dialogs, or urgent notifications of server-triggered events, that want to be closable with a close request but are not created via user activation.
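The grouping rule can be made concrete with a toy model. This is an illustration of the behavior described in this section, not the real implementation: `ToyCloseWatcherManager` and its methods are invented names, cancel events are ignored, and closing a group newest-first is an assumption.

```javascript
// Toy model of close watcher grouping; NOT the real implementation.
class ToyCloseWatcherManager {
  constructor() {
    this.groups = [];                 // stack of groups; each group is a list of watchers
    this.freeSlotTaken = false;       // is the one "free" ungrouped slot in use?
    this.activationAvailable = false; // is there an unconsumed user activation?
  }

  notifyUserActivation() {
    this.activationAvailable = true;
  }

  create(name) {
    const watcher = { name, usesFreeSlot: false };
    if (this.activationAvailable) {
      this.activationAvailable = false; // creation consumes the activation
      this.groups.push([watcher]);
    } else if (!this.freeSlotTaken) {
      this.freeSlotTaken = true;
      watcher.usesFreeSlot = true;
      this.groups.push([watcher]);
    } else {
      // No activation and no free slot: group with the most recent watcher.
      this.groups[this.groups.length - 1].push(watcher);
    }
    return watcher;
  }

  release(watcher) {
    if (watcher.usesFreeSlot) {
      watcher.usesFreeSlot = false;
      this.freeSlotTaken = false;
    }
  }

  destroy(watcher) {
    for (const group of this.groups) {
      const index = group.indexOf(watcher);
      if (index !== -1) group.splice(index, 1);
    }
    this.groups = this.groups.filter((group) => group.length > 0);
    this.release(watcher);
  }

  // Returns the names closed by one close request, or null when nothing is
  // open and the platform fallback (e.g. history navigation) would run.
  closeRequest() {
    const group = this.groups.pop();
    if (!group) return null;
    group.forEach((watcher) => this.release(watcher));
    return group.map((watcher) => watcher.name).reverse();
  }
}

// Replaying the example above, with every watcher left open:
const manager = new ToyCloseWatcherManager();
manager.create('onload');          // no activation: takes the free slot
manager.notifyUserActivation();    // click on button1
manager.create('button1');         // consumes the activation: own group
manager.notifyUserActivation();    // click on button2
manager.create('button2-first');   // consumes the activation: own group
manager.create('button2-second');  // nothing left: grouped with button2-first

const presses = [];
for (let i = 0; i < 4; i++) presses.push(manager.closeRequest());
console.log(presses);
// → [['button2-second', 'button2-first'], ['button1'], ['onload'], null]
```

In this model, destroying the watcher that holds the free slot is what makes the slot available to the next watcher created without user activation.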

Using AbortSignals to destroy CloseWatchers

As discussed above, destroying a CloseWatcher using watcher.destroy() is helpful to avoid any future events and to release the "free" CloseWatcher slot.

Another way to destroy CloseWatchers is by using AbortSignal objects, like so:

const controller = new AbortController();
const watcher = new CloseWatcher({ signal: controller.signal });

// ... later ...
controller.abort();

If the AbortSignal is only being used for the CloseWatcher, this is not that helpful. But it works nicely when you are using the AbortSignal to abort many ongoing operations or dispose of many different resources.
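As a sketch of that second case, the helper below ties a click listener and (where available) a CloseWatcher to one AbortController, so a single abort() tears down both. The names `openModalWithSignal`, `modalEvents`, and `onClose` are hypothetical; the listener's `signal` option is the standard addEventListener() one.

```javascript
// `modalEvents` is any EventTarget, e.g. the modal's "Done" button.
// `onClose` is the page's own close logic.
function openModalWithSignal(modalEvents, onClose) {
  const controller = new AbortController();
  const { signal } = controller;

  const close = () => {
    controller.abort(); // removes the listener and destroys the watcher
    onClose();
  };

  // Page-mediated close: the listener is removed when the signal aborts.
  modalEvents.addEventListener('click', close, { signal });

  // Platform-mediated close, only where CloseWatcher exists.
  if (typeof CloseWatcher === 'function') {
    const watcher = new CloseWatcher({ signal });
    watcher.onclose = close;
  }

  return controller;
}
```

Whichever path closes the modal first aborts the shared signal, so the other path cannot fire a second time.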

Abuse analysis

As discussed above, we have various forms of user activation gating. These are meant to prevent abuse for platforms like Android where the close request is to use the back button. There, we need to prevent abuse that traps the user on a page by effectively disabling their back button.

In detail, a malicious page which wants to trap the user would be able to do at most the following using the CloseWatcher API:

Other variations are possible; e.g. if the user activates the page once, the abusive page could create two CloseWatchers instead of one, but then it wouldn't get cancel events, so three back button presses would still escape the abusive page.

In general, if the user activates the page N times, a maximally-abusive page could make it take N + 2 back button presses to escape.
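The counting argument can be restated as a small function. This is only a restatement of the numbers in this document; `backPressesToEscape` and its `freeWatcher` option are names made up for illustration, with `freeWatcher: false` corresponding to the "no free close watcher" alternative discussed later.

```javascript
// Worst-case number of back button presses needed to escape a
// maximally-abusive page, per the counting argument above.
function backPressesToEscape(userActivations, { freeWatcher = true } = {}) {
  // One ungrouped watcher per user activation, plus the optional free one.
  const ungroupedWatchers = userActivations + (freeWatcher ? 1 : 0);
  // One more press finally reaches the fallback history navigation.
  return ungroupedWatchers + 1;
}

console.log(backPressesToEscape(0));                         // → 2
console.log(backPressesToEscape(3));                         // → 5
console.log(backPressesToEscape(3, { freeWatcher: false })); // → 4
```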

Compare this to the protection in place today for the history.pushState() API, which is another means by which apps can attempt to trap the user on the page by making their session history list grow. In the spec, there is an optional step that allows the user agent to ignore these method calls; in practice, this is only done as a throttling measure to avoid hundreds of calls per second overwhelming the history state storage implementations.

Another mitigation that browsers implement against history.pushState()-based trapping is to try to have the actual back button UI skip entries that were added without user activation, i.e., to have it behave differently from history.back() (which will not skip such entries). Such attempts are a bit buggy, but in theory they would mean an abusive page that is activated N times would require N + 1 back button presses to escape.

We believe that the additional capabilities allowed here are worth expanding this number from N + 1 to N + 2, especially given how unevenly these mitigations are currently implemented and how that hasn't led to significant user complaints. However, there are a number of alternatives discussed below which would allow us to lower this number:

Note that the "no confirmation dialogs" alternative would not reduce the number, since in our current design the user activations are shared between CloseWatcher creation and the cancel event, so removing just the cancel event does not change the calculus.

For now we are sticking with the current proposal and its N + 2 number, but welcome discussion from the community as to whether this is the right tradeoff.

Finally, we note that in most browser UIs, the user has an escape hatch: holding down the back button and explicitly choosing a history step to navigate back to, or closing the tab entirely. Neither of these is ever a close request.

Realistic examples

The above sections give illustrative usage of the API. The following ones show how the API could be incorporated into realistic apps and UI components.

A sidebar

A sidebar (e.g. behind a hamburger menu) that wants to hide itself on a user-provided close request could be hooked up as follows:

const hamburgerMenuButton = document.querySelector('#hamburger-menu-button');
const sidebar = document.querySelector('#sidebar');

hamburgerMenuButton.addEventListener('click', () => {
  const watcher = new CloseWatcher();

  sidebar.animate([{ transform: 'translateX(-200px)' }, { transform: 'translateX(0)' }]);

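  // Close on clicks outside the sidebar. Clicks on the menu button are ignored,
  // so that the click which opens the sidebar does not immediately close it.
  const onOutsideClick = e => {
    if (e.target.closest('#sidebar, #hamburger-menu-button') === null) {
      watcher.close();
    }
  };
  document.body.addEventListener('click', onOutsideClick);

  watcher.onclose = () => {
    sidebar.animate([{ transform: 'translateX(0)' }, { transform: 'translateX(-200px)' }]);
    // Stop listening once the sidebar has closed, so listeners don't pile up.
    document.body.removeEventListener('click', onOutsideClick);
  };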
});

Note that it never really makes sense to use the cancel event for a sidebar.

A picker

For a "picker" control that wants to close itself on a user-provided close request, code like the following would work:

class MyPicker extends HTMLElement {
  #button;
  #overlay;
  #watcher;

  constructor() {
    super();
    this.#button = /* ... */;

    this.#overlay = /* ... */;
    this.#overlay.hidden = true;
    this.#overlay.querySelector('.close-button').addEventListener('click', () => {
      this.#watcher.requestClose();
    });

    this.#button.onclick = () => {
      this.#overlay.hidden = false;

      this.#watcher = new CloseWatcher();
      this.#watcher.onclose = () => this.#overlay.hidden = true;
    };
  }
}

Similarly, picker UIs do not usually require confirmation on closing, so do not need cancel event handlers.

Platform close requests

With CloseWatcher as a foundation, we can work to unify the web platform's existing and upcoming close requests:

Explaining <dialog>

The <dialog> spec today states that "user agents may provide a user interface that, upon activation, [cancels the dialog]". In particular, here canceling the dialog first fires a cancel event, and if the web developer does not call event.preventDefault(), it will close the dialog.
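The two-tier sequence can be sketched with an ordinary event listener. This example relies only on the existing cancelable cancel event of the dialog element; the `guardAgainstAccidentalClose` helper and the `hasUnsavedData` callback are hypothetical names.

```javascript
// `dialog` is an HTMLDialogElement in a page; for the purposes of this
// sketch any EventTarget that receives a cancelable "cancel" event works.
function guardAgainstAccidentalClose(dialog, hasUnsavedData) {
  dialog.addEventListener('cancel', (event) => {
    if (hasUnsavedData()) {
      // The dialog stays open, and no close event follows.
      event.preventDefault();
    }
  });
}

// In a page, usage might look like:
//   guardAgainstAccidentalClose(document.querySelector('dialog'), () => formIsDirty);
```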

The existing <dialog> implementation in Chromium implements this, but only with the Esc key on desktop. That is, on Android Chromium, the system back button will not close <dialog>s. (Perhaps this is because of the fears about back button trapping mentioned above?)

Our proposal is to replace the vague specification sentence above with text based on close watchers. This has a number of benefits:

Note that the user activation "resource" is shared between everything that uses the general close watcher infrastructure: i.e., both <dialog> elements, and CloseWatchers. So, for example, creating a <dialog> without user activation uses up the free close watcher slot, so that if you then proceed to create another <dialog> and construct a CloseWatcher, all without user activation, that CloseWatcher and the second <dialog> will be grouped together and both be closed by a single close request.

Integration with Fullscreen

The Fullscreen spec today states "If the end user instructs the user agent to end a fullscreen session initiated via requestFullscreen(), fully exit fullscreen". Existing Fullscreen implementations implement this using the Esc key on desktop, the back button on Android, and a floating software "x" button on iPadOS. (iOS on iPhones does not appear to implement the fullscreen API.)

We propose replacing this with explicit integration into the close request steps. Again, this gives interoperability benefits by using a shared primitive, and a clear specification for how it interacts with <dialog>s, CloseWatchers, and key events.

Integration with popover=""

The popover="" attribute can benefit from similar integration as <dialog>. See our proposed spec text.

Integration with <input>?

We could update the specification for <input> to mention that it should use close watchers when the user opens the input's picker UI. This kind of unification fits well with the goals of this project, but it might be tricky since the existing <input> specification is intentionally vague on how UI is presented.

Alternatives considered

Integration with the history API

Because of how the back button on Android is a close request, one might think it natural to make handling close requests part of either the existing history API, or a revised history API. This idea has some precedent in mobile application frameworks that integrate modals into their "back stack".

However, on the web the history API is intimately tied to navigations, the URL bar, and application state. Using it for UI state is generally not great. See also the above discussion of how developers are forced to use the history API today for this purpose, and how poorly it works. In fact, we're hopeful that by tackling this proposal separately from the history API, other efforts to improve the history API will be able to focus on actual navigations, instead of on close requests.

Note that the line between "UI state" and "a navigation" can be blurry in single-page applications. For example, Twitter.com's logged-in view lets you type directly into a "What's happening?" text box in order to tweet, which we classify as UI state. But if you click the "Tweet" button on the sidebar, it navigates to a new URL which displays a lightbox into which you can input your tweet. In our taxonomy, this new-URL lightbox is a navigation, and it would not be suitable to use the CloseWatcher API for it, because closing it needs to update the URL back to what it was originally (i.e., navigate backwards in the history list).

Automatically translating all close requests to Esc

If we assume that developers already know to handle the Esc key to close their components, then we could potentially translate other close requests, like the Android back button, into Esc key presses. The hope is then that application and component developers wouldn't have to update their code at all: if they're doing the right thing for that common desktop close request, they would suddenly start doing the right thing on other platforms as well. This is especially attractive as it could help avoid the awkward transition period mentioned in the goals section.

However, upon reflection, such a solution doesn't really solve the general problem. Given an Android back button press, or a PlayStation circle button press, or any other gesture which might serve multiple context-dependent purposes, the browser needs to know: should it perform its usual action, or should it be translated to an Esc key press? For custom components, the only way to know is for the web developer to tell the browser that a close-request-consuming component is open. So our goal of requiring no code modifications, and no awkward transition period, is impossible. We'd still need some API, a counterpart to our new CloseWatcher(), which tells the browser that the next close request should be turned into an Esc keypress. Given this, the strangeness of synthesizing fake Esc key presses in response to some other setup API does not have much to recommend it.

A single event

Why do we need the CloseWatcher class? Why couldn't we just fire a global close event or similar, which abstracts over platform differences in close requests?

The problem here is similar to the previous idea of translating all close requests into an Esc key press. On some platforms, a close request has an important fallback behavior. (Notably, on Android, where the fallback behavior is to perform a history navigation.) This means developers need some way of signaling to the browser that the next instance of such a close request should be directed to their code, instead of performing that fallback action. And, we need to gate the ability to redirect close requests in that way behind user activation.

A less fundamental benefit is for developer ergonomics. By having essentially one event per close watcher, we get a kind of stack for free. For example, if various modal or popup components (including <dialog> elements) use CloseWatchers, then we can be sure to always route the event to the "topmost" (i.e., most-recently-created) modal. If we had just one global event, then such components would need to coordinate with each other to ensure that only the topmost acts on the event, and the others ignore it. In other words, the stack of CloseWatcher instances, which is pushed onto by the constructor and popped off of by CloseWatcher destruction, is a nice bonus for web developers. Our experience implementing the close watcher concept in Chromium, and sharing the infrastructure between CloseWatcher, <dialog>, and popover="", supports this ergonomic argument; the close watcher stack infrastructure is somewhat complex, but can be nicely encapsulated away from its usage sites.

Browser-mediated confirmation dialogs

Previous iterations of this proposal had a different semantic for the cancel event, where calling event.preventDefault() would show non-configurable browser UI asking to confirm closing the modal.

The benefit of this approach is that, because the browser directly gets a signal from the user when the user says "Yes, close anyway", we can reduce the number of Android back button clicks that abusive pages can trap from N + 2 to N + 1. However, on balance this isn't actually a big win, because the user still has to perform N + 2 actions to escape: N + 1 back button presses, and one "Yes, close anyway" press.

Additionally, some early feedback we got was that custom in-page confirmation UI was very desirable for web developers, instead of non-configurable browser UI.

No confirmation dialogs

It's also possible to start with a version of this proposal that does not fire a cancel event at all. This means that pressing the Android back button will always destroy the close watcher.

However, by itself this doesn't change what it takes to escape an abusive page: it still takes N + 2 Android back button presses, since a maximally-abusive page will just take advantage of user activation to create new CloseWatcher instances, instead of bothering with cancel events.

In talking with web developers, this variant of the proposal was not as preferable:

No free close watcher

As discussed above, this proposal gives pages the ability to create one "free" CloseWatcher, which doesn't get grouped with others, even without user activation. This is meant for cases like session inactivity timeout dialogs or urgent notifications of server-triggered events that want a presentation that is closable with a close request. However, it comes with a cost in terms of allowing abusive pages to add 1 to the number of Android back button presses necessary to escape.

We could remove this ability from the proposal, always grouping together CloseWatchers created without user activation. This would reduce the number of Android back button presses by 1, from N + 2 to N + 1.

Bundling this with high-level APIs

The proposal here exposes CloseWatcher as a primitive. However, watching for close requests is only a small part of what makes UI components difficult. Some UI components that watch for close requests also need to deal with top layer interaction, blocking interaction with the main document (including trapping focus within the component while it is open), providing appropriate accessibility semantics, and determining the appropriate screen position.

Instead of providing the individual building blocks for all of these pieces, it may be better to bundle them together into high-level semantic elements. We already have <dialog>; we could imagine others such as <toast>, <tooltip>, <sidebar>, etc. These would then bundle the appropriate features, e.g. while all of them would benefit from top layer interaction, <toast> and <tooltip> do not need to handle close requests.

Our current thinking is that we should pursue both paths: we should work on bundled high-level APIs, such as the existing <dialog>, but we should also work on lower-level components, such as close requests, popover="", top layer management, or focus trapping. And we should, as this document tries to do, ensure these build on top of each other. This gives authors more flexibility for creating their own novel components, without waiting to convince implementers of the value of baking their high-level component into the platform, while still providing a path for evolving the built-in control set over time.

Security and privacy considerations

Many of these issues were discussed above. See the W3C TAG Security and Privacy Questionnaire answers for more. To summarize:

Security considerations

The main security consideration with this API is preventing abusive pages from hijacking the fallback behavior in the last part of the close request steps. A concrete example is on Android, where the close request is the software back button, and this fallback behavior is to traverse the history by a delta of −1. If developers could always intercept Android back button presses via CloseWatcher instances and <dialog> elements, then they could effectively break the back button by never letting it pass through to the fallback behavior.

Much of the complexity of this specification is designed around preventing such abuse. Without it, the API could consist of a single event. (Although that would make it ergonomically difficult for developers to coordinate on which component of their application should handle the close request.) But with this constraint, we need an API surface such as the new CloseWatcher() constructor which can note whether transient activation was present at construction time, as well as the close watcher stack to ensure that we remove at least one close watcher per close request.

Concretely, the mechanism of creating a close watcher ensures that web developers can only create CloseWatcher instances, or call preventDefault() on cancel events, by attempting to consume user activation. If a CloseWatcher instance is created without transient activation, then after one "free" close watcher, any such close watcher is grouped together with the currently-top close watcher in the stack, so that both of them are closed by a single close request. This gives similar protections to what browsers have in place today, where back button UI skips entries that were added without user activation. Similarly, if there hasn't been any transient activation, the cancel event is not fired.

We do allow one "free" ungrouped CloseWatcher to be created, even without transient activation, to handle cases like session inactivity timeout dialogs, or urgent notifications of server-triggered events. The end result is that this specification expands the number of Android back button presses that a maximally-abusive page could require to escape from number of user activations + 1 to number of user activations + 2. (See above for a full analysis.) We believe this tradeoff is worthwhile.

Privacy considerations

We believe the privacy impact of this API is minimal. The only information it gives about the user to the web developer is that a close request has occurred, which is a very infrequent and coarse piece of user input.

In all cases we're aware of today, such close requests are already detectable by web developers (e.g., by using keydown listeners on desktop or popstate listeners on Android). In theory, by correlating these existing events with the CloseWatcher's close event, a web developer could determine some information about the platform. (I.e., if they correlate with keydown events, the user is likely on desktop, or at least on a keyboard-attached mobile device.) This is similar to existing techniques which detect whether touch events or mouse events are fired, and user agents which want to emulate a different platform in order to mask the user's choice might want to apply similar mitigation techniques for close watchers as they do for other platform-revealing events.

Stakeholder feedback

Acknowledgments

This proposal is based on an earlier analysis by @dvoytenko.