Existing intervention: user gesture required for sensitive operations

RByers commented 8 years ago

Chromium has a notion of a "user gesture" which indicates that we believe the user is explicitly interacting with the page (e.g. mouse click, but not mouse move or wheel). Certain sensitive operations are then allowed only if they can "consume" a user gesture (e.g. one successful window.open call per mousedown/up/click sequence). Some of this maps to the pop-up blocking algorithm in the HTML5 spec. But I'm not sure how well spec'd the details are, what tests exist, and how much interoperability there is between browsers on this. Perhaps we should try to expand this issue with references / details?

Here's a (mostly complete) list of the things that require a user gesture in Chromium:

Allowing pop-ups
Going full screen (requestFullscreen)
Writing to the clipboard
Some scenarios of form submission and requestAutocomplete
PresentationRequest::start
Enabling mouse lock
Various similar operations inside of plugins (fullscreen, mouse lock, etc.)
Buffering aggressively when media is paused, and potentially auto-playing media
'color' and 'file' input types responding to an activation event
Showing an IME (eg. on screen keyboard) on element focus (permitted anytime after a user gesture has occurred since page load)
WebBluetooth and WebUSB requestDevice (experimental)
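
As a rough illustration of the "consume" idea described above, here is a toy TypeScript model. It is only a sketch: `UserGestureToken` and `tryOpenPopup` are hypothetical names invented for this example, not Chromium internals or web APIs, and the assumption is that one token is shared by a whole mousedown/up/click sequence.

```typescript
// Toy model only: these names are hypothetical, not Chromium internals.
class UserGestureToken {
  private consumed = false;

  /** Returns true the first time it is called and false afterwards. */
  tryConsume(): boolean {
    if (this.consumed) return false;
    this.consumed = true;
    return true;
  }
}

// A sensitive operation succeeds only if it can consume a gesture.
function tryOpenPopup(gesture: UserGestureToken | null): "opened" | "blocked" {
  return gesture !== null && gesture.tryConsume() ? "opened" : "blocked";
}

// Assumption: one token is shared by the whole mousedown/up/click sequence.
const sequenceGesture = new UserGestureToken();
const popupResults = [
  tryOpenPopup(sequenceGesture), // called from the mousedown listener
  tryOpenPopup(sequenceGesture), // called from the click listener of the same sequence
  tryOpenPopup(null),            // called with no gesture at all (e.g. on mouse move)
];
```

In this model the second call in the same sequence is refused because the first call already consumed the token.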

toddreifsteck commented 8 years ago

Microsoft Edge has found that the "user input flag" flows to some types of callbacks on mobile, which has caused significant interop issues because there is no public spec or agreement.

(My personal theory is that the history was to make video autoplay "work" for libraries originally built/tested on desktop, but I'll defer to Chrome experts who will be more familiar with the history.)

We’ve observed it flowing through all of the following in Chrome on Android:

setTimeout
setInterval (the 1st interval, but not any future intervals)
window.postMessage

We have observed it does not flow for:

Promises
RAF

Microsoft Edge's position is that the user flag should either flow to all callbacks OR should be blocked for all callbacks.

We are actively implementing a fix in internal builds of Edge 14 to flow the user input flag to setTimeout/setInterval/setImmediate, to unblock a few sites that have issues.

RByers commented 8 years ago

Interesting, thanks! Can you give us some data on which sites are affected by this? If Edge has never needed this before, then perhaps it's not worth the complexity and Chrome should just change to be simpler too?

What about for pop-up blocking - do you use a similar algorithm? Does it flow across setTimeout?

jeisinger commented 8 years ago

Sadly, the "User gesture" concept is not well defined. In WebKit and Blink, we implemented forwarding of the "user gesture" state to the first level of setTimeout calls with a 1s timeout, i.e. if a setTimeout handler invokes setTimeout again, the user gesture won't be forwarded twice, and if the timeout is >1s it won't be forwarded either.
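
The forwarding rule described above can be sketched as a small predicate. This is a simplified model under stated assumptions, not the actual WebKit or Blink code: `TimerRequest`, `forwardsGesture` and the nesting-level bookkeeping are hypothetical names, and the 1000 ms limit is taken directly from the "1s" in the description.

```typescript
// Simplified model of the rule above; all names here are hypothetical.
const MAX_FORWARD_DELAY_MS = 1000; // the "1s timeout" from the description

interface TimerRequest {
  /** Delay passed to setTimeout, in milliseconds. */
  delayMs: number;
  /** 0 = scheduled directly by the event handler, 1 = scheduled by a timer callback, and so on. */
  nestingLevel: number;
  /** Whether the code calling setTimeout currently holds a user gesture. */
  callerHasGesture: boolean;
}

/** Returns true if the timer callback would run with the user gesture. */
function forwardsGesture(req: TimerRequest): boolean {
  return (
    req.callerHasGesture &&
    req.nestingLevel === 0 &&              // only the first level of setTimeout
    req.delayMs <= MAX_FORWARD_DELAY_MS    // timeouts longer than 1s lose the gesture
  );
}

const directShort = forwardsGesture({ delayMs: 200, nestingLevel: 0, callerHasGesture: true });
const directLong = forwardsGesture({ delayMs: 1500, nestingLevel: 0, callerHasGesture: true });
const nestedShort = forwardsGesture({ delayMs: 200, nestingLevel: 1, callerHasGesture: true });
```

Under this model only `directShort` keeps the gesture. A timer scheduled from inside another timer callback loses it even though its delay is short.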

We don't always forward the user gesture via postMessage - it is not forwarded across processes.

I agree that promises could forward the gestures, but why RAF?

What about stuff like XHR events (or IDB events etc.)?

In general, the user gesture thing is a bit tricky to handle, as it has this 1s timeout, so if your XHR doesn't come back in time, you'd have lost the gesture. Not exactly developer friendly :(

domenic commented 8 years ago

So the spec defines this currently: https://html.spec.whatwg.org/multipage/browsers.html#allowed-to-show-a-popup

I have filed two related issues on the spec:

Change the name to something more general: https://github.com/whatwg/html/pull/1357
The list of triggering events seems too small: https://github.com/whatwg/html/issues/1358

The latter in particular could use implementer feedback on whether the spec aligns with implementations or not.

jeisinger commented 8 years ago

Should we also spec that certain operations destroy a user gesture (opening a window in Chrome does that)?

RByers commented 8 years ago

This is a good improvement, thanks @domenic!

There's definitely a variety of ways implementation doesn't match the spec here. I'd like Microsoft's (eg @toddreifsteck's) input so let's discuss those details here.

Yes the list of triggering events is too small (eg. should also contain keydown, mousedown), but it's more complex than that - there's not a 1:1 mapping from event to gesture. For example, on a mousedown mousemove* mouseup sequence we take a single UserGesture - so you can open exactly one pop-up from any of those listeners (not one pop-up per movement). What complexity is actually required here for web compat / good user experience is really hard to say - I'd look to Edge's experience (trying to be compatible with Chrome). If they've got examples where they have been successful with something simpler, I'd be open to trying to change Chrome to match.

RByers commented 8 years ago

As part of rationalizing this intervention, we should really also expose an API indicating whether a user gesture is currently in progress. Eg. @dvoytenko has a scenario in AMP that is really no different than the built-in browser scenarios - an untrusted iframe does a postMessage to the main document requesting an action they only want to do in response to a user actually interacting with the frame. I'd argue we should just expose some simple userActivationInProgress bit somewhere.

dvoytenko commented 8 years ago

Yes, our security model is that we typically allow more changes to an AMP document if we can confirm user action. For instance, we only allow iframes to resize themselves on user action. If we didn't, the page would jump and auto-resize itself without any constraints, thus completely obliterating the user experience. There are many other features that are only allowed on user action. Currently, we polyfill this functionality via focused state, and soon we will also deploy a polyfill based on clipboard. But these are not ideal.

greggman commented 6 years ago

I'm not sure where to bring this up, but on the topic of speccing which gestures count: how about the drag and drop events? There are pages that say "drop an mp3 here" and they'd like to load and play the sound the moment the mp3 is dropped.

domenic commented 2 years ago

It's amazing coming back to this repository and issue and recalling that at one time, our user activation concept was called "allowed to show a popup" and only applied to window.open()!

These days we have a well-defined concept of user activation. (Well, three-ish, actually: user activation consumption, transient user activation checking, and sticky user activation checking.) And it's used by pretty much everything Rick lists in the original post here, with the exception of showing the IME (not really specced anywhere) and some stuff that died (requestAutocomplete(), plugins). Big kudos to @mustaqahmed for all the work on that over the years.
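
To make the three concepts a bit more concrete, here is a toy TypeScript state model. It is a sketch under assumptions, not the spec algorithm: `ActivationState` and its methods are hypothetical names, and the length of the transient window is passed in as a parameter because it is not stated in this thread. In browsers, the transient and sticky checks are exposed to pages as `navigator.userActivation.isActive` and `navigator.userActivation.hasBeenActive`.

```typescript
// Toy model of sticky / transient / consumable activation; names are hypothetical.
class ActivationState {
  private lastActivatedAt: number | null = null; // timestamp in ms, null = no live activation
  private everActivated = false;
  private readonly transientWindowMs: number;

  constructor(transientWindowMs: number) {
    // Assumption: the caller chooses the window length; it is not taken from the thread.
    this.transientWindowMs = transientWindowMs;
  }

  /** Called when the user interacts with the page (e.g. click or keydown). */
  notifyUserInput(now: number): void {
    this.lastActivatedAt = now;
    this.everActivated = true;
  }

  /** Sticky check: has the user ever interacted? Never resets in this model. */
  hasSticky(): boolean {
    return this.everActivated;
  }

  /** Transient check: did the user interact recently, and is it still unconsumed? */
  hasTransient(now: number): boolean {
    return this.lastActivatedAt !== null && now - this.lastActivatedAt < this.transientWindowMs;
  }

  /** Consumption: succeeds at most once per activation, and clears the transient state. */
  consume(now: number): boolean {
    if (!this.hasTransient(now)) return false;
    this.lastActivatedAt = null;
    return true;
  }
}

const activation = new ActivationState(5000);
activation.notifyUserInput(1000);
const firstConsume = activation.consume(2000);  // within the window, so it succeeds
const secondConsume = activation.consume(2000); // already consumed, so it fails
const stickyAfterConsume = activation.hasSticky(); // sticky state survives consumption
```

An operation like opening a pop-up would call something like `consume`, while an operation that only needs proof of a recent or past interaction would use the transient or sticky check without consuming anything.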

So we'll close out this issue as part of the larger project of archiving this repository (#72), as soon as I get write access to this repository.

WICG / interventions

Existing intervention: user gesture required for sensitive operations #12