getLayoutMap() is unavailable in sub-context web applications due to privacy mitigations

maymen1 commented 3 years ago

The privacy mitigations state:

As a first line of defense for the user, this specification requires that the API is only available from secure contexts and can only be called from the currently active top-level browsing context.

I'm a developer on the Excel Online application, working specifically on keyboard shortcuts. Our application is hosted as a sub-context inside an iframe element in the DOM, where the top-level context can be any of many different endpoints: SharePoint, OneDrive, Teams, etc. That means the API is not available for us to use from inside the app code.

We're interested in using this API in order to handle Excel keyboard shortcuts for non-QWERTY keyboard layouts.

Currently we're using the KeyboardEvent.keyCode property, but there are some keys in non-QWERTY keyboard layouts that keyCode doesn't seem to handle correctly and that getLayoutMap() should resolve. In addition, keyCode is deprecated, and there doesn't seem to be any alternative API for handling keyboard shortcuts across different layouts other than getLayoutMap().

Should the keyCode property actually be removed one day, we believe all Office Online applications (PowerPoint, Word, OneNote, etc.) will require this or a similar API to handle their keyboard shortcuts for different layouts.

We'd like to suggest two possible solutions:

1. Soften the privacy mitigations so that we can use this API from the sub-context of our application.
2. Create an additional API that only exposes the key value of the specific key pressed by the user, upon a key press event. This way the "fingerprint" is not so large for the user, since the API doesn't expose the entire keyboard layout, only the key that was pressed.

We'd be happy to start a discussion around this issue and find the best solution.

garykac commented 3 years ago

Re: keyCode

Why are you using keyCode? It was never properly specified and is implemented inconsistently across browsers and platforms. Relying on it seems like it would result in lots of edge-case failures that will never be fixed. Using the key and code attributes is the recommended way for handling key presses. Can you provide more detail about the scenario where these attributes don't meet your needs?

Re: top-level context for API

I think relaxing the mitigations will be difficult to get agreement on, although it would be useful to discuss this further if the key and code attributes don't meet your needs.

BoCupp-Microsoft commented 3 years ago

Using the key and code attributes is the recommended way for handling key presses. Can you provide more detail about the scenario where these attributes don't meet your needs?

key and code are not enough to implement keyboard shortcuts across keyboard layouts. On Windows at least, keyboard shortcuts are based on VKEYS, which are a keyboard-layout dependent translation of the keyboard's scancodes. That means whether you have a French or US keyboard layout, you still type CTRL+A to select All. The current code attribute, however, reports KeyA when the "Q" key is pressed on a French keyboard. That works well for games that want to identify keys based on a physical position (like WASD navigation), but doesn't work well for shortcuts.

getLayoutMap() used in conjunction with code solves the problem, but since getLayoutMap() isn't available in all contexts (can't be used in iframes and implementations are missing in other major browsers), keyCode is the next best fallback. Even in Chromium-based browsers the iframe restriction makes getLayoutMap() a problem since you can see that Office web apps like Excel, Word, PowerPoint, etc. show up as embedded experiences in Outlook Web, Teams, etc. and are running in iframes.
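
To make the combination of getLayoutMap() and code concrete, here is a minimal sketch. It assumes the layout map has already been obtained (in supporting browsers, from navigator.keyboard.getLayoutMap()) and falls back to keyCode when no map is available. The names KeyLike, isCtrlShortcut and azertySample are hypothetical, and the AZERTY entries are illustrative sample data rather than output captured from a browser.

```typescript
// Minimal shape of the KeyboardEvent fields this sketch needs.
interface KeyLike {
  code: string;     // physical key position, e.g. "KeyQ"
  keyCode: number;  // deprecated; used here only as a fallback
  ctrlKey: boolean;
}

// Hypothetical helper: is this event Ctrl+<letter> under the user's layout?
// `layoutMap` maps code values to layout-specific key values (the kind of
// map getLayoutMap() resolves to); pass undefined when it is unavailable.
function isCtrlShortcut(
  ev: KeyLike,
  letter: string,
  layoutMap?: ReadonlyMap<string, string>,
): boolean {
  if (!ev.ctrlKey) return false;
  const wanted = letter.toLowerCase();
  const mapped = layoutMap ? layoutMap.get(ev.code) : undefined;
  if (mapped !== undefined) return mapped.toLowerCase() === wanted;
  // Fallback (assumption): for letter keys, keyCode commonly equals the
  // uppercase ASCII code of the letter, e.g. 65 for "A".
  return ev.keyCode === wanted.toUpperCase().charCodeAt(0);
}

// Illustrative AZERTY entries: the physical KeyQ position produces "a".
const azertySample: ReadonlyMap<string, string> = new Map<string, string>([
  ["KeyQ", "a"],
  ["KeyA", "q"],
]);

// Ctrl + the key labeled "A" on a French keyboard.
const selectAll: KeyLike = { code: "KeyQ", keyCode: 65, ctrlKey: true };
const viaLayoutMap = isCtrlShortcut(selectAll, "a", azertySample);
const viaKeyCode = isCtrlShortcut(selectAll, "a");
```

Comparing code against "KeyA" directly would miss this event, which is the mismatch described above.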

I can't control how quickly other implementors pick up getLayoutMap() beyond just advertising its importance to authors in issues like this, but regarding @maymen1's suggestion to soften privacy requirements for the sake of iframe availability, I was thinking what's the harm in allowing an iframe that receives keyboard events to see the layout map? We could maybe say that the focused document can call getLayoutMap() since it is already the target of keyboard events. Slightly less usable but more conservative would be a policy that you can only call getLayoutMap() after the document (or one of its descendant nodes) has been the target of a keyboard event. Since the event's mismatched key and code properties already reveal information about the keyboard layout the user is using, I don't see the harm in letting that iframe call getLayoutMap().

Let me know if you agree with any of that thinking. Thanks!

maymen1 commented 3 years ago

RE:

Using the key and code attributes is the recommended way for handling key presses. Can you provide more detail about the scenario where these attributes don't meet your needs?

I'd like to clarify why key will not help here - key does take into account both layout and locale, but the problem is it can generate any Unicode character. When handling keyboard shortcuts what you need is the Latin character pressed, or the closest "guess" to the user's intention in case their current locale & layout are generating non-Latin characters.

For example, if I have both French and Japanese layouts installed, and currently the active layout and locale are Japanese, I still want to receive the key values of the French layout upon a key press, since it is the highest priority layout that generates Latin characters.

getLayoutMap() does just that - it will only return Latin key values, based on the highest priority ASCII capable layout.

From the KeyboardEvent.keyCode documentation, it looks like keyCode is doing something very similar to that:

Google Chrome, Chromium and Safari must decide the value from the input character. If the inputting character can be inputted with the US keyboard layout, they use the keyCode value on the US keyboard layout.
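
The point about key producing non-Latin characters can be sketched roughly as follows. This is only an illustration under stated assumptions: shortcutCharacter and isPrintableAscii are hypothetical helpers, the map stands in for what getLayoutMap() would resolve to, and the sample key values are made up for the example.

```typescript
// True for exactly one printable ASCII character (U+0021 to U+007E).
function isPrintableAscii(key: string): boolean {
  return key.length === 1 && key >= "!" && key <= "~";
}

// Hypothetical helper: use event.key when it is already an ASCII character,
// otherwise fall back to the Latin value that the layout map reports for
// the same physical key (event.code).
function shortcutCharacter(
  key: string,
  code: string,
  latinLayout: ReadonlyMap<string, string>,
): string | undefined {
  if (isPrintableAscii(key)) return key;
  return latinLayout.get(code);
}

// Illustrative entry: the highest-priority Latin layout is French, where
// the physical KeyA position produces "q".
const frenchSample: ReadonlyMap<string, string> = new Map<string, string>([
  ["KeyA", "q"],
]);

// Active layout produces a non-Latin character for the same physical key.
const fromNonLatin = shortcutCharacter("ち", "KeyA", frenchSample);
// Active layout already produces an ASCII character.
const fromLatin = shortcutCharacter("q", "KeyA", frenchSample);
```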

garykac commented 3 years ago

getLayoutMap() sounds like it does exactly what you need, and we should discuss making it available in more contexts.

Are your uses within a PWA? If so, that might be an avenue for having more permissive access to this API.

garykac commented 3 years ago

Actually, I think that a Permission Policy would work here.

Something similar to what was done for WebHID: https://wicg.github.io/webhid/#permissions-policy

The default would be "self" (= top-level browsing context) but sites could have something like:

<iframe allow="keyboard-map"> ... </iframe>

to allow the API in the iframe.
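
From the embedded app's side, code running in the iframe would still need to cope with the API being missing or disallowed. A minimal sketch, assuming the call rejects when the frame is not allowed; NavigatorLike, KeyboardLike and tryGetLayoutMap are hypothetical names introduced here so the example does not depend on browser typings.

```typescript
// Minimal shapes for the parts of navigator.keyboard this sketch touches.
interface KeyboardLike {
  getLayoutMap(): Promise<ReadonlyMap<string, string>>;
}
interface NavigatorLike {
  keyboard?: KeyboardLike;
}

// Hypothetical helper: resolves to the layout map, or to null when the API
// is not implemented or the call is rejected (for example because the
// embedding page did not grant the policy to this frame).
async function tryGetLayoutMap(
  nav: NavigatorLike,
): Promise<ReadonlyMap<string, string> | null> {
  if (!nav.keyboard) return null;
  try {
    return await nav.keyboard.getLayoutMap();
  } catch (_err) {
    return null;
  }
}

// In a browser this would be called as:
//   tryGetLayoutMap(navigator as NavigatorLike)
```

Returning null in both failure cases lets the caller fall back to another strategy, such as keyCode, without caring why the map is unavailable.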

BoCupp-Microsoft commented 3 years ago

The Permission Policy sounds like a good solution to me. Let's see what @maymen1 thinks.

maymen1 commented 3 years ago

The PWA uses are a very small part of our use cases so a solution based on that won't work.

I think the Permission Policy might be sufficient, but would prefer to test it first if possible before we proceed with that solution, so we can know for sure it works.

garykac commented 3 years ago

Please review the new section which describes the Permission Policy.

This will allow pages to let subframes access this API by using <iframe allow="keyboard-map">.

BoCupp-Microsoft commented 3 years ago

Hi Gary,

I was expecting modifications in section 2.3.1.

Currently it has step 2 as:

If not currently executing in the currently active top-level browsing context, then

Reject p with an "InvalidStateError" DOMException.

I think you can replace that check with something like I see in the payment request API:

If the current settings object's responsible document is not allowed to use the "payment" permission, then throw a "SecurityError" DOMException.

Except it will need to be worded to reject the promise instead of throwing an exception.
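
As a rough model of that difference (not spec text and not a browser implementation), the check could behave like the sketch below. The function name and the boolean `allowed` flag are hypothetical stand-ins for the policy check, and a plain Error with its name set is used in place of a DOMException to keep the example self-contained.

```typescript
type LayoutMap = ReadonlyMap<string, string>;

// Hypothetical model of the proposed step: when the document is not allowed
// to use the feature, return a rejected promise instead of throwing.
function policyCheckedGetLayoutMap(
  allowed: boolean,
  layout: LayoutMap,
): Promise<LayoutMap> {
  if (!allowed) {
    // A browser would use a DOMException here; the name follows the
    // "SecurityError" suggestion above.
    const err = new Error("keyboard-map is not allowed in this document");
    err.name = "SecurityError";
    return Promise.reject(err);
  }
  return Promise.resolve(layout);
}

// Callers observe the failure through the promise, not a synchronous throw.
const sampleLayout: LayoutMap = new Map<string, string>([["KeyA", "a"]]);
const denied = policyCheckedGetLayoutMap(false, sampleLayout);
const granted = policyCheckedGetLayoutMap(true, sampleLayout);
```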

Additionally, the new section you reference says:

This specification defines a feature that controls whether the getLayoutMap() method is exposed on the Keyboard interface.

Would it be more correct to say that it controls whether the author is allowed to use getLayoutMap since I don't think we'll change the interface definition based on the policy?

Otherwise LGTM.

garykac commented 3 years ago

The WebHID permission policy removes items from the interface if they are not allowed. So (AIUI) it's not unreasonable to do this.

However, you're right that it's not necessary in this case. The getLayoutMap() call can just reject the promise.

Thanks for pointing out the missing algorithm update. Spec updated in https://github.com/WICG/keyboard-map/commit/9237b2630b9cf408f6a8e7c938bd1f334f78079f

snianu commented 2 years ago

@garykac Thanks for making the spec changes. I can make the code changes in Chromium unless you're already working on it?

garykac commented 2 years ago

I am not currently working on it, so feel free to work on this.

For reference, see hid.cc for how HID implements this.

garykac commented 2 years ago

Greetings Redditors! https://www.reddit.com/r/programming/comments/rwqxay/google_chrome_97_introduces_controversial/
