MicrosoftEdge / MSEdgeExplainers

Home for explainer documents originated by the Microsoft Edge team

Questions on the explainer. #385

Westbrook opened this issue 4 years ago

Westbrook commented 4 years ago

Very cool start! Excited for this conversation to move forward, and have lots of questions:

  1. Does the following code in the explainer imply that the EyeDropper wouldn't have its own UI when "closed"?
    // Enter eyedropper mode
    let icon = document.getElementById("eyeDropperIcon");
    icon.addEventListener('click', e => {
        eyeDropper.open().then(
            () => { console.log("entered eyedropper mode") },
            () => { console.log("could not enter eyedropper mode") }
        );
    });
  2. While you make a great point about previous art leaning away from the use of an extended <input> element, would you still see this as being an element (there's actually no reference to it being an element, but for argument's sake) that would be "form associated"? Or would this be something more "over the top" where the resultant e.value would be passed into an input[type="color"] element, or similar, to support this?
  3. The element-or-not question is also pertinent in regards to "no UI events except colorselect are dispatched to the web page", which implies that the event is triggered on window or document, while the example listens on eyeDropper.addEventListener. Without being an element, eyeDropper would need to be an EventTarget, and I'm not sure an EventTarget actually bubbles its events to window et al.
  4. By the statement the colorselect event is only dispatched while the primary pointer is down, do you see the colorselect event as streaming while the pointer is down? Would you pair this with a final event (colorchange, etc) when the pointer is "up" again? Or, as the example implies, would it be for the developer to "decide" when the user is done, possibly through only caring about the first event? If we needed the paired event names, would it make sense to fall back to prior art here and match an input event with a change event for regularity across features?
  5. Returning to while the primary pointer is down, how would you expect this API to interact with keyboard only use cases?
  6. Would this be a purely JS space feature, or do you see there being an HTML first approach to this?
  7. Whether there is "closed UI" or not, there is "opened UI", and that being so, what sort of customization pattern do you see being best to deliver with this sort of functionality? More specifically, the reason similarly complex UI/X patterns seem to be repeated often in userspace (see the <select> pattern) is the overly restrictive visual delivery of said patterns; how could this proposal head that issue off at the pass? The possibilities of a completely "headless" version of this experience would be quite interesting.
  8. While I appreciate the complexity of submitting a color object model along with this proposal, do you see any shortcomings around only supporting hex colors? The specifically available color space of hex colors might be too restrictive. Could you see this supporting a type="rgb|hex|hsl|etc" property, or multiple types by default (see the Chrome DevTools color selectors) to allow for a more expressive color capture mechanism?
  9. On the subject of color, in the browser space, there is some context of alpha that could be applied to this picker. Do you see this as being a purely WYSIWYG result (so the fully composited color) only, or is there room for more complex resolution? Maybe not in a v1 but I'd be interested in seeing a possible path to doing so.
  10. I'll share thoughts on scoping the parts of the page that the EyeDropper has access to in #382.
BoCupp-Microsoft commented 4 years ago

Does the following code in the explainer imply that the EyeDropper wouldn't have its own UI when "closed"?

Correct. The EyeDropper would not have UI when closed.

While you make a great point about previous art leaning away from the use of an extended <input> element, would you still see this as being an element (there's actually no reference to it being an element, but for argument's sake) that would be "form associated"? Or would this be something more "over the top" where the resultant e.value would be passed into an input[type="color"] element, or similar, to support this?

As currently proposed there's no element and therefore no form association. After obtaining a sampled color value, if you wanted it to be submitted as part of a form, you could stuff the value into an input element of your choosing or make a custom element and use ElementInternals to set a form value to be submitted.
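
For concreteness, a minimal sketch of the custom-element route (the element and method names are made up for illustration; e.value carrying the sampled hex string is assumed per the discussion above):

    // A form-associated custom element that stores a sampled color.
    class SampledColorField extends HTMLElement {
        static formAssociated = true;                // opt in to form association
        constructor() {
            super();
            this.internals = this.attachInternals(); // ElementInternals
        }
        setColor(hex) {
            this.internals.setFormValue(hex);        // submitted with the parent form
        }
    }
    customElements.define('sampled-color-field', SampledColorField);

    // When the eyedropper reports a color:
    // eyeDropper.addEventListener('colorselect', e => {
    //     document.querySelector('sampled-color-field').setColor(e.value);
    // });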

By the statement the colorselect event is only dispatched while the primary pointer is down, do you see the colorselect event as streaming while the pointer is down?

I think streaming while the pointer is down is the most capable implementation, but I also think that different user agents could satisfy the contract of the API if they chose instead to exit "eyedropper mode" after the first click. To the author such an implementation would appear like the user closed "eyedropper mode" by hitting ESC after sampling a single color.

Another thing that is informative to this discussion: on macOS Catalina, the API we were using to sample colors was recently guarded by an OS-level permission prompt, and a new API, NSColorSampler, provides a new OS-implemented eyedropper experience. That new eyedropper experience dismisses after the first click and only samples one color. So right now on macOS we have two different implementations: one that is capable of streaming and another that isn't. Maybe that will force our hand, but ignoring that for a moment, let me answer some of the other questions you asked about streaming.

Would you pair this with a final event (colorchange, etc) when the pointer is "up" again?

I don't think that's needed. You would get a colorselect event at a similar frequency to pointermove events while the pointer is down. The author should deal with each colorselect event in some app-specific way, e.g. by updating the current foreground color for painting. I think it's only necessary to notify the author when "eyedropper mode" is exited, e.g. when the user presses the ESC key.
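
To make that concrete, a sketch (the event names follow this proposal; e.value as the sampled hex string and updateForegroundColor are assumptions for illustration):

    // Stream colors into the app while the pointer is down; react when
    // "eyedropper mode" is exited (e.g. the user presses ESC).
    eyeDropper.addEventListener('colorselect', e => {
        updateForegroundColor(e.value); // fires repeatedly while the pointer is down
    });
    eyeDropper.addEventListener('close', () => {
        console.log('eyedropper mode exited');
    });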

Or, as the example implies, would it be for the developer to "decide" when the user is done, possibly through only caring about the first event?

The author can also decide when to exit "eyedropper mode" and could choose to allow selection of only one color. As an alternative, the author could implement a model where the user needs to select a different tool from a tool palette to exit "eyedropper mode". To enable that scenario I opened a related issue here, which would allow an author to designate regions of the document where UI events can be delivered instead of being eaten by the eyedropper window.

If we needed the paired event names, would it make sense to fall back to prior art here and match an input event with a change event for regularity across features?

I don't think we need another event besides colorselect and close to implement the scenarios I've described. Let me know if you think differently.

Returning to while the primary pointer is down, how would you expect this API to interact with keyboard only use cases?

Assuming the arrow keys move the eyedropper and the ENTER key or the SPACE key sample a color, I think you can repeatedly select colors by repeatedly pressing the ENTER or SPACE key. You wouldn't exit "eyedropper mode" until you hit ESC or until you select a tool other than the eyedropper from an author-provided tool palette.

Would this be a purely JS space feature, or do you see there being an HTML first approach to this?

As currently proposed it is purely a JS feature where the author is meant to code up a custom UI of their own to use the sampled colors. Note there is an HTML first approach proposed here. I also documented some rationale that led us to the JS-based approach here. Feedback welcome on why you might prefer one over the other.

BoCupp-Microsoft commented 4 years ago

Whether there is "closed UI" or not, there is "opened UI", and that being so, what sort of customization pattern do you see being best to deliver with this sort of functionality? More specifically, the reason similarly complex UI/X patterns seem to be repeated often in userspace (see the <select> pattern) is the overly restrictive visual delivery of said patterns; how could this proposal head that issue off at the pass? The possibilities of a completely "headless" version of this experience would be quite interesting.

I'm familiar with the issues you're referring to about authors recreating controls, specifically select. My team is also working on Control UI Customization.

I would call the EyeDropper API as "headless" as possible. There's no UI on the document canvas for the eyedropper, but we do create a custom cursor that shows a magnified view of what it is currently over. Some customization could be possible, e.g. we could let an author choose to show a custom cursor instead and not have a magnified window, or maybe control the aperture size or the level of magnification. One complication I referenced above, though, is that the eyedropper UI on macOS Catalina is actually OS-supplied, so our customization opportunities seem non-existent if we continue using it to avoid the OS permission prompt.
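
Purely as a strawman, none of which is part of the current proposal, customization along the lines mentioned above might look like:

    // Hypothetical options, for discussion only.
    eyeDropper.open({
        cursor: 'crosshair',  // replace the magnifying cursor with a custom one
        magnifier: false,     // suppress the magnified preview entirely
        apertureSize: 3,      // sample an NxN region instead of a single pixel
        magnification: 4      // zoom level of the magnified preview
    }).then(() => console.log("entered eyedropper mode"));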

If you had the ability to customize it, what options would you say are most important to control?

While I appreciate the complexity of submitting a color object model along with this proposal, do you see any shortcomings around only supporting hex colors? The specifically available color space of hex colors might be too restrictive. Could you see this supporting a type="rgb|hex|hsl|etc" property, or multiple types by default (see the Chrome DevTools color selectors) to allow for a more expressive color capture mechanism?

I think there's room to extend what colorselect returns. Returning a hex value for now seemed like a reasonable way to decouple developing an eyedropper from developing a new Color interface.
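
In the meantime, authors who need another representation can convert the hex string themselves; a small sketch assuming a "#RRGGBB" value as returned by colorselect:

    // Convert a "#RRGGBB" string into its RGB components.
    function hexToRgb(hex) {
        const n = parseInt(hex.slice(1), 16);
        return { r: (n >> 16) & 0xff, g: (n >> 8) & 0xff, b: n & 0xff };
    }
    // hexToRgb("#336699") -> { r: 51, g: 102, b: 153 }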

On the subject of color, in the browser space, there is some context of alpha that could be applied to this picker. Do you see this as being a purely WYSIWYG result (so the fully composited color) only, or is there room for more complex resolution? Maybe not in a v1 but I'd be interested in seeing a possible path to doing so.

We're currently proposing the composite color value only, without alpha. If authors want to offer something more sophisticated when the sampled color is from their own document, they could use the client coordinates of the colorselect event to look up data their application might have. The second paragraph in the solution section of the explainer mentions a couple of scenarios we imagined.
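
A sketch of that pattern, assuming the colorselect event exposes clientX/clientY like other UI events (appColorDataFor and applyColor are hypothetical app functions):

    // Map the sampled location back to app data, e.g. to recover alpha.
    eyeDropper.addEventListener('colorselect', e => {
        const el = document.elementFromPoint(e.clientX, e.clientY);
        const rich = el ? appColorDataFor(el) : null; // app-specific lookup
        applyColor(rich ?? { hex: e.value });         // fall back to the composited hex
    });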

I'll share thoughts on scoping the parts of the page that the EyeDropper has access to in #382.

Thanks! Looking forward to it.

ericlaw1979 commented 3 years ago

I think streaming while the pointer is down is the most capable implementation,

Streaming seems like a huge security hole if it enables gesture-jacking. An attacking website entices the user to hold down the mouse button, then the site shoves a cross-origin resource underneath the mouse pointer, moving it around as needed to scan it from left-to-right, top-to-bottom, stealing an image of the cross-origin resource in milliseconds.
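
Conceptually the attack is only a few lines; a sketch against a streaming implementation (the iframe id, the dimensions, and e.value are placeholders for illustration):

    // Scan an authenticated cross-origin resource one pixel per colorselect event.
    const WIDTH = 300, HEIGHT = 150;
    const frame = document.getElementById('victimFrame'); // positioned under the cursor
    let x = 0, y = 0;
    const stolen = [];
    eyeDropper.addEventListener('colorselect', e => {
        stolen.push({ x, y, color: e.value });           // pixel currently under the cursor
        x++;
        if (x === WIDTH) { x = 0; y++; }                 // advance the scan position
        frame.style.transform = `translate(${-x}px, ${-y}px)`; // slide the next pixel under the cursor
    });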

domenic commented 3 years ago

stealing an image of the cross-origin resource in milliseconds.

Very interesting. Maybe this API should be gated behind cross-origin isolation, so that cross-origin resources can't even get into the same process?

BoCupp-Microsoft commented 3 years ago

@ericlaw1979 sorry for the slow reply.

Is gesture-jacking the new term for click-jacking?

I prefer the usability of streaming back colors as opposed to getting one color per click. FWIW Photoshop's color picker streams colors while the mouse is down, and IMO it works really well.

I think we can mitigate this attack in a couple of ways or maybe with a combination of these two things:

  1. Have the screen capture we use to read back colors be static for the duration that the mouse is down.
  2. Only generate color select events for the initial mouse down plus subsequent mouse moves.

I think either of those techniques would prevent an attacker from moving an iframe under the cursor to generate scanlines for the content.

Let me know if you agree.

@domenic I prefer to not put in cross-origin restrictions. I think it would be weird to allow selection outside of the browser window but restrict sampling a color from an iframe.

domenic commented 3 years ago

@domenic I prefer to not put in cross-origin restrictions. I think it would be weird to allow selection outside of the browser window but restrict sampling a color from an iframe.

That's not what cross-origin isolation means. See https://web.dev/coop-coep/

BoCupp-Microsoft commented 3 years ago

@domenic thanks for the link. I'm not connecting the dots on how the process an iframe runs in affects the user's ability to sample a color, or precludes an attacker from changing the position of an iframe to move its pixels under the cursor position. I think process isolation has no impact on these things. Maybe I'm thinking about it wrong. Could you tell me how the mitigation would work?

ericlaw1979 commented 3 years ago

@BoCupp-Microsoft - Yeah, "gesture-jacking" would be a super-set of clickjacking, encompassing user actions that aren't just a simple click (e.g. user holding down enter key, user dragging mousedown, etc).

Using a fixed image to pull colors from or updating the value only when the mouse moves would resolve the "scanner" attack, but it could result in an odd user-experience if the user were trying to pick a color from a video or other source where the screen contents are changing.

With regards to Domenic's proposal: He's saying that if you prevented embedding of a foreign (cross-origin) iframe within your page, then you would not have the ability to move that victim content around. We'd presumably need to require the same restriction for a window.open()'d window to which the attacker window had a handle (e.g. window.open(...'noopener')). Such a restriction feels like a pretty big and user/dev-confusing hammer to deploy.

I'd prefer to try to impose a limit on the picker API itself.

domenic commented 3 years ago

Sure. The idea is that @ericlaw1979 outlines an attack where you can steal the contents of a cross-origin image.

By requiring the web developer to use COOP+COEP headers, you ensure that the page only contains cross-origin data which has opted in to being embedded. (I think only COEP, specifically Cross-Origin-Embedder-Policy: require-corp, is important here, but they always come as a bundle these days. Nope, I see that COOP is also necessary to prevent the popup attack, as COOP severs the opener relationship.) That is, it would be impossible to include a private cross-origin image in the first place. So the attack would be prevented.
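
For reference, a page becomes cross-origin isolated when it is served with both headers (and all of its subresources satisfy CORS/CORP), and it can check the resulting state at runtime:

    // Required response headers on the document:
    //   Cross-Origin-Opener-Policy: same-origin
    //   Cross-Origin-Embedder-Policy: require-corp
    if (self.crossOriginIsolated) {
        // powerful features (e.g. the eyedropper) could be gated on this flag
    }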

Stated another way, if we required cross-origin isolation for the eyedropper, then the page would be in one of two states:

  1. It is cross-origin isolated, so any cross-origin resource it contains has opted in to being embedded by it.
  2. It is not cross-origin isolated, and the eyedropper is unavailable.

There's a separate discussion about whether opting in to being embedded is the same as opting in to having your data read, but these days, post-Spectre, they are very similar.

BoCupp-Microsoft commented 3 years ago

@domenic and @ericlaw1979 thanks for explaining how COOP + COEP would help this scenario. It seems heavier than the other mitigations I've proposed if those work.

@ericlaw1979 regarding this comment:

if the user were trying to pick a color from a video or other source where the screen contents are changing

I'm not sure that's a scenario we need to support. It seems suspect that users will want to select a color that isn't on the screen yet when they enter eyedropper mode and start holding down the mouse button. The user would just need to wait and click at the moment they see the color they want in the video.

Also, I was doing the math on the attack and if we assume that we only fire a color select event for the current location of the cursor on every frame, and we assume a 60Hz refresh rate, that means the attacker only gets 60 pixels each second. So if we didn't pursue any mitigation, then a small image could be scanned in seconds or minutes (not milliseconds), and if the attacker tried to scrape a larger region, for example whatever can be seen in a 1024x768 iframe, it would take hours of the user holding down the mouse button.
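
Working that out explicitly (the 128 × 128 "small image" size is just an example):

    1024 × 768 = 786,432 pixels; 786,432 ÷ 60 ≈ 13,107 s ≈ 3.6 hours
    128 × 128 = 16,384 pixels; 16,384 ÷ 60 ≈ 273 s ≈ 4.5 minutes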

Let me know if you agree or if there's a faster way to scrape. Maybe mitigating the issue is less important if the user can see the attack happening and just needs to release the mouse to prevent it?

ericlaw1979 commented 3 years ago

@BoCupp-Microsoft: RE the 60hz limit, my original attack was the naïve/simple one. In a more realistic attack, an attacker could read a 10 digit security code or account number in a tiny number of pixel reads (probably something like 40?). Some privacy-impactful attacks can be executed in a single pixel read (e.g. the CSS Visited History attack, or a login detector based on whether the color at offset 2,2 is the color of a placeholder image or of the logged-in user's profile image).

The Web Platform folks have been extremely aggressive at protecting against even very obscure cross-origin oracles (e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=1156999) so there's likely to be a ton of attention on this magical new API.

I'd still feel more confident about this feature if we released exactly one color per mouse-down. To help ensure that the user understands what color they are going to be picking, we could attach a loupe to the cursor: [image: loupe attached to the cursor]

BoCupp-Microsoft commented 3 years ago

@ericlaw1979 thanks for these attack examples. It may be the case that we initially ship with only one color being picked for other reasons like this one. My preference though is still to pursue streaming colors since there are more compelling experiences that can be built that are already visible in native apps today. Seems like we have ideas above that may mitigate this threat. I'll get the explainer updated to acknowledge the threat and record the mitigations.

domenic commented 3 years ago

I do want to urge you to consider requiring cross-origin isolation as a pretty good mitigation. COI is exactly designed for exposing powerful features by ensuring they can't be used for attacks like this. Really, it's kind of fortunate that we've come up with the whole COI framework in time for it to be available for this case. There are other mitigations, but COI is just such a good fit.

Over time, more and more powerful features will likely come to require COI (like today more and more require secure contexts); see https://github.com/mikewest/securer-contexts for some more background on this. /cc @mikewest. So to my mind such a restriction would slot in neatly into the trajectory of the web.

mikewest commented 3 years ago

I haven't reviewed the proposal in detail, but based on the suggested attack surface above, it does seem pretty reasonable to force resources to opt-into inclusion in an environment that potentially grants access to cross-origin pixel values.

/cc @arturjanc and @camillelamy

camillelamy commented 3 years ago

The problem is that COI might not be enough. IIUC the attack surface, cross-origin iframes might be impacted as well. That does not fit in our current threat model for COI (https://arturjanc.com/coi-threat-model.pdf). In WebPlatform terms, increasingly the consensus is that COI accepts APIs that leak data from cross-origin resources but that are limited to the Agent Cluster. To expand to cross-origin iframes, we would need some form of opt-in mechanism. Now, since this has come up 3 times already, we should really work on something if that's the direction we want to go in.

BoCupp-Microsoft commented 3 years ago

@domenic, @mikewest, @camillelamy, the threat that @ericlaw1979 pointed out dealt with sampling cross origin content, but the API allows sampling colors from the desktop / non-browser apps / other open browser windows. Do you feel like restricting sampling of embedded cross-origin content is valuable even though we'll be letting the user sample colors from outside the browser?

We discussed some other mitigations above that would capture the desktop only once when the mouse was depressed. That seems to thwart an attacker's ability to move content under the eyedropper so that the user samples something other than what they intended. Also note that the attacker wouldn't receive pointer events or be able to detect the cursor location while in eyedropper mode until the first color is sampled. At that point we would have already captured the desktop image that will be used for subsequent samples making any relocated content after that moment a no-op in terms of reading back attacker controlled content.

Is there an attack you can imagine that is defeated by COI but not by the above?

Thank you very much for the help and your thoughts.

arturjanc commented 3 years ago

It's a little hard for me to understand the entire shape of this proposal (I skimmed the explainer and this thread, but I feel like I'm still missing some details), so apologies if the concerns below are already addressed elsewhere. That said:

Do you feel like restricting sampling of embedded cross-origin content is valuable even though we'll be letting the user sample colors from outside the browser?

The difference between content outside the browser and cross-origin content inside the browser is that a malicious page has the ability to embed authenticated cross-origin resources (images/media, iframes) and control their position and dimensions in a viewport which belongs to the attacker. This allows the attacker to entice / force the user to click and select the color of an attacker-chosen authenticated resource, revealing cross-origin state.

If the user clicks outside of the viewport, the attacker doesn't have any control over the source of the information used for the eye dropper, which makes it less likely that the attacker will be able to leak data. So the cross-origin case seems more worrisome to me in general.

Is there an attack you can imagine that is defeated by COI but not by the above?

Imagine a fun website that will embed an image from socialnetwork.example/myavatar.png, ask you to select three pixels, and based on their colors will create a custom background using colors that match your avatar. This can deanonymize the user because the attacker can crawl all of the social network's avatars and see which ones contain pixels of the given color. (A similar attack would work for example with the security image used by banks to mitigate phishing.)

This is just one example; there are more sophisticated attacks that can leak more sensitive information from iframes/images, with some more seemingly harmless user interaction with the attacking page.

Also note that the attacker wouldn't receive pointer events or be able to detect the cursor location while in eyedropper mode until the first color is sampled.

Wouldn't the attacker be able to detect the cursor location immediately after the color is sampled, revealing the approximate location of the part of the cross-origin resource the user clicked on?

One other question I wanted to ask is how you would handle partial opacity; if I overlay an image/iframe with 0.01 opacity on top of my site, what pixel color would the EyeDropper choose?