WebAudio / web-midi-api

The Web MIDI API, developed by the W3C Audio WG
http://webaudio.github.io/web-midi-api/

Define a security model for requesting access to the MIDIAccess interface #3

Closed jussi-kalliokoski closed 11 years ago

jussi-kalliokoski commented 11 years ago

Originally reported on W3C Bugzilla ISSUE-17417, Tue, 05 Jun 2012 12:47:58 GMT

Reported by: Michael[tm] Smith

Assigned to: Chris Wilson

Audio-ISSUE-104: Define a security model for requesting access to the MIDIAccess interface [MIDI API]

http://www.w3.org/2011/audio/track/issues/104

Raised by: Jussi Kalliokoski

On product: MIDI API

The initial idea was that we'd use getUserMedia("midi") but this is potentially confusing, as MIDIAccess is not a MediaStream. Maybe extend the Navigator object with a similar function, such as getMIDIAccess(successCallback, ?failureCallback)?

jussi-kalliokoski commented 11 years ago

Original comment by Olivier Thereaux on W3C Bugzilla. Wed, 06 Jun 2012 14:33:06 GMT

From the editor: change set https://dvcs.w3.org/hg/audio/rev/1ab2a972b9bc adds a security model for the MIDI API.

Please review.

jussi-kalliokoski commented 11 years ago

Original comment by Olivier Thereaux on W3C Bugzilla. Fri, 15 Jun 2012 11:35:51 GMT

Seeing no objection after more than a week, closing.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Thu, 13 Dec 2012 19:14:31 GMT

Open question of what precisely the security model around MIDI should be, and what the terminology should be around prompting the user.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Thu, 13 Dec 2012 19:22:33 GMT

On Thursday, December 13, 2012 at 1:53 PM, Marcos Caceres wrote:

Obtains an interface to enumerate and request access to MIDI devices on the user's system.

This call may prompt the user for access to MIDI devices.

The above needs to be a SHOULD.

If the user accepts

accepts should be "If the user gives express permission"

or the call is otherwise approved, successCallback is invoked, with a MIDIAccess object as its argument.

If the user declines or the call is denied, the errorCallback (if any) is invoked.

All the above should really be in the algorithm or all this should be labelled as non-normative (i.e., this is a note of how it works conceptually, but can't be implemented).
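A minimal sketch of the flow the quoted text describes, under stated assumptions: `requestAccessSketch`, the `decision` values, and the `askUser` hook are hypothetical names invented for illustration, and the placeholder object is not a real MIDIAccess. It runs synchronously to stay short, whereas a real UA would presumably call back asynchronously.

```javascript
// Hypothetical stand-in for a UA's access decision; none of these names come from the spec.
// `decision` is "granted", "denied", or "prompt"; `askUser` returns true when the user consents.
function requestAccessSketch(decision, askUser, successCallback, errorCallback) {
  // The prompt is optional: it is skipped when access was already approved or denied.
  const approved =
    decision === "granted" || (decision === "prompt" && askUser() === true);

  if (approved) {
    // Placeholder object standing in for a MIDIAccess instance.
    successCallback({ kind: "MIDIAccess (placeholder)" });
  } else if (typeof errorCallback === "function") {
    // errorCallback is optional, so it is only invoked when one was supplied.
    errorCallback(new Error("MIDI access was declined or denied"));
  }
}
```

The "granted" branch never calls `askUser`, which is the case argued for elsewhere in this thread: an implementation that does not prompt still satisfies the described behaviour.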

jussi-kalliokoski commented 11 years ago

Original comment by Jussi Kalliokoski on W3C Bugzilla. Thu, 13 Dec 2012 20:20:25 GMT

I agree that the word should be "SHOULD". After all, it's the ideal, and "SHOULD" still isn't "MUST".

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Thu, 13 Dec 2012 21:04:58 GMT

(In reply to comment #5)

I agree that the word should be "SHOULD". After all, it's the ideal, and "SHOULD" still isn't "MUST".

It's true, SHOULD isn't MUST - but I've become much less convinced there's a real fingerprinting issue here, particularly since Java has had unprompted MIDI support for a very long time - and the exploits would be VERY uncommon and very equipment-dependent. I'm exploring internally with security folks to get their sense, but I don't think that the UA SHOULD prompt the user in the default case.

jussi-kalliokoski commented 11 years ago

Original comment by Jussi Kalliokoski on W3C Bugzilla. Fri, 14 Dec 2012 09:33:54 GMT

(In reply to comment #6)

(In reply to comment #5)

I agree that the word should be "SHOULD". After all, it's the ideal, and "SHOULD" still isn't "MUST".

It's true, SHOULD isn't MUST - but I've become much less convinced there's a real fingerprinting issue here, particularly since Java has had unprompted MIDI support for a very long time

Yes, Java is quite well-known for its security features... Hahaha, sorry, that fruit was hanging way too low for me to resist.

and the exploits would be VERY uncommon and very equipment-dependent. I'm exploring internally with security folks to get their sense, but I don't think that the UA SHOULD prompt the user in the default case.

I agree with you on exploits, they're likely to be very uncommon and relatively meaningless, but they're still exploits. The last thing we need is more attack-vector surface on the web.

As for fingerprinting, if the default is not to ask, we void every other working group's often extreme efforts to avoid user fingerprinting and practically give the user's identity on a plate to anyone who wants to take it. That is, if they have any distinguishable MIDI devices. The main reason Java's MIDI API isn't used for fingerprinting often is that it's not very subtle (you want fingerprinting to be subtle). Add that to the fact that just the MIDI information isn't enough to form a reliable pool of entropy to identify users (usually), and it's not a very tempting choice. However, if the user doesn't even notice that you're getting the info, it's a very nice source of entropy. We don't want to add a freebie to the already-too-large pool of entropy each user carries with their browsing session.
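To make the entropy argument concrete, here is a rough sketch of how an unprompted port list could be folded into an identifier. Everything in it is an assumption for illustration: `fingerprintFromPorts` is a hypothetical helper, the port names are made up, and the hash is a simple FNV-1a-style stand-in rather than anything a real tracker is known to use.

```javascript
// Illustrative only: shows why a silently readable port list adds identifying entropy.
function fingerprintFromPorts(portNames) {
  // Sort a copy so the result does not depend on enumeration order.
  const canonical = [...portNames].sort().join("|");

  // 32-bit FNV-1a-style hash over UTF-16 code units (not cryptographic).
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  // Fixed-width hex string, so values are easy to compare and store.
  return hash.toString(16).padStart(8, "0");
}

// Made-up port names; the same rig yields the same value on every visit.
const examplePorts = ["Example Keyboard 49", "Example Drum Pad", "Virtual Port 1"];
const exampleFingerprint = fingerprintFromPorts(examplePorts);
```

The more unusual the set of attached devices, the more this value narrows down who the user is, which is why it matters whether the user notices the enumeration.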

jussi-kalliokoski commented 11 years ago

Original comment by Florian Bomers on W3C Bugzilla. Fri, 14 Dec 2012 15:09:20 GMT

I've always had second thoughts about the fact that MIDI access wasn't governed by a security manager in Java. After all, an exploit is not impossible: with MIDI, we're often communicating directly with kernel drivers, and there are many BAD drivers around. At least a denial of service attack seems possible, provided that you find a corresponding bug.

Also, MIDI can be used with virtual ports to communicate outside any sandbox. E.g. http://audiob.us/ on iOS, which started off by using a virtual MIDI port to transport audio data from app to app in real time (something which is normally not possible due to the sandbox). However, Apple seems to allow this.

Do audio streams require an explicit acknowledgement of the user?

jussi-kalliokoski commented 11 years ago

Original comment by Jussi Kalliokoski on W3C Bugzilla. Fri, 14 Dec 2012 15:32:35 GMT

(In reply to comment #8)

Do audio streams require an explicit acknowledgement of the user?

Depends. Generally, you need permission to read streams: e.g. the Web Audio API doesn't require explicit permission, while accessing a microphone with MediaStreams does. On iOS (afaik), though, you need user interaction to activate any audio playback in the browser.

jussi-kalliokoski commented 11 years ago

Original comment by Marcos Caceres on W3C Bugzilla. Mon, 24 Dec 2012 08:45:27 GMT

Created attachment 1303 [details] UA Permissioning

(of course, this won't go into the spec)... this is what I was thinking for the permission model, except the lists would be broken into inputs and outputs. Permissioning then just becomes part of the site preferences of a UA.

I've been working on implementing a mockup of this (based on Chris' implementation):

http://marcoscaceres.github.com/WebMIDIAPIShim/

Though I have not yet added the ability for the user to select individual inputs and outputs. Will add that over next few days.

jussi-kalliokoski commented 11 years ago

Original comment by Marcos Caceres on W3C Bugzilla. Mon, 24 Dec 2012 08:48:04 GMT

Created attachment 1304 [details] Mockup of permissioning model (site preferences)

(of course, this won't go into the spec)... this is what I was thinking for the permission model, except the lists would be broken into inputs and outputs. Permissioning then just becomes part of the site preferences of a UA.

I've been working on implementing a mockup of this (based on Chris' implementation):

http://marcoscaceres.github.com/WebMIDIAPIShim/

Though I have not yet added the ability for the user to select individual inputs and outputs. Will add that over next few days.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Tue, 25 Dec 2012 23:14:02 GMT

(In reply to comment #11)

Created attachment 1304 [details] Mockup of permissioning model (site preferences)

(of course, this won't go into the spec)... this is what I was thinking for the permission model, except the lists would be broken into inputs and outputs. Permissioning then just becomes part of the site preferences of a UA.

I've been working on implementing a mockup of this (based on Chris' implementation):

http://marcoscaceres.github.com/WebMIDIAPIShim/

Though I have not yet added the ability for the user to select individual inputs and outputs. Will add that over next few days.

This level of tweakiness is exactly what I'm worried about. I don't think any sane user will walk through the list of their available MIDI ports and carefully select which they're comfortable "sharing" with a web application - they'll either say OK or not OK. And even that, I'd like to minimize as much as possible, and I want the spec to continue to make it clear that the implementation does not NEED to prompt the user with UI; this may come inside a web app that has permissions already set, or in a loose environment that has already had MIDI access approved.

jussi-kalliokoski commented 11 years ago

Original comment by Marcos Caceres on W3C Bugzilla. Tue, 25 Dec 2012 23:40:15 GMT

This level of tweakiness is exactly what I'm worried about. I don't think any sane user will walk through the list of their available MIDI ports and carefully select which they're comfortable "sharing" with a web application - they'll either say OK or not OK.

Right, but there can be multiple representations of this dialog. I personally like having the ability to choose, if only because it allows me as a user to see that everything is plugged in (I know, different use case... but it's still related because I may choose to unplug something at this point for privacy/personal reasons).

Another representation could be like the geolocation permission bar, but with a way to expand it to give the view that I linked to.

And even that, I'd like to minimize as much as possible, and I want the spec to continue to make it clear that the implementation does not NEED to prompt the user with UI; this may come inside a web app that has permissions already set, or in a loose environment that has already had MIDI access approved.

Agree. This would be good for just output. For example, in a game, it would suck to have to ask the user if they want to hear MIDI sound effects.

Regardless, I think the point is that there needs to be enough flexibility in the security model to allow for these various scenarios (and that both implementors and users understand the risks ... I know, d'uh Marcos!).

I think the current text gives that flexibility already and anything else might be overreaching.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Wed, 26 Dec 2012 17:09:46 GMT

I think I've lost track of what the requested changes are here.

I believe there should be enough flexibility so that an implementation that chooses, under any circumstances, to not prompt the user is not considered non-compliant (or even "making poor choices"). Although I understand that the current language is somewhat loose to allow this, I don't think it "can't be implemented" - it simply offers a choice. Other specifications have similar security options, implemented differently across browsers; how do we mirror that?

jussi-kalliokoski commented 11 years ago

Original comment by Marcos Caceres on W3C Bugzilla. Sat, 29 Dec 2012 01:24:25 GMT

I've been trying to come up with a more incremental security model to address the common use case of just getting access to system default ports without needing to ask for permission (perfect for games sound effects) - while at the same time incrementally increasing the security controls to allow users to control what inputs and outputs are made available to an application (and also handle the case of hot plugging and unplugging devices). FWIW, I don't think the current security model handles this well (and may even break if the API does eventually have to deal with people plugging and unplugging devices).

My extremely preliminary thoughts are captured in the link below:

https://gist.github.com/4384745

It would require some significant changes to the API (e.g., having a single midi access point and doing away with the MIDIAccess object).

As I now have a more or less functional reference implementation of the MIDI API, I'll try to prototype a demo over the next week. However, if anyone wants to help me hash out these ideas, that would be greatly appreciated.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Lilley on W3C Bugzilla. Tue, 08 Jan 2013 17:03:27 GMT

(In reply to comment #12)

This level of tweakiness is exactly what I'm worried about. I don't think any sane user will walk through the list of their available MIDI ports and carefully select which they're comfortable "sharing" with a web application - they'll either say OK or not OK. And even that, I'd like to minimize as much as possible

In general I agree, but I can think of one case where a user might want to have more fine-grained control. Suppose they are happy to share their input devices (keyboards, pads etc) and the output devices that can be played (including bank switch etc) except for a device that can be written to destructively (e.g. can have new patches or samples uploaded, losing the previously stored ones).

But maybe that is better addressed as write access or disabling sysex rather than port-by-port.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Tue, 08 Jan 2013 17:51:43 GMT

(In reply to comment #16)

(In reply to comment #12)

This level of tweakiness is exactly what I'm worried about. I don't think any sane user will walk through the list of their available MIDI ports and carefully select which they're comfortable "sharing" with a web application - they'll either say OK or not OK. And even that, I'd like to minimize as much as possible

In general I agree, but I can think of one case where a user might want to have more fine-grained control. Suppose they are happy to share their input devices (keyboards, pads etc) and the output devices that can be played (including bank switch etc) except for a device that can be written to destructively (e.g. can have new patches or samples uploaded, losing the previously stored ones).

If someone really wants this power, of course it's not my place to say no. My point is that this is a very advanced tweaky configuration, and experience leads me to believe 99.99...% of users will not mess with such things (kinda like people hand-editing their security zones in IE - super-useful tool, few people mess with it.) I'm not saying I want to prevent a UA from working this way, I am saying I do not want to mandate it. The current spec would absolutely let a UA selectively decide to expose each port independently and be compliant.

But maybe that is better addressed as write access or disabling sysex rather than port-by-port.

It would have to be sysex - there's no "write access", other than access to output ports. And you could selectively enable sysex port-by-port. That would be marginally acceptable (there are still a lot of devices that use sysex heavily for normal operatio

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Tue, 08 Jan 2013 17:56:57 GMT

Grr. tab-sended accidentally.

It would have to be sysex - there's no "write access", other than access to output ports. And you could selectively enable sysex port-by-port. That would be marginally acceptable (there are still a lot of devices that use sysex heavily for normal operation - for example, the standardized MIDI Machine Control messages (start/stop/ffw/rewind) are actually sysex messages). I've been talking to Incident about the GTar; it, like some other devices, uses sysex for normal communication.
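For readers unfamiliar with the message format, a small sketch of why MIDI Machine Control counts as sysex: each MMC command is framed as a universal real-time system exclusive message, `F0 7F <device id> 06 <command> F7`. The helper names `mmcMessage` and `isSysex` are hypothetical, chosen only for this example.

```javascript
const SYSEX_START = 0xf0; // status byte that opens a system exclusive message
const SYSEX_END = 0xf7;   // end-of-exclusive byte

// A few MMC command codes (transport controls).
const MMC_COMMANDS = { stop: 0x01, play: 0x02, fastForward: 0x04, rewind: 0x05 };

// Build an MMC message; device id 0x7f addresses all devices.
function mmcMessage(command, deviceId = 0x7f) {
  if (!(command in MMC_COMMANDS)) {
    throw new RangeError(`Unknown MMC command: ${command}`);
  }
  // 0x7f = universal real-time, 0x06 = MMC command sub-id.
  return [SYSEX_START, 0x7f, deviceId, 0x06, MMC_COMMANDS[command], SYSEX_END];
}

// A message is sysex when its first byte is the sysex start status.
function isSysex(bytes) {
  return bytes.length > 0 && bytes[0] === SYSEX_START;
}
```

A filter that blocks every message for which `isSysex` returns true would therefore also block plain transport commands such as `mmcMessage("stop")`, which is the trade-off described above.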

Really - I see the potential risks of exposing MIDI; in fact, I've detailed them personally in the specification. At the same time, in the balance with user experience - I do not see the need to throw a dialog up in the user's face every time they want to use a MIDI controller. If all I have attached to my machine is a keyboard input device, I should be able to say once "yes it's cool, don't ask me again" and that should be compliant.

jussi-kalliokoski commented 11 years ago

Original comment by Marcos Caceres on W3C Bugzilla. Tue, 08 Jan 2013 19:57:16 GMT

(In reply to comment #18)

Really - I see the potential risks of exposing MIDI; in fact, I've detailed them personally in the specification. At the same time, in the balance with user experience - I do not see the need to throw a dialog up in the user's face every time they want to use a MIDI controller. If all I have attached to my machine is a keyboard input device, I should be able to say once "yes it's cool, don't ask me again" and that should be compliant.

I agree.

Regarding outputs: Ideally, for system default output (if it can be determined by the UA) you should not have to ask for permission. It's not really that different to using

I still think we need to have a bigger discussion about folding MIDIAccess into a single navigator.midi. I think it would simplify the API.

jussi-kalliokoski commented 11 years ago

Original comment by Chris Wilson on W3C Bugzilla. Tue, 08 Jan 2013 20:11:16 GMT

(In reply to comment #19)

Regarding outputs: Ideally, for system default output (if it can be determined by the UA) you should not have to ask for permission. It's not really that different to using

Which browsers have differing opinions of (autoplay without user interaction). But it is still a bit different, because you can write data that will overwrite patches, etc. - as long as you have access to sysex. Without it, all you could do maliciously would be to switch to different patches, etc.

I still think we need to have a bigger discussion about folding MIDIAccess into a single navigator.midi. I think it would simplify the API.

Now would be a really really good time. Can you make Thursday's call?

jussi-kalliokoski commented 11 years ago

Original comment by Marcos Caceres on W3C Bugzilla. Tue, 08 Jan 2013 20:14:28 GMT

(In reply to comment #20)

I still think we need to have a bigger discussion about folding MIDIAccess into a single navigator.midi. I think it would simplify the API.

Now would be a really really good time. Can you make Thursday's call?

Yes, I'll be there.

cwilso commented 11 years ago

I took an action item to document some options here.

The center of the security model discussion - or more aptly, how it might affect the API design - is really around granularity: instead of treating the entire MIDI system as one big chunk, you can break apart the levels of 1) enumerating devices 2) gaining access to an input device 3) gaining access to an output device (further granularity around sysex) 4) gaining access to an input AND an output device (note that this would have to be ANY input and output pair; they are not necessarily paired in the underlying system. Sometimes intermediate shims like Automap would confuse the issue further, as well.)

1) Enumerating devices is really only a fingerprinting concern (as discussed in section 4.1 of the spec).

2) gaining access to a particular input device (and listening for messages on it) is a useful scenario - using a MIDI controller for keyboard or drum pad input, for example - however, many of these devices also are bi-directional (they use data sent back to the device to change lights, display, etc.). If only an input is accessed, however, damaging scenarios as described in section 4.2 cannot occur.

3) gaining access to a particular output device (and being able to send messages to that device) is a useful scenario (independent of input). For example, a musical notation program could use this to preview content. One further level of granularity here could be the ability to send system exclusive commands - as only system exclusive commands could perform the actions described in section 4.2 (erasing patches, capturing audio, etc.) Unfortunately, some common MIDI commands are sent as system exclusive messages (MIDI Machine Control, for example - http://en.wikipedia.org/wiki/MIDI_Machine_Control - generic start/stop/rew/ffw commands) - and many devices use system exclusive to program patches, download firmware, etc., which is a much-demanded scenario for Web MIDI.

4) obviously, access to input and output together is pretty full access. Further granularity here would involve (as in #3) sysex as an option; as above, many (though not all) interesting scenarios involve sysex, though.
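The four levels above could be modelled roughly as follows. This is a sketch under assumptions, not anything proposed in the thread: the grant shape `{ enumerate, input, output, sysex }` and the helpers `canReceive` and `canSend` are hypothetical.

```javascript
// Hypothetical grant shape for illustration: four independent booleans.
const SYSEX_STATUS = 0xf0;

// Level 2: listening on an input only needs the input grant.
function canReceive(grant) {
  return grant.input === true;
}

// Levels 3 and 4: sending needs the output grant, and sysex needs its own grant on top.
function canSend(grant, bytes) {
  if (grant.output !== true || bytes.length === 0) return false;
  const isSysex = bytes[0] === SYSEX_STATUS;
  return !isSysex || grant.sysex === true;
}

// Output without sysex: patches can be switched, but not overwritten.
const outputOnly = { enumerate: true, input: false, output: true, sysex: false };
const programChange = [0xc0, 0x05];                          // select program 5 on channel 1
const identityRequest = [0xf0, 0x7e, 0x7f, 0x06, 0x01, 0xf7]; // universal sysex identity request
const programChangeAllowed = canSend(outputOnly, programChange);
const identityRequestAllowed = canSend(outputOnly, identityRequest);
```

With `outputOnly`, the program change passes and the sysex message is refused, which matches the point that only sysex can carry the destructive operations from section 4.2.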

The best I could think of in order to enable this granular, may-not-prompt-you model would be to add options on the requestMIDIAccess call:

requestMIDIAccess( "input+output", onsuccess, onerror );

or

requestMIDIAccess( "input+output+sysex", onsuccess, onerror );
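One way a UA or shim might interpret such an option string, sketched under assumptions: `parseAccessLevels` and the set of known level names are hypothetical, inferred only from the two example strings above.

```javascript
// Level names inferred from the example strings; not a defined vocabulary.
const KNOWN_LEVELS = new Set(["input", "output", "sysex"]);

// Turn "input+output+sysex" into a set of boolean flags.
function parseAccessLevels(spec) {
  const levels = new Set(
    spec.split("+").map((part) => part.trim()).filter((part) => part !== "")
  );
  for (const level of levels) {
    if (!KNOWN_LEVELS.has(level)) {
      throw new TypeError(`Unknown access level: ${level}`);
    }
  }
  return {
    input: levels.has("input"),
    output: levels.has("output"),
    sysex: levels.has("sysex"),
  };
}
```

For example, `parseAccessLevels("input+output")` yields flags with `sysex` set to false, so a page that later needs sysex would have to request access again with a different string.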

Then, of course, if you wanted to change your level of access later, you'd have to re-requestMIDIAccess().

Alternately, something like Marcos' suggestion in a gist (https://gist.github.com/4384745) could be used:

if (!navigator.midiAccess.enabled( "input" ))
    navigator.midiAccess.requestAccess( "input", onsuccess, onerror );

etc., etc. However, 1) this could result in prompting the user more than once, if we're not careful, and 2) this doesn't get away from the need to have an asynchronous hook somewhere while the UA (possibly) prompts the user.
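The double-prompt concern could be handled by remembering the pending request per access level, roughly as below. This is only a sketch: `createAccessGate` and `promptUser` are hypothetical, and it uses promises as the asynchronous hook instead of the callback pairs shown in the thread.

```javascript
// Hypothetical gate that prompts at most once per access level.
// `promptUser(level)` stands in for whatever UI the UA shows; it returns (or resolves to) a boolean.
function createAccessGate(promptUser) {
  const pending = new Map(); // level -> Promise<boolean>

  return {
    request(level) {
      if (!pending.has(level)) {
        // Defer the prompt so request() itself never blocks the caller.
        pending.set(level, Promise.resolve().then(() => promptUser(level)));
      }
      // Repeat callers share the same promise, so the user is asked only once.
      return pending.get(level);
    },
  };
}
```

Even in this form the caller still has to wait on the returned promise, so the asynchronous hook does not go away; it only stops repeated requests from stacking up prompts.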

In short, I'm not a huge fan. I see some slight value in granularity, though I think, quite frankly, these are under-the-hood details that very very few users would care about, and if you trust a site enough to give them any MIDI access (because you want to use a MIDI feature), you probably trust them enough to give them all MIDI access - or you're going to unplug (or virtually unplug - i.e. mask out in UA settings) some of your MIDI devices.

I think we should stick with the system (and text) we have now.

cwilso commented 11 years ago

Bumping this discussion. Is there any dissenting opinion?

cwilso commented 11 years ago

New model and lots more security text in https://github.com/WebAudio/web-midi-api/commit/c7603dcfe239dcb23b562f3ee483253f42bd7311.

cwilso commented 11 years ago

Seems to be general agreement on new model. Closing.