WebAudio / web-midi-api

The Web MIDI API, developed by the W3C Audio WG
http://webaudio.github.io/web-midi-api/

Does MIDIPort.id need to be origin-scoped and/or regenerated with cookies? #48

Open cwilso opened 11 years ago

cwilso commented 11 years ago

From AnneVK:

You need to state how MIDIPort.id is scoped (prolly to origin, no?) and you should state that it should not be the same across origins. We don't want to make it easy to track users using these new identifiers. Furthermore, once the user clears cookies these MIDIPort.id thingies need to be regenerated too. (I gave similar feedback to the WebRTC guys.)

annevk commented 11 years ago

I guess this is also a concern with name and manufacturer, to a lesser extent. If you have a unique device or are a user of a rather unique localization, tracking will be easier. Of course, it's opt-in, but calling it out and having people look at it would be good. I've been told we lost the war on preventing tracking, but identification is hopefully not entirely lost yet.

cwilso commented 11 years ago

We've lost that battle. There are plenty of other APIs that expose system-specific data like this - the Gamepad API, for example, has precisely the same data.

I'm not sure what you mean by "user of a rather unique localization" - the name and manufacturer are coming from the device, so I presume you mean "user of a relatively rare device produced with a relatively rare localized USB device name", or something like that? This has gotten extremely narrow - you would catch far more users looking for a MIDI devices are relatively mainstream in production - you don't find a lot of one-off device manufacturers/ids - and they are also frequently unplugged - you can't rely on them always being there; my MIDI configuration changes constantly.

Actually, the harder I've thought about this, the more I think we should NOT regenerate with cookies. Origin-scoped is okay - you wouldn't be handing these identifiers across domains - but other than that, it's not worse than index and name (in fact, my polyfill just generates an ID from the index and the name).

annevk commented 11 years ago

If you do not regenerate with cookies you can revive the cookies if the ID is a uuid.

cwilso commented 11 years ago

It's not intended to be a UUID, just a GUID (that can be used to revive the right connections when an app is run a subsequent time). Perhaps a comment to that effect would be best.

annevk commented 11 years ago

It still seems like that would allow for reviving given enough other fingerprinting data. If you clear cookies you really want that particular site to have forgotten about you and have no information retained.

cwilso commented 11 years ago

I'm not clear I understand the issue well enough then, because it seems like there is far more than enough other fingerprinting data elsewhere to do this. Can you point me to where to understand this better?

annevk commented 11 years ago

See e.g. HTML for "fingerprinting".

jussi-kalliokoski commented 11 years ago

I'm fine with the idea of resetting the ID with cookies. This is pretty simple to implement by just salting the IDs with something like the timestamp when cookies were last created, which would be just now in case of private browsing for example. I agree with Anne that even though the API is opt-in, it's better that there's as little linking to the user's identity as possible when cookies are cleared.

That said, I don't see the advantage of extending this to the manufacturer and name, they become pretty useless if they're obfuscated.
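
The salting idea above could look roughly like the sketch below. It is only an illustration under assumptions: `derivePortId`, the device key string, and the use of SHA-256 from Node's `crypto` module are hypothetical choices, not anything the spec or a browser prescribes. The point is that the ID stays stable for one origin until the salt changes, and differs across origins.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: derive a port ID from the requesting origin, a
// browser-internal device key, and a salt that changes when cookies are cleared.
function derivePortId(origin, deviceKey, salt) {
  return createHash('sha256')
    .update(JSON.stringify([origin, deviceKey, String(salt)]))
    .digest('hex')
    .slice(0, 32);
}

// Hypothetical salt: the time the current cookie jar was created.
const cookiesCreatedAt = 1367000000000;
const device = 'usb:example-device#0'; // made-up internal device key

const before = derivePortId('https://example.com', device, cookiesCreatedAt);
const again = derivePortId('https://example.com', device, cookiesCreatedAt);
const otherOrigin = derivePortId('https://example.org', device, cookiesCreatedAt);

// Clearing cookies resets the salt, so the same device gets a new ID.
const afterClear = derivePortId('https://example.com', device, cookiesCreatedAt + 86400000);
```

A timestamp is used here only because the comment suggests it; a random salt stored alongside the cookies would be harder for a site to guess.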

cwilso commented 11 years ago

Well, I'd suggest this kinda makes the id pretty useless anyway - shouldn't we just cut id, and rely on order, manufacturer and name, avoiding the potential issue here?

jussi-kalliokoski commented 11 years ago

Well, I'd suggest this kinda makes the id pretty useless anyway - shouldn't we just cut id, and rely on order, manufacturer and name, avoiding the potential issue here?

Huh?! What do you mean it makes the ID useless? It's not like users go incognito every time, expecting sites to remember their preferences.

cwilso commented 11 years ago

If I'm an ISV writing MIDI software, I'm going to have to rely on a weird combination of ID, order, manufacturer and name. It just doesn't seem to buy much utility anymore.

jussi-kalliokoski commented 11 years ago

If I'm an ISV writing MIDI software, I'm going to have to rely on a weird combination of ID, order, manufacturer and name.

Why?

cwilso commented 11 years ago

ID can fail, and when it does, it's catastrophic (regen cookies will lose all matching); order/manufacturer/name matching will at worst confuse two of the same devices when one has been unplugged.

jussi-kalliokoski commented 11 years ago

ID can fail, and when it does, it's catastrophic (regen cookies will lose all matching); order/manufacturer/name matching will at worst confuse two of the same devices when one has been unplugged.

But that's intended behavior. When a user cleans up cookies, the intent is to make sites forget their preferences. If a site somehow keeps the user's preferences anyway, for example by remembering the user's devices, it's a bit creepy.

cwilso commented 11 years ago

This is "once you've logged back in to the service" - otherwise, you'd have no way to persist the order/man/name data. I'm presuming the site would be storing this remotely, keyed off your auth, along with your other data (e.g. your sequences/tracks/etc.)

If you just go to the site, with no auth, the site cannot have persisted its knowledge of your preferences, fear not.

jussi-kalliokoski commented 11 years ago

That's silly, it makes no sense to store the users' device preferences remotely, it's not like the user would have the same devices on different computers (at least with same IDs) or stuff like that, so the site would either have to fingerprint the different machines the user logs onto the service with (definitely something we don't want to encourage) to know which preferences to apply or store the preferences locally, in which case the preferences would be lost when the cookies are cleared anyway.

cwilso commented 11 years ago

It doesn't matter whether you encourage it or not, they will do it. It provides a better user experience - and it will make no sense to the user that their MIDI setup has been lost just because their roommate was surfing porn and cleared cookies afterward. Hell, my bank fingerprints my machine.

jussi-kalliokoski commented 11 years ago

and it will make no sense to the user that their MIDI setup has been lost just because their roommate was surfing porn.

The user really should tell his/her roommate to go into private browsing mode. xD Clearing cookies is so last season! Personally, in the last few years I've used clearing cookies only as a last resort for making a misbehaving site (usually one that I'm developing) forget about me, but I'm guessing there are people who have access to less biased data than my personal usage. :)

My bank recommends that I clear cookies after I log out. But then again, my bank also says I have to use passwords composed of four digits.

I don't think it provides a better user experience to keep a user's preferences when the user explicitly says to clear them.

marcoscaceres commented 11 years ago

FWIW, let's stop using "clearing cookies". I think what Anne meant was "clear private data" (which includes cookies). @jussi-kalliokoski brings up a good use case, nonetheless... it's basically running two browser sessions: one in private browsing mode and the other in "normal" mode.

cwilso commented 11 years ago

Happy to use that terminology. "Clear private data" does not mean "clear all remote data". The remote data WILL likely include setup data; if only because I will want to work on my sequence with my keyboard and my Launchpad on my desktop, and then

Look, I'm not trying to argue this should be baked in to the API; I'm just saying that from a usability perspective, if I was writing the software, I would absolutely cache this stuff and recreate it. I can do that, no matter if the API has indices or not; I'll just iterate through all the ports and match name/manufacturer. (In fact, several of my demos DO this - they iterate through doing name-matching, because I don't want to set up every time, and I frequently switch computers.)
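
A minimal sketch of the name/manufacturer matching described above, assuming ports are plain objects with `name` and `manufacturer` fields. The helper `matchSavedPorts` and the saved-setup shape are hypothetical, not taken from the demos mentioned.

```javascript
// Hypothetical helper: restore saved roles by matching name and manufacturer.
function matchSavedPorts(savedSetup, ports) {
  const unused = [...ports];
  const result = {};
  for (const [role, saved] of Object.entries(savedSetup)) {
    const i = unused.findIndex(
      (p) => p.name === saved.name && p.manufacturer === saved.manufacturer
    );
    // Each port is assigned at most once; missing devices map to null.
    result[role] = i === -1 ? null : unused.splice(i, 1)[0];
  }
  return result;
}

// Made-up setup stored remotely, and the ports found on the current machine.
const savedSetup = {
  keys: { name: 'Example Keyboard', manufacturer: 'Example Corp' },
  pads: { name: 'Launchpad', manufacturer: 'Novation' },
};
const currentPorts = [{ name: 'Launchpad', manufacturer: 'Novation' }];

const restored = matchSavedPorts(savedSetup, currentPorts);
```

With two identical devices attached this can only hand them out in enumeration order, which is the confusion case mentioned earlier in the thread.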

Regardless, I'd like to get back to the core issue: I'd like to change MIDIAccess to:

interface MIDIAccess : EventTarget { sequence inputs (); sequence outputs (); attribute EventHandler onconnect; attribute EventHandler ondisconnect; };

Yes? Don't care that particularly if IDs are still present (and therefore, an index into the sequence). Would slightly prefer cutting them at this point, but seriously - don't care that much.

cwilso commented 11 years ago

Argh.

"and then, ... switch PCs."

marcoscaceres commented 11 years ago

just rewriting @cwilso proposal because GH email support is currently broken:

interface MIDIAccess : EventTarget {
   sequence<MIDIInput> inputs ();
   sequence<MIDIOutput> outputs ();
   attribute EventHandler onconnect;
   attribute EventHandler ondisconnect;
};

cwilso commented 11 years ago

Gack. Sorry about that.

marcoscaceres commented 11 years ago

So, can't we support both approaches? Crazy thought:

   sequence<MIDIInput> inputs (optional (DOMString or DOMString[]) ids);
   sequence<MIDIOutput> outputs (optional (DOMString or DOMString[]) ids);

So:

var all = midiaccess.inputs();
var some = midiaccess.inputs(["foo", "bar"]);
var one = midiaccess.inputs("foo");

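
For illustration only, the three call shapes above can be emulated in script over a plain array of ports. `selectPorts` is a hypothetical stand-in for the proposed overload, not part of any API.

```javascript
// Hypothetical helper mirroring the proposed inputs(ids) overload.
function selectPorts(ports, ids) {
  if (ids === undefined) return [...ports];   // no argument: every port
  const wanted = new Set([].concat(ids));     // accepts a string or an array of strings
  return ports.filter((port) => wanted.has(port.id));
}

const inputPorts = [{ id: 'foo' }, { id: 'bar' }, { id: 'baz' }];

const allPorts = selectPorts(inputPorts);
const somePorts = selectPorts(inputPorts, ['foo', 'bar']);
const onePort = selectPorts(inputPorts, 'foo');
```

Since this is a few lines of script, it also shows why the overload may not need to live in the API itself.
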
cwilso commented 11 years ago

Well, that's certainly crazy. :) (Come on, you were expecting that response. I'm kidding.)

That seems like overengineering.

toyoshim commented 10 years ago

Some want the port ID to be permanently unique, while others want it to be randomized for privacy reasons. Origin-based scoping may be a good goal, but I still feel it's a little overengineered.

Technically speaking, on some operating systems, to keep on using the same unique ID for the same device is not easy, and browsers may provide unreliable unique IDs which are only hopefully permanently unique. As a result, HTML applications may want to handle device identification by themselves using other port information. So I want to give up on having browsers provide unique IDs.

In addition, considering privacy, I'm planning to introduce MIDIAccess instance based port ID randomization. If we don't stick to using a permanent ID, it looks easy and safe enough.

FYI, here is an interesting site to check how unique your browser is: https://panopticlick.eff.org/ There is already too much information that makes your browser unique :(

jussi-kalliokoski commented 10 years ago

considering privacy

MIDI access is inevitably a large source of entropy due to other identifiers in the MIDI ports, that's one of the reasons why there's the permission model. Getting rid of the pseudo-unique IDs won't solve that.

keep on using the same unique ID for the same device is not easy

Perfectly true, but it's intended to be best guess, just like anything a web developer can come up with using the available information (except that the browser mostly has access to more information about the ports), and for most of the cases it should be enough.

As a result, HTML applications may want to handle device identification by themselves using other port information.

If a developer feels that it won't just cut it, then there's the other properties to roll out your own identifier system, but I seriously doubt most people need it (or can make a better system) unless implementers make bad implementations.

I'm planning to introduce MIDIAccess instance based port ID randomization

Randomized IDs are an implementation choice, but I don't think a very good one. If the implementer cares about the privacy, they don't let web developers access this source of inevitable entropy without a permission.

cwilso commented 10 years ago

If an implementer really cares that deeply about privacy (i.e., they've tried to randomize and protect against all the things that https://panopticlick.eff.org/ comes up with), then they would only enable ANY MIDI access after prompting. The API explicitly allows for this; however, no one that I've spoken to in the security/privacy space thinks this is pragmatically a problem. It is adding a drop of water to the ocean.

At the same time, if IDs are truly randomized across instances, then they're worthless to store across instances, and really, they're probably not worth even exposing then (because the instance of the MIDIInput or MIDIOutput could be identifier enough). Obviously, this would require some slight rework of the design.

If this is the case, as Takashi said, each implementer would need to try to provide port identification using the other exposed information. Unfortunately, this is not possible in many common cases - for example, in my home studio, where I have two of the same USB MIDI interfaces connected. If order is not significant and persistent (and defining how order could be significant across OSes seems like a very steep challenge, if possible at all), then this is not possible, and we would not be able to make even a good attempt at enabling developers to cache persistent MIDI setups.

I think the best thing to do is say: Identifiers SHOULD persist across instances (but are domain-randomized). In many instances, this will not be possible, so developers must be prepared to recover from IDs not being found. In addition, browsers may choose to reset IDs for privacy or other reasons.
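
On the application side, "be prepared to recover from IDs not being found" might look like the sketch below. It assumes the ports are available as a `Map` keyed by id (a plain `Map` stands in for the browser object here); `findPort` and the saved-record shape are hypothetical.

```javascript
// Hypothetical helper: try the stored id first, then fall back to name/manufacturer.
function findPort(ports, saved) {
  const byId = ports.get(saved.id);
  if (byId) return { port: byId, matchedBy: 'id' };
  for (const port of ports.values()) {
    if (port.name === saved.name && port.manufacturer === saved.manufacturer) {
      return { port, matchedBy: 'name' };
    }
  }
  // Nothing matched: the app should ask the user to pick a device again.
  return { port: null, matchedBy: 'none' };
}

// Made-up data: the browser regenerated the id, but the device is the same.
const available = new Map([
  ['new-id', { id: 'new-id', name: 'Example Interface', manufacturer: 'Example Corp' }],
]);
const stale = { id: 'old-id', name: 'Example Interface', manufacturer: 'Example Corp' };

const recovered = findPort(available, stale);
const direct = findPort(available, { id: 'new-id', name: '', manufacturer: '' });
const missing = findPort(available, { id: 'x', name: 'Other', manufacturer: 'Other' });
```

The name-based fallback cannot tell two identical interfaces apart, which is the home-studio case described above.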

cwilso commented 9 years ago

"User agents SHOULD regenerate ids when privacy information is cleared."

annevk commented 9 years ago

Why not MUST? If they have such a feature surely they can take this into account?

cwilso commented 9 years ago

That would require being far more specific about "when privacy information is cleared", which I'd rather avoid touching.

annevk commented 9 years ago

Hmm. We should get clearer about that. Basically it needs to be cleared whenever cookies/storage is cleared.

agoode commented 9 years ago

So, id should be:

?

annevk commented 9 years ago

Yeah, and in particular identifiers need to be globally unique and cannot be reused across origins.

agoode commented 9 years ago

Ok. Something like a version 4 (random) UUID would work then.
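
A rough sketch of that approach, assuming the user agent keeps a per-origin table of random IDs and drops it whenever that origin's cookies/storage are cleared. `PortIdStore`, its method names, and the device key are hypothetical; `randomUUID` from Node's `crypto` module generates version 4 UUIDs.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical per-origin ID table kept by the user agent.
class PortIdStore {
  constructor() {
    this.byOrigin = new Map(); // origin -> Map(deviceKey -> id)
  }

  // Return the stable id for this origin/device pair, creating one if needed.
  idFor(origin, deviceKey) {
    if (!this.byOrigin.has(origin)) this.byOrigin.set(origin, new Map());
    const ids = this.byOrigin.get(origin);
    if (!ids.has(deviceKey)) ids.set(deviceKey, randomUUID());
    return ids.get(deviceKey);
  }

  // Called when the origin's cookies/storage are cleared: forget its ids.
  clearSiteData(origin) {
    this.byOrigin.delete(origin);
  }
}

const store = new PortIdStore();
const firstId = store.idFor('https://example.com', 'device-a');
const sameId = store.idFor('https://example.com', 'device-a');
const crossOriginId = store.idFor('https://example.org', 'device-a');

store.clearSiteData('https://example.com');
const regeneratedId = store.idFor('https://example.com', 'device-a');
```

Because every ID is random rather than derived from the device, nothing about the hardware can be recovered from it, and the same device never shares an ID across origins.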