WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

[privacy] Exposing data to an origin: fingerprinting #1457

Closed svgeesus closed 4 years ago

svgeesus commented 6 years ago

Does this specification expose any other data to an origin that it doesn’t currently have access to?

Yes. When giving various information on available AudioNodes, the Web Audio API potentially exposes information on characteristic features of the client (such as audio hardware sample-rate) to any page that makes use of the AudioNode interface. Additionally, timing information can be collected through the AnalyserNode or ScriptProcessorNode interface. The information could subsequently be used to create a fingerprint of the client.

from Tom Ritter: https://lists.w3.org/Archives/Public/public-privacy/2017OctDec/0017.html

If a UA wanted to report generic information for a user (effectively lying about audio hardware sample-rate and the rest) so that every user presented the same data to prevent fingerprinting, are you able to recommend sensible defaults in the draft? Or what a UA should consider when choosing sensible defaults?

rtoy commented 6 years ago

For the sample rate, typical values would be 44100 or 48000. These are all common values in use.

svgeesus commented 6 years ago

Apart from some high-end outliers (192k/24bit for example), the majority of uses are, as @rtoy says, going to be the CD rate of 44100 or the DAT rate of 48000. Even if the actual hardware can go higher (96000 commonly), it is typically set to one of those two in practice.

The additional fingerprinting surface area is thus non-zero, but small.

joeberkovitz commented 6 years ago

Speaking not as chair, there are so few sample rates in use out there that it is hard to see how sampleRate can be useful.

As far as timing information goes, isn't it the case that any CPU intensive task in JS yields fingerprintable results? To what extent do these add fingerprinting exposure to what exists already?

mdjp commented 6 years ago

Waiting for response from security and privacy review.

tomrittervg commented 6 years ago

As far as timing information goes, isn't it the case that any CPU intensive task in JS yields fingerprintable results? To what extent do these add fingerprinting exposure to what exists already?

They can, certainly, but efforts in reducing the resolution of timers and jittering them have made those efforts less effective.

padenot commented 6 years ago

UAs could certainly have a mode where the sample-rate is forced to either 44.1kHz or 48kHz. Additionally, AudioDestinationNode.maxChannelCount could be also constrained to always return 2 (stereo, which is certainly the immense majority of setups).

This would not really reduce the usefulness of the API by much and reduce the exposed entropy by a bit. Additionally, it is trivial to implement.

Does any implementer have statistics on which sample rate is more common? It probably does not matter in practice as long as it's one of those two.
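The privacy mode described above can be sketched as a small clamping function. This is a minimal illustration, not any browser's implementation: the `ReportedAudioInfo` shape, the `reportAudioInfo` name, and the choice of 44100 over 48000 are assumptions made for the example.

```typescript
// Hypothetical shape for the values a page can read; not a real Web Audio type.
interface ReportedAudioInfo {
  sampleRate: number;      // what AudioContext.sampleRate would report
  maxChannelCount: number; // what AudioDestinationNode.maxChannelCount would report
}

// Assumed privacy-mode constants: one of the two common rates, and stereo.
const PRIVACY_SAMPLE_RATE = 44100;
const PRIVACY_MAX_CHANNELS = 2;

// Returns the values a UA might expose, given the real hardware values.
function reportAudioInfo(
  hardware: ReportedAudioInfo,
  privacyMode: boolean
): ReportedAudioInfo {
  if (!privacyMode) {
    // Normal mode: expose the hardware values unchanged.
    return { ...hardware };
  }
  // Privacy mode: every user reports the same pair of values.
  return {
    sampleRate: PRIVACY_SAMPLE_RATE,
    maxChannelCount: PRIVACY_MAX_CHANNELS,
  };
}

// A pro audio interface looks like any other device once the mode is on.
const proInterface: ReportedAudioInfo = { sampleRate: 192000, maxChannelCount: 16 };
console.log(reportAudioInfo(proInterface, true));
console.log(reportAudioInfo(proInterface, false));
```

With the mode enabled, the audio still has to be resampled and downmixed somewhere between the page and the hardware; the sketch only covers what the page gets to read.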

rtoy commented 6 years ago

Chrome has statistics on the sample rates. I'm not sure I can share exact numbers, but suffice it to say that over 95% of all systems use either 44.1 kHz or 48 kHz. However, there is a huge variety of rates. And we know some people use 192 kHz because people filed issues about Chrome not working at that rate.

I agree that stereo is by far the most common (sadly, it looks like we have no statistics on that), but I think there are people who do want to support 5.1 audio systems. I'm reluctant to set maxChannelCount to 2; we should allow at least up to 6.

padenot commented 6 years ago

Just a clarification: this is not intended to be the normal setting for a normal UA.

A specific UA, or a setting in an existing UA, would result in having, say, always stereo output and a 44.1kHz sample-rate, allowing users willing to sacrifice some features for some increased privacy to do so.

For example, going to about:config in Firefox and toggling privacy.resistFingerprinting will change a bunch of APIs, but it's not the default configuration.

svgeesus commented 6 years ago

The spec now says:

Does this specification allow an origin access to aspects of a user’s local computing environment?

Not directly; all requested sample rates are supported, with upsampling if needed. It is possible to use Media Capture and Streams to probe for supported audio sample rates with MediaTrackSupportedConstraints. This requires explicit user consent. This does provide a small measure of fingerprinting. However, in practice most consumer and prosumer devices use one of two standardized sample rates: 44.1kHz (originally used by CD) and 48kHz (originally used by DAT). Highly resource constrained devices may support the speech-quality 11kHz sample rate, and higher-end devices often support 88.2, 96, or even the audiophile 192kHz rate.

Requiring all implementations to upsample to a single, commonly-supported rate such as 48kHz would increase CPU cost for no particular benefit, and requiring higher-end devices to use a lower rate would merely result in Web Audio being labelled as unsuitable for professional use. https://webaudio.github.io/web-audio-api/#priv-sec

svgeesus commented 6 years ago

Pinged commenter here https://lists.w3.org/Archives/Public/public-privacy/2018JanMar/0027.html

svgeesus commented 6 years ago

Response at https://lists.w3.org/Archives/Public/public-privacy/2018JanMar/0028.html

samuelweiler commented 4 years ago

I suggest reopening this issue and reframing the informative "most devices use..." as a specific suggestion that UAs pick one of those rates and hide anything higher behind a permission prompt. I would like to see just one of them picked, and that likely has to be 44.1kHz (being the least common denominator), but I'll leave that to you.

padenot commented 4 years ago

Answering with my Firefox developer hat on (and not spec editor):

Firefox has a mode called privacy.resistFingerprinting. If this is enabled, then AudioDestinationNode.maxChannelCount always returns 2 and AudioContext.sampleRate always returns 44100: two numbers that are overwhelmingly common. We're now looking at making this more common.

Additionally, we have a patch (not landed) that can defeat all fingerprinting attempts that we found. We've not landed it, because we found that those fingerprinting libraries are in fact just characterizing the floating point computations of the machine (and no need for web audio for that). Merely changing libm.so on a computer changes the result (deterministically).

svgeesus commented 4 years ago

I suggest reopening this issue and reframing the informative "most devices use..." as a specific suggestion that UAs pick one of those rates and hide anything higher behind a permission prompt. I would like to see just one of them picked, and that likely has to be 44.1kHz (being the least common denominator), but I'll leave that to you.

It seems that 44.1kHz is by far the most common (for built-in hardware) on Windows, but 48kHz is the built-in hardware default for Android. So picking just one of those disadvantages the other platform.

On the other hand 88.2kHz, 96kHz, and 192kHz are unusual and associated with dedicated, aftermarket hardware so hiding those behind a permission prompt seems doable to me. Interested to hear browser developer perspectives on that.

rtoy commented 4 years ago

IIRC, the UA string often contains the OS, so if that's true, having different rates for different OSes would seem not to increase fingerprinting in any way. For MacOS, it seems the most common rate is 44.1 kHz too, like Windows.

padenot commented 4 years ago

It seems that 44.1kHz is by far the most common (for built-in hardware) on Windows, but 48kHz is the built-in hardware default for Android. So picking just one of those disadvantages the other platform.

It's unclear that it's this clear cut. I've had a look at the devices of quite a few people (this is not a scientific survey or anything), and I see both reasonably often.

svgeesus commented 4 years ago

Paul, thanks for the clarification. What did you think about putting the less common, higher sample rates behind a user permission? Those seem to have some actual capability for fingerprinting.

Also, worth a reminder that MediaDevices.getSupportedConstraints(), which exposes (among other things) the sampleRate is from the Media Capture and Streams specification. i.e. not from the Web Audio API.

rtoy commented 4 years ago

Chrome has stats on the sample rate. Surprisingly, 48 kHz is the most common on Windows. For Mac and Linux, it's 44.1 kHz. Android is 48 kHz.

In all cases, the second most common rate is 44.1 or 48, depending on what the most common rate is.

tomrittervg commented 4 years ago

As much as I don't want to provide fingerprintable information, a permission prompt to expose some minor detail of Web Audio does not seem like a good choice in spending the 'annoying the user' budget.

Desktop and Android are sufficiently distinct that it's not really feasible to prevent distinguishing them. Even hiding which Desktop OS you're on is extremely difficult. So if there's an advantage of being more accurate and reporting different values between Desktop/Android I think that would be acceptable.

svgeesus commented 4 years ago

spending the 'annoying the user' budget

A useful concept, and I agree here.

What about annoying the small fraction of users who have bought an aftermarket soundcard/audio interface and might, perhaps, be somewhat fingerprinted by capabilities like 192kHz sampling or 16 input/output channels?

rtoy commented 4 years ago

The third paragraph of item 5 of https://webaudio.github.io/web-audio-api/#priv-sec basically describes what @padenot is saying in https://github.com/WebAudio/web-audio-api/issues/1457#issuecomment-567566869 in somewhat different terms.

Please advise on what else we need to say or do here.

svgeesus commented 4 years ago

So, in conclusion:

and thus overall, reducing fingerprinting surface from "this device uses one of the two common rates" to "this device uses the one rate that we force you to use" seems neither desirable nor feasible.

This has been a useful investigation with good discussion, but I suggest we now close this issue.

padenot commented 4 years ago

Teleconf decision: if nothing else comes up, @svgeesus can close this.

samuelweiler commented 4 years ago

  • annoying the tiny percentage of people with pro soundcards with a permission prompt every time they hit a page using Web Audio does not seem worthwhile ...
  • resampling to a single fixed sample rate is not feasible, increases CPU load on small devices, and low-end hardware sometimes has only a single fixed sample rate

What I'm seeing here is "the devices that can be fingerprinted are the ones that have special audio HW". And from that, I infer "they're the ones with the processing power to fix it".

So maybe the fix is: only use the sampling rate most common for the platform unless special permission is requested. So the default behavior when you have a high end device is to use it to downsample, and permission is only requested for special cases.

padenot commented 4 years ago

"special audio HW"

This is not the case anymore: consumer-grade computers have DACs that go to higher rates, but I don't think they are set to a super high rate by default because that's wasteful. I don't think we can make this assumption.

samuelweiler commented 4 years ago

@svgeesus I've flagged this issue as "needs-resolution". I suggest reopening it.

NalaGinrut commented 4 years ago

Can we keep this issue open? I think there's a concern about uXDT. And the situation is not science fiction; it's real and has been in use for years in the industry.

cwilso commented 4 years ago

One comment that doesn't seem clearly captured in the more recent comments:

It should be clear that the potential for fingerprinting can't just be "solved" with processing power; it requires processing power AND causes a loss in quality (specifically on machines that likely have a higher sampling rate, thus higher-quality audio hardware and likely higher expectations for quality).

It seems rational to allow both 44.1k and 48k rates, as any auto-conversion there would be painful (and not particularly high exposure). Any UA could, of course, auto-downsample to one of those rates, if it was particularly trying to avoid fingerprinting exposure. Is there something else needed in the current spec to enable this if desired?

@NalaGinrut this issue seems unrelated to uXDT, unless I'm mistaken - that seems more like https://github.com/WebAudio/web-audio-api/issues/2191.

NalaGinrut commented 4 years ago

@cwilso If we were talking about "fingerprinting exposure risk under high sample rate sound beyond human hearing", then it's the thing that can be used in uXDT.

cwilso commented 4 years ago

@NalaGinrut That's what I meant - that issue (fingerprinting exposure risk with high sample rate sound - e.g. using ultrasonic beacons) is being discussed in issue #2191.

I'm not sure what the suggested resolution is here. The spec allows UAs to choose non-native sample rates - e.g. what Firefox does under a flag. There's not a specific permission for enabling "non-44.1/48k devices" on creation - not sure how to capture that in the API - but a UA could certainly do it and be compliant.

I would STRONGLY advise against setting a single default value and enforcing it. That would be like requiring all CSS layout to be performed to a 1024x768 screen. At the very least, 44.1 and 48 should be allowed.

cwilso commented 4 years ago

First, some anecdata (I'm giving a broad brush stroke picture - @padenot, you should chime in if this doesn't align with your experience):

The vast majority (>99%) of all devices aggregated across all OSes are either 44.1kHz or 48kHz. Unfortunately, it's a somewhat even split across those two - and in particular, different OSes favor one rate or the other. For example, MacOS devices are about 3:1 in favor of 44.1k; Windows is about 6:1 in favor of 48k. Android is very heavily (13:1) 48kHz; iOS devices tended to be 44.1 in the past, but I'm not sure now. There are around 1% of Windows users that use either 96kHz or 192kHz (i.e., pro audio interfaces).

I would like to explicitly state that resampling between 44.1kHz and 48kHz as a normal matter of course is a BAD idea. It costs CPU - and therefore, battery life. (It also will increase latency, as the filtering needed in the resampler has some latency, and of course there's some quality loss, though this isn't the strongest reason.) Additionally, if the Web Audio spec were to mandate either 44.1 or 48kHz as the preferred rate, it would be intentionally disadvantaging one platform in favor of another. Fundamentally, I think we have to sacrifice one bit of fingerprinting here. (Note that this has heavy dependence on native OS, so if that's detectable in some other way, this probably isn't a separate bit.)

However, that does not mean that we must just throw our hands in the air at this potential fingerprinting issue. I can see a two-pronged approach to improve the fingerprinting concerns:

1) The spec should recommend that implementations SHOULD only support native rates of 44.1kHz or 48kHz by default. For users with higher-rate hardware, they would likely need to enable a flag; that design should be left up to the UA. Note that this should not be mandated, because there is a non-zero set of low-sample-rate hardware out there (typically low-power, low-spec devices) that have 8kHz or 16kHz rates - probably because 16kHz is the 4G LTE audio streaming rate. Since those devices are low-spec already, upsampling the rate would be particularly hard on them. This change alone will reduce the entropy to one bit.

2) Essentially the design @padenot described for Firefox's resistFingerprinting feature could be described in the spec as a privacy option: that is, IF a user enables a specific privacy option, that bit of entropy can be removed. I would, however, recommend that if a rate is mandated, 48kHz be chosen; although of course Chrome's data is skewed (since we don't have iOS data), across all other platforms it's around 4:1 in favor of 48kHz, and upsampling is likely less lossy than downsampling. Additionally, I believe newer iOS hardware supports 48kHz natively.
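To put a number on the "one bit" mentioned in the two points above: one bit is the upper bound for a two-way choice and is only reached with an even split. The sketch below computes the Shannon entropy of a two-rate distribution from relative weights. The ratios are the rough ones quoted in this comment, treated as exact purely for illustration, and `entropyBits` is a name made up for the example.

```typescript
// Shannon entropy, in bits, of a distribution given as relative weights.
function entropyBits(weights: number[]): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.reduce((bits, w) => {
    if (w <= 0) return bits; // a zero-weight outcome contributes nothing
    const p = w / total;
    return bits - p * Math.log2(p);
  }, 0);
}

// Ratios quoted in the thread as [44.1 kHz, 48 kHz]; illustrative only.
const quotedRatios: Array<[string, number[]]> = [
  ["macOS", [3, 1]],
  ["Windows", [1, 6]],
  ["Android", [1, 13]],
];

quotedRatios.forEach(([platform, weights]) => {
  console.log(platform, entropyBits(weights).toFixed(2), "bits");
});

// An even split is the worst case for a two-way choice.
console.log("even split", entropyBits([1, 1]).toFixed(2), "bits");
```

Within a single platform each quoted ratio comes out below one bit, which is consistent with the note that the rate depends heavily on the native OS and may not be a separate bit if the OS is already detectable.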

Would this resolve the privacy concerns to the PING's satisfaction?

samuelweiler commented 4 years ago

I had been understanding that the default rates were more strongly linked to platform.

Even so, I think we're all on board with not resampling between 48 and 44.1 (except as described in your number 2 above), even if that costs us one bit of FP surface. I would like the "anything else is behind a flag - or a permission - or otherwise requires consent" to be mandatory, and I suspect that would be the dominant view in PING also. Tell me more about the pain that would come from such a mandate?

cwilso commented 4 years ago

You would be deliberately increasing battery drain (via CPU and memory usage) for low-power devices. In particular, a device that only has 16kHz output would have to use 3x the CPU and memory for Web Audio, to achieve the same audio output. (Actually, probably more, as when downsampling it would be running a filter.)

samuelweiler commented 4 years ago

You would be deliberately increasing battery drain (via CPU and memory usage) for low-power devices. In particular, a device that only has 16kHz output would have to use 3x the CPU and memory for Web Audio, to achieve the same audio output. (Actually, probably more, as when downsampling it would be running a filter.)

Or, presumably, you'd be asking the user's consent to expose more FP surface in lieu of the resampling. I haven't heard you say that hiding this resource-saving but privacy-reducing option (or "optimization") behind a flag or prompt is intolerable. And I'm imagining that memory increase is usually instantaneous - e.g. at the point of the resampling. That seems like it might be tolerable, too.

cwilso commented 4 years ago

It's not that hiding it behind a flag is "intolerable" - it's that it's likely to simply be overridden by device producers in that case.

I think my point is that if device manufacturers are going to flip the flag anyway - so as not to have more battery drain - you're not really enhancing privacy. At this point, on the whole I'd recommend enabling 16kHz devices too, because the incidence is "relatively high" (and uneven across platforms), and I'm not convinced we know what devices are common there.

I'm not sure what you mean by "memory increase is instantaneous" - anywhere a buffer is created in code, it will be at a higher rate, so 3X bigger. The rate being higher will lead to higher CPU and task-switching, because it will have to process blocks faster.
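The 3x figure follows directly from the ratio of the two rates. A small worked sketch, assuming a 16kHz-only device forced to run Web Audio at 48kHz; the helper names are made up for the example, and the 128-frame block size is the render quantum used by Web Audio.

```typescript
// Web Audio renders in blocks of 128 sample-frames (the render quantum).
const RENDER_QUANTUM_FRAMES = 128;

// Samples needed to hold `seconds` of audio with `channels` channels at `rate` Hz.
function bufferSamples(rate: number, seconds: number, channels: number): number {
  return Math.round(rate * seconds) * channels;
}

// Number of render blocks that must be processed per second at `rate` Hz.
function blocksPerSecond(rate: number): number {
  return rate / RENDER_QUANTUM_FRAMES;
}

const deviceNativeRate = 16000; // low-spec device from the discussion
const forcedContextRate = 48000; // rate it would be upscaled to

// Memory: every buffer of the same duration is 3x larger.
console.log(
  bufferSamples(forcedContextRate, 1, 2) / bufferSamples(deviceNativeRate, 1, 2)
); // 3

// CPU: 3x as many render blocks per second, before any resampling filter cost.
console.log(blocksPerSecond(deviceNativeRate), blocksPerSecond(forcedContextRate)); // 125 375
```

The resampler between the 48kHz context and the 16kHz device adds its own filtering cost on top of this, which is the "probably more" in the earlier comment.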

cwilso commented 4 years ago

During the Web Audio WG meeting, this was discussed. It appears the recommended resolution is:

1) 44.1kHz and 48kHz are allowed as default rates; the system will choose between them for best applicability. (Obviously, if the audio device is natively 44.1, 44.1 will be chosen, etc., but also the system may choose the most "compatible" rate - e.g. if the system is natively 96kHz, 48kHz would likely be chosen, not 44.1kHz.)

2) The system should upscale to one of those two rates for devices that are natively lower rates, despite the fact that this may cause extra battery drain due to upscaled audio. (Again, the system will choose the most compatible rate - e.g. if the native system is 16kHz, it's expected that 48kHz would be chosen.)

3) It is expected (though not mandated) that browsers would offer a user affordance to force use of the native rate - e.g. by setting a flag in the browser on the device. This setting would not be exposed in the API.

4) It is also expected behavior that a different rate could be explicitly requested in the constructor for AudioContext (this is already in the specification; it normally causes the audio rendering to be done at the requested sampleRate, and then up- or down-scaled to the device output), and if that rate is natively supported, the rendering could be passed straight through. This would enable apps to render to higher rates without user intervention (although it's not observable from Web Audio that the audio output is not downsampled on output) - for example, if MediaDevices capabilities were read (with user intervention) and indicated a higher rate was supported.
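Items 1 to 3 of the resolution above can be sketched as a single selection function. The resolution does not define "most compatible"; the sketch ASSUMES it means "related by an integer factor", which matches the two examples given (96kHz to 48kHz, 16kHz to 48kHz) but is not taken from the spec. The function names are hypothetical.

```typescript
// True when one rate is an integer multiple of the other.
function isIntegerRelated(a: number, b: number): boolean {
  return a % b === 0 || b % a === 0;
}

// Picks the default AudioContext rate for a device with the given native rate.
// ASSUMPTION: "most compatible" is read as "integer-related", with 48000 as
// the fallback; the thread only gives 96 kHz -> 48 kHz and 16 kHz -> 48 kHz.
function chooseDefaultRate(nativeRate: number, forceNative: boolean = false): number {
  // Item 3: a user-set browser flag forces the native rate.
  if (forceNative) return nativeRate;
  // Item 1: the two common rates are used as-is.
  if (nativeRate === 44100 || nativeRate === 48000) return nativeRate;
  // Rates in the 44.1 kHz family (e.g. 88.2 kHz, 22.05 kHz) map to 44.1 kHz.
  if (isIntegerRelated(nativeRate, 44100)) return 44100;
  // Items 1 and 2: everything else (96 kHz, 192 kHz, 16 kHz, 8 kHz) maps to 48 kHz.
  return 48000;
}

[44100, 48000, 96000, 192000, 88200, 16000, 8000].forEach((rate) => {
  console.log(rate, "->", chooseDefaultRate(rate));
});
console.log(192000, "(flag set) ->", chooseDefaultRate(192000, true));
```

Under this reading a page only ever sees 44100 or 48000 by default, whatever the hardware is, unless the user has set the flag.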

samuelweiler commented 4 years ago

1, 2, and 3 sound great. I'm delighted to see us converge on that! I don't understand whether 4 could be used for fingerprinting.

Exploring 4 more: if an unsupported rate is requested, would that generate an error, or does it force (potentially expensive) resampling?

cwilso commented 4 years ago

On 4: it would force resampling if an unsupported rate was requested (it does this today). Otherwise it would be clearly observable.

samuelweiler commented 4 years ago

On 4: it would force resampling if an unsupported rate was requested (it does this today). Otherwise it would be clearly observable.

Explain more? What, precisely, happens?

rtoy commented 4 years ago

Generating an error for unsupported rates enables further fingerprinting. You'd know the user doesn't have this sample rate. By silently resampling, you can't tell if the user has that rate or not. It just works.

cwilso commented 4 years ago

"what precisely happens" - see previous post: any time a sample rate that is not natively supported is selected, the AudioContext is still created at the requested rate, and resampling will occur between that AudioContext and the actual audio device. This isn't observable from the API.

The stipulation I was making is that if a sample rate that is not 44.1k or 48k is requested (e.g. 96kHz), and that rate IS natively supported by the hardware, the system will send the AudioContext output straight to the device (rather than creating it at 96kHz, downsampling to 48kHz and then upsampling back to 96kHz). This will enable higher sample rates without quality or performance loss, while not leaking fingerprintable bits: this isn't observable, since you can't get access to the actual bits sent out from the device, and you'd have to use user-prompt-protected browsing of the media devices to know that the rate is available.
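The behavior described in this comment can be sketched as a planning step that decides whether a resampler is needed. This is an illustration only: `AudioDevice`, `OutputPlan`, and `planOutput` are hypothetical names, and real implementations sit much closer to the platform audio stack. In a browser the request itself would be `new AudioContext({ sampleRate: 96000 })`.

```typescript
// Hypothetical description of the output hardware.
interface AudioDevice {
  nativeRates: number[]; // rates the hardware can run at without resampling
  currentRate: number;   // rate the hardware runs at when no pass-through applies
}

interface OutputPlan {
  contextRate: number; // what AudioContext.sampleRate reports to the page
  deviceRate: number;  // rate actually sent to the hardware; not visible to the page
  resampled: boolean;  // whether a resampler sits between context and device
}

// requestedRate: the sampleRate option passed to the constructor, if any.
// defaultRate: the rate the UA would pick by default (44100 or 48000).
function planOutput(
  requestedRate: number | undefined,
  device: AudioDevice,
  defaultRate: number
): OutputPlan {
  // The context is always created at the requested rate; no error is raised.
  const contextRate = requestedRate ?? defaultRate;
  if (device.nativeRates.indexOf(contextRate) !== -1) {
    // Natively supported: pass the rendered audio straight through.
    return { contextRate, deviceRate: contextRate, resampled: false };
  }
  // Not natively supported: resample silently to the rate the device runs at.
  return { contextRate, deviceRate: device.currentRate, resampled: true };
}

const proDevice: AudioDevice = { nativeRates: [44100, 48000, 96000], currentRate: 48000 };
const laptop: AudioDevice = { nativeRates: [44100, 48000], currentRate: 48000 };

console.log(planOutput(96000, proDevice, 48000));
console.log(planOutput(96000, laptop, 48000));
```

In both cases `contextRate` is 96000, which is the only value the page can read, so the two devices are indistinguishable from the API; only `deviceRate` and `resampled` differ, and neither is exposed.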

samuelweiler commented 4 years ago

Sounds good to me.

rtoy commented 4 years ago

So, the conclusion is that the comments in https://github.com/WebAudio/web-audio-api/issues/1457#issuecomment-640767256 are what we want to say and do in the spec. More or less.

rtoy commented 4 years ago

Sounds like a plan.

samuelweiler commented 3 years ago

@svgeesus Manipulating the flags on this issue won't work. I'll close the tracking issue.

svgeesus commented 3 days ago

@tjwhalen Sam said he would close the tracking issue but did not. To tidy up, would you mind doing so?