WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Restrict sounds beyond normal hearing #2191

Closed samuelweiler closed 4 years ago

samuelweiler commented 4 years ago

Breaking out this comment from https://github.com/WebAudio/web-audio-api/issues/2061, since it has not been addressed in the thread there.

Related to the issue raised in https://github.com/WebAudio/web-audio-api/issues/1486, is there some way to restrict otherwise-unpermissioned output of sounds beyond normal hearing? As in, if you want sub- or super-sonic output, you need to jump through a hoop? (So that the exceptional case requires hoops, but the normal case doesn't.)

bradisbell commented 4 years ago

Such a restriction would be harmful.

Most mid and professional level audio devices allow high sample rates, typically 96 kHz on the low end, and up from there. While not useful for distribution of the end recording, it can be useful for the initial audio capture and intermediate audio files being worked with. It's possible to record something well above the range of hearing and manipulate it down into range later. Restricting the Web Audio API removes entire categories of use cases.

Additionally, there are potential use cases for the Web Audio API (general signal processing) that are not strictly audio.

samuelweiler commented 4 years ago

What would be the harm in hiding that extra spectrum behind some gatekeeper: a permission prompt or some other indication of consent? (Or as I put it in my original comment, "hoops".)

Also, to be clear, sample rate (by itself) is being discussed over in https://github.com/WebAudio/web-audio-api/issues/1457. Here I'm also asking about sub-sonic sounds.

rtoy commented 4 years ago

Teleconf: We discussed in our weekly teleconf and agreed that permissions (if any) on the sample rate would also allow subsonic and ultrasonic audio. We don't need an extra permission prompt here. So if you're allowed to use, say, 96 kHz sample rate, you will get audio up to 48 kHz (and down to 0) without any additional permission prompts.
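The WG's reasoning above can be sketched numerically: under the Nyquist criterion, a permitted sample rate already fixes the representable band at DC up to half that rate, so a separate frequency-range permission would add nothing. A minimal sketch (the function name is mine, not from the spec):

```python
def representable_band_hz(sample_rate_hz: float) -> tuple:
    """Band of frequencies a given sample rate can represent (Nyquist criterion).

    Everything from DC up to half the sample rate is representable, so a
    sample-rate permission already implies sub- and ultrasonic capability.
    """
    return (0.0, sample_rate_hz / 2.0)

# A 96 kHz context already implies output up to 48 kHz (and down to 0 Hz):
print(representable_band_hz(96000))  # (0.0, 48000.0)
print(representable_band_hz(44100))  # (0.0, 22050.0)
```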

kawogi commented 4 years ago

Removing frequency ranges always comes with costs: filters will introduce phase shifting and/or latency - especially when trying to filter on the low end.
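To put rough numbers on that cost: for a linear-phase FIR filter, a common rule of thumb (often attributed to fred harris) estimates the required length as N ≈ (fs/Δf)·(A/22), where Δf is the transition width and A the stopband attenuation in dB; the latency is then (N − 1)/2 samples. A hedged sketch (function names and the specific filter specs are mine, chosen for illustration):

```python
import math

def fir_length_estimate(fs_hz: float, transition_hz: float, atten_db: float) -> int:
    """Rule-of-thumb linear-phase FIR length: N ~ (fs / df) * (A / 22)."""
    return math.ceil(fs_hz / transition_hz * atten_db / 22.0)

def fir_latency_s(n_taps: int, fs_hz: float) -> float:
    """Group delay of a linear-phase FIR: (N - 1) / 2 samples, in seconds."""
    return (n_taps - 1) / 2.0 / fs_hz

# Removing sub-20 Hz content at 48 kHz with a 10 Hz transition band, 60 dB down:
n_low = fir_length_estimate(48000, 10, 60)     # ~13091 taps
# versus an anti-ultrasonic lowpass with a 2 kHz transition band:
n_high = fir_length_estimate(48000, 2000, 60)  # ~66 taps
print(n_low, fir_latency_s(n_low, 48000))      # latency well over 100 ms
print(n_high, fir_latency_s(n_high, 48000))    # latency under 1 ms
```

This is why mandating a subsonic filter is especially costly: a narrow low-end transition band forces a very long filter, and hence a large latency, exactly as described above.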

I believe there's no sane lower boundary: there's audio equipment specialized in emitting sub-sonic frequencies (commonly called "body shakers" here), used for game feedback and other things.

There's also no sane upper limit. Most adults cannot hear frequencies above 16 kHz so everything above could be considered inaudible.

Filtering might even contribute to fingerprinting: Every device would have to implement the same filter.

I assume the attack vector being discussed here is fingerprinting a room through a speaker-mic-combo using inaudible frequencies?

One could just listen on the microphone and emit frequencies that are masked by the human brain. You can also do a lot of fingerprinting using the microphone alone (like noise-floor, room modes, deconvolution of plosives in the human voice).

rtoy commented 4 years ago

As mentioned in https://github.com/WebAudio/web-audio-api/issues/2191#issuecomment-614764131, we consider this issue closed. If you're allowed to use a particular sample rate, then there are no additional constraints on the allowed frequencies.

samuelweiler commented 4 years ago

Filtering might even contribute to fingerprinting: Every device would have to implement the same filter.

Why do you think so? Looking at this attack:

I assume the attack vector being discussed here is fingerprinting a room through a speaker-mic-combo using inaudible frequencies?

(Yes, that is what I'm thinking, though I see this more as a way to "tell if multiple devices are in proximity to each other". I've also heard a theory about it being used within one device to violate restrictions on communication between contexts.)

So, again, look at that attack: I'm thinking that perhaps it doesn't need to be the same filter. For one, having filters on only one side (emission or reception of sound) would be sufficient, and, since we're in analog space, perhaps having some variability isn't terrible here. It's not like the floating point library fingerprinting issues discussed elsewhere.

I understand that the WG has decided to stop working on this issue. I do not consider it resolved.

samuelweiler commented 4 years ago

One could just listen on the microphone and emit frequencies that are masked by the human brain.

Thank you for raising that issue, which might be new to this discussion.

At the risk of forking this discussion in the same issue, which I know can be fraught... What can be done to prevent, as you say, "masked" sounds WITHIN the audible spectrum from being used to correlate devices in proximity, or to communicate between those devices (or even between processes on the same machine)?

samuelweiler commented 4 years ago

A news article, for added context: https://www.theatlantic.com/technology/archive/2015/11/your-phone-is-literally-listening-to-your-tv/416712/

samuelweiler commented 4 years ago

Following discussion by the Privacy IG today, I desire two things:

  1. Further exploration of the mitigation space. I get that we might not be able to do anything about linking devices (or frames from different origins) via inaudible sounds, but I would like to see more digging into the possibilities.
  2. Adding this risk to the doc's security and privacy considerations section.

cwilso commented 4 years ago

This risk is already captured in the Security and Privacy section (https://webaudio.github.io/web-audio-api/#priv-sec) since at least early 2018. How should this be expanded to better detail this?

Current text snippet (truncated): "For voice or sound-actuated devices, Web Audio API might be used to control other devices. In addition, if the sound-operated device is sensitive to near ultrasonic frequencies, such control might not be audible. This possibility also exists with HTML, through either the"

The limit of human hearing is usually stated as 20kHz. For a 44.1kHz sampling rate, the Nyquist limit is 22.05kHz. Given that a true brickwall filter cannot be physically realized, the space between 20kHz and 22.05kHz is used for a rapid rolloff filter to strongly attenuate all frequencies above Nyquist.

At 48kHz sampling rate, there is still rapid attenuation in the 20kHz to 24kHz band (but it is easier to avoid phase ripple errors in the passband).
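The arithmetic behind those two paragraphs can be sketched as follows (assuming the usual 20 kHz passband edge; the function name is mine):

```python
def antialias_transition_band_hz(sample_rate_hz: float,
                                 passband_edge_hz: float = 20000.0) -> float:
    """Room between the nominal limit of hearing and the Nyquist frequency.

    The anti-alias/reconstruction filter must roll off within this band, so a
    wider band permits a gentler filter with less passband phase ripple.
    """
    return sample_rate_hz / 2.0 - passband_edge_hz

print(antialias_transition_band_hz(44100))  # 2050.0 Hz (20 kHz .. 22.05 kHz)
print(antialias_transition_band_hz(48000))  # 4000.0 Hz (20 kHz .. 24 kHz)
```

The nearly doubled transition band at 48 kHz is what makes the gentler, lower-ripple rolloff possible.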

samuelweiler commented 4 years ago

In PING's tracking issue, @cwilso points out that https://www.w3.org/TR/mediacapture-streams/#privacy-and-security-considerations covers the device-linking risk. Does it make sense to incorporate by reference the whole of that section? (I would still like to see more exploration of the mitigation space, also.)

rtoy commented 4 years ago

Re-opening so we can easily find this issue.

cwilso commented 4 years ago

(But to be clear: the WG's current stance is "this is not a rational mitigation to apply, and this risk is already sufficiently documented." If the PING group wants to recommend specific text to add, the editors are amenable.)

samuelweiler commented 4 years ago

How about adding to https://webaudio.github.io/web-audio-api/#priv-sec:

Additionally, take note of the security and privacy considerations from the Media Capture and Streams specification. Of note, analysis of ambient audio, or playing unique audio, may enable identification of a user's location down to the level of a room, or even detection of simultaneous occupation of a room by disparate users or devices. It might also enable communication between otherwise partitioned contexts in one browser.

pan-athen commented 3 years ago

There's also no sane upper limit. Most adults cannot hear frequencies above 16 kHz so everything above could be considered inaudible.

I know the issue is closed, and sorry if I repeat any information, as my reply is late and I'm not an active participant in these discussions.

A counterargument to the inaudibility of high-range frequency content can be made, for two reasons:

1) The official human hearing range is 20 Hz to 20 kHz, and it should be respected by any creative audio application. There are things up there that make a difference in how you perceive the content, both psychoacoustically and cognitively.

and

2) Frequencies in the higher range contribute to the timbre that we hear in the middle range through the physical phenomenon known as the "combination tone". I'm referring to the phenomenon discovered by the violinist Giuseppe Tartini, in which sum and difference tones are thought to be caused, at least sometimes, by the non-linearity of the inner ear. Therefore, by restricting content in the higher, yet still audible, range, you also cut content in the very important middle range. There is a reason why the first inventors of digital audio didn't cut the audio so low.
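The combination-tone effect described above can be demonstrated numerically: passing two tones through a quadratic non-linearity (a crude stand-in for the inner ear's non-linearity) creates energy at their sum and difference frequencies. A minimal sketch, with frequencies chosen so the DFT bins are exact:

```python
import math, cmath

FS = 8000   # sample rate in Hz; analyzing one full second makes integer-Hz bins exact
N = FS

def tone(freq_hz):
    """One second of a unit-amplitude sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / FS) for i in range(N)]

def dft_mag(signal, freq_hz):
    """Magnitude of one DFT bin, normalized so a unit cosine reads 0.5."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / FS)
              for i, s in enumerate(signal))
    return abs(acc) / N

two_tones = [a + b for a, b in zip(tone(1000), tone(1500))]
squared = [s * s for s in two_tones]   # quadratic non-linearity

print(dft_mag(two_tones, 500))   # ~0: no 500 Hz in the linear signal
print(dft_mag(squared, 500))     # ~0.5: difference tone (1500 - 1000 Hz)
print(dft_mag(squared, 2500))    # ~0.5: sum tone (1500 + 1000 Hz)
```

The test frequencies here are low for convenience, but the mechanism is the same one the comment invokes: with, say, 18 kHz and 19 kHz tones, the non-linearity would deposit a 1 kHz difference tone squarely in the mid range.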

My opinion is that the Web Audio API is capable of some pretty impressive things with its 32-bit floating-point internal processing. I can see the privacy and data-management issues involved in the matter. But I think it would be a pity to restrict the frequency range of an API that could easily become the high-end audio API for any web technology.

A little late to the discussion, but I thought I'd share my opinion on the high-range restriction point.