Expose AudioContext in Worker/ServiceWorker

guest271314 commented 3 years ago

Describe the feature Expose AudioContext in Worker and ServiceWorker contexts.

Is there a prototype? No.

Currently users need to use Native Messaging to run local media players, for example mpv https://github.com/mpv-player/mpv; see https://github.com/mpv-player/mpv/blob/bc9d556f3a890cf5f99e9dced0117e2d8a91ff09/DOCS/man/javascript.rst, https://github.com/Kagami/mpv.js.

Describe the feature in more detail The ability to use AudioContext in a Worker, particularly in a ServiceWorker context.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1131236.

Use cases:

Developers would like extensions built with service workers to continue to support the ability to play and control audio playback initiated from a background context.
It is a podcast aggregator and it plays audio from the background page so that the user can close its main page and continue to listen to the podcast.
Para mi es una excelente opción el tener un podcast de fondo mientras trabajo, de la misma forma que uso una extensión para indicarme cuando me llegan nuevos mensajes mi correo o ver mi gmail en una mini ventana que se activa en el icono de la extensión.
In general, playing audio without creating an different PictureInPictuireWindow, Window, <iframe> or a dedicated Document just to play audio.

guest271314 commented 3 years ago

I just noticed https://github.com/WebAudio/web-audio-api-v2/issues/16 which for an unknown reason I am blocked from commenting.

Re https://github.com/WebAudio/web-audio-api-v2/issues/16#issuecomment-713011241

We're asking the WebRTC Working Group to weigh in regarding transferable MediaStreams to extend this first cut

Note, it is currently possible to transfer a ReadableStream/WritableStream representation of a MediaStreamTrack (on Chromium/Chrome) using MediaStreamTrack API for Insertable Streams of Media https://github.com/w3c/mediacapture-transform and WebRTC Encoded Transform https://github.com/w3c/webrtc-encoded-transform due to Chromium/Chrome supporting Transferable Streams https://github.com/whatwg/streams/blob/main/transferable-streams-explainer.md. See also https://github.com/w3c/mediacapture-extensions/pull/26.

padenot commented 3 years ago

AudioWG calls:

The worker case is WebAudio/web-audio-api-v2#16, and this is agreed upon and will happen and yes, transferable MediaStream (not just ReadableStream/WriteableStreams) will happen.

For the service worker case, it's really unclear. If an app starts playing audio, and then all tabs for this app are closed, how do you pause the audio? There is no way to display a UI element.

guest271314 commented 3 years ago

For the service worker case, it's really unclear. If an app starts playing audio, and then all tabs for this app are closed, how do you pause the audio? There is no way to display a UI element.

It is not immediately clear in the specification

We're not going to allow a service worker to run indefinitely.

that the ServiceWorker is intended to become "inactive" at 5 minues, because "5 minutes" does not appear in the specification https://w3c.github.io/ServiceWorker/.

Nonetheless per (Chromium) source code the ServiceWorker can remain "active" as long as one or more of several conditions are met https://source.chromium.org/chromium/chromium/src/+/master:content/browser/service_worker/service_worker_version.cc;l=609 through 613.

  DCHECK(event_type == ServiceWorkerMetrics::EventType::INSTALL ||
         event_type == ServiceWorkerMetrics::EventType::ACTIVATE ||
         event_type == ServiceWorkerMetrics::EventType::MESSAGE ||
         event_type == ServiceWorkerMetrics::EventType::EXTERNAL_REQUEST ||
         status() == ACTIVATED)

(On Firefox BroadcastChannel in a ServiceWorker can remain active even after the ServiceWorker is unregistered BroadcastChannel created in ServiceWorker outlives unregistration and page reload https://bugzilla.mozilla.org/show_bug.cgi?id=1676043)

That effectively means that ServiceWorker can remain "active" without tabs, and still retain the ability to communicate with the ServiceWorker when a new tab is created.

One Chrome extension developer created a workaround for the un-specified 5 minute "inactive" case here https://bugs.chromium.org/p/chromium/issues/detail?id=1152255#c25.

I created a workaround here https://bugs.chromium.org/p/chromium/issues/detail?id=1152255#c31 and https://bugs.chromium.org/p/chromium/issues/detail?id=1152255#c32 using an extension and window.open() or <iframe> and postMessage(). Therefore we can communicate with the ServiceWorker any time before or after tab closure and reopening, something like

console

onmessage = (e) => console.log(e);
var f = document.createElement('iframe');
f.style = 'display:none';
document.body.appendChild(f);
f.src = 'chrome-extension://jmnojflkjiloekecianpibbbclcgmhag/keepActive.html';

manifest.json

...
  "background": {
    "service_worker": "background.js"
  },
  "permissions": ["nativeMessaging", ...],
  "host_permissions": ["<all_urls>"],
  "web_accessible_resources": [    {
      "resources": ["keepActive.html", "keepActive.js"],
      "matches": [ "https://bugs.chromium.org/*", ...],
      "extensions": [...]
  }],  
...

See WebAudio/web-audio-api#2381. Add URL's to 'matches' at 'web_accessible_resources', attached image was test at console on github.

background.js

let now = performance.now();

self.addEventListener('message', async (e) => {
  e.source.postMessage(((performance.now() - now) / 1000) / 60);
});

keepActive.html

<!DOCTYPE html><html><body><script src="keepActive.js"></script></body></html>

keepActive.js

onload = async () => {
  parent.postMessage('ServiceWorker opener', '*');
  navigator.serviceWorker.addEventListener('message', e => {
    parent.postMessage(e.data, '*');
  });
  navigator.serviceWorker.ready.then( registration => {
    registration.active.postMessage('');
    setInterval(() => registration.active.postMessage(''), 1000 * 15);
  });
}

Additionally, per my own standard of creating workarounds and proof-of-concepts for the requests I make to specification bodies and implementers, I created a rudimentary proof-of-concept for the feature request I made here, using Native Messaging to launch a headless Chromium instance that plays audio. I included an action handler in the extension so that users can start and pkill the headless Chromium instance https://bugs.chromium.org/p/chromium/issues/detail?id=1131236#c43

manifest.json

{
  "name": "service_worker_native_messaging_headless_audio",
  "version": "1.0",
  "manifest_version": 3,
  "background": {
    "service_worker": "background.js"
  },
  "permissions": ["nativeMessaging"],
  "externally_connectable": {
    "matches": [
      "https://bugs.chromium.org/p/chromium/issues/*"
    ],
    "ids": [
      "*"
    ]
  },
  "action": {}
}

service_worker_native_messaging_headless_audio.json

{
  "name": "service_worker_native_messaging_headless_audio",
  "description": "Chromium ServiceWorker Native Messaging audio",
  "path": "/path/to/service_worker_native_messaging_headless_audio.sh",
  "type": "stdio",
  "allowed_origins": [
     "chrome-extension://<id>/"
  ]
}

background.js

chrome.action.onClicked.addListener(() => 
  chrome.runtime.sendNativeMessage('service_worker_native_messaging_headless_audio'
  , {}, (nativeMessage) => console.log({nativeMessage}))
);

service_worker_native_messaging_headless_audio.sh

#!/bin/bash
sendMessage() {
  # https://stackoverflow.com/a/24777120
  # Calculate the byte size of the string.
  # NOTE: This assumes that byte length is identical to the string length!
  # Do not use multibyte (unicode) characters, escape them instead, e.g.
  # message='"Some unicode character:\u1234"'
  messagelen=${#message}
  # Convert to an integer in native byte order.
  # If you see an error message in Chrome's stdout with
  # "Native Messaging host tried sending a message that is ... bytes long.",
  # then just swap the order, i.e. messagelen1 <-> messagelen4 and
  # messagelen2 <-> messagelen3
  messagelen1=$(( ($messagelen      ) & 0xFF ))               
  messagelen2=$(( ($messagelen >>  8) & 0xFF ))               
  messagelen3=$(( ($messagelen >> 16) & 0xFF ))               
  messagelen4=$(( ($messagelen >> 24) & 0xFF ))               
  # Print the message byte length followed by the actual message.
  printf "$(printf '\\x%x\\x%x\\x%x\\x%x' \
        $messagelen1 $messagelpen2 $messagelen3 $messagelen4)%s" "$message"
}

headless_audio() {
  if pgrep -f 'chrome --headless' > /dev/null; then
    pkill -f 'chrome --headless' & sendMessage '"Chromium headless audio off."' 
  else
    $HOME/chrome-linux/chrome --headless --autoplay-policy=no-user-gesture-required --password-store=basic --disable-gpu --remote-debugging-port=9222 audio.html & sendMessage '"Chromium headless audio on."'
  fi
}
headless_audio

audio.html

<!DOCTYPE html>
<html>
  <head> </head>
  <body>
    <audio autoplay></audio>
    <script>
      const audio = document.querySelector('audio');
      (async () => {
        const request = await fetch('https://ia801306.us.archive.org/8/items/deltanine2015-08-22.mk241_16bit/deltanine2015-08-22.mk241.cmmt30.vms32ub.dr100mkii.16bit-t06.ogg');
        const blob = await request.blob();
        const blobURL = URL.createObjectURL(blob);
        for (let url of [blobURL, 'ImperialMarch60.webm', 'house--64kbs-0-wav.wav']) {
          await new Promise((resolve) => {
            audio.src = url;
            audio.onended = () => {
              audio.onended = null;
              resolve();
            };
          });
        }
      })();
    </script>
  </body>
</html>

Basically, it is possible to keep the ServiceWorker "active" indefinitely, and thus communicate with the ServiceWorker by opening an <iframe>, Window or tab within the ServiceWorker scope then send the command to start or stop outputting audio.

Per the "5 minute" un-specified implementation on Chrome and Chromium the ServiceWorker will become "inactive" by default if the user does nothing.

I have been considering the implications and consequences of exposing AudioContext in ServiceWorker. Precedence was set by Web Speech API speechSynthesis.speak() implementations, which does not output audio in the tab itself, rather audio is output by Speech Dispatcher at the OS level, as evidenced by Chromium implementation of getDisplayMedia({audio: true, video: true}) not capturing audio output of speechSynthesis.speak() Issue 1185527 in chromium: getDisplayMedia does not capture speechSynthesis.speak() audio output https://bugs.chromium.org/p/chromium/issues/detail?id=1185527, and in some cases can survive page reload and still be outputting audio - unless speechSynthesis.cancel() is called Issue 1066812: Security: Text_To_Speech keeps playing after closing the tab https://bugs.chromium.org/p/chromium/issues/detail?id=1066812, Issue 1107210: Speech Synthesis isn't wired up to "Audio is playing" tab icons. https://bugs.chromium.org/p/chromium/issues/detail?id=1107210.

So to handle the condition described a global ServiceWorkerAudioContext.close(), ServiceWorkerAudioContext.pause(), ServiceWorkerAudioContext.resume(), and ServiceWorkerAudioContext.disconnect() can be defined which has the effect of pkill on all running ServiceWorkerAudioContext instances. We can also tailor the global functions to select specific ServiceWorkerAudioContext instances that we want to discontinue playing back or otherwise processing audio in the given ServiceWorker. This can be worked into the UI, if necessary, something like Media Capture and Streams and/or File System Access device or directory permissions, respectively; I emphasize here the programmatic means to do so.

guest271314 commented 3 years ago

I actually updated the code to keep ServiceWorker active, substituting Streams API for setInterval()

onload = async () => {
  const handleMessage = (e) => {
    parent.postMessage(e.data, '*');
  };
  onmessage = (e) => {
    if (e.data === 'abort') {
      abortable.abort();
    }
  };

  navigator.serviceWorker.addEventListener('message', handleMessage);
  const registration = await navigator.serviceWorker.ready;
  const abortable = new AbortController();
  const { signal } = abortable;
  try {
    await new ReadableStream({
      start() {
        parent.postMessage('Pipe started.', '*');
        fetch('keepalive.txt');
      },
      async pull(controller) {
        await new Promise((resolve) => setTimeout(resolve, 1000 * 15));
        controller.enqueue(null);
      },
    }).pipeTo(
      new WritableStream({
        write(value) {
          registration.active.postMessage(value);
        },
      }),
      { signal }
    );
  } catch ({message}) {
    parent.postMessage(message, '*');
    navigator.serviceWorker.removeEventListener('message', handleMessage);
    close();
  }
};

Video of the workaround using headless Chromium to play audio by utilizing ServiceWorker to communicate with Native Messaging, which is an example of UI that can be used for the case contemplated; essentially a basic icon that when clicked becomes a list of ServiceWorker(s) playing or processing audio with an "X" on the side of the item, to affirmatively stop playing or processing audio in the service worker - or if application, unregister the service worker altogether.

service_worker_native_messaging_headless_audio.webm.zip

guest271314 commented 3 years ago

If the WG simply write what needs to be written re Web IDL and exposed, then relevant extension contributors can sort out how to maintain AudioContext in ServiceWorker extensions system, and WG will not be charged with negligence for just exposing though not writing out what happens when x, y, z, occurs.

On the other hand, WG can take the lead here, which requires taking to time to test scenarios and write it out.

Either way I can effectively achieve the expected result at least one way, right now, without any specification, becuase it is not specified, while the use cases do not appear to be declining as to extended usage of AudioContext in contexts that WG has not drafted any basic algorithms or path for. Eventually I find a way to achieve the requirement I ask about. Some users in the field are actually awaiting specification authors and implementers to do stuff. I just posted this here to determine where you folks are at re this subject matter. You decide.

guest271314 commented 3 years ago

Perhaps this feature request is within the scope of Issue 897326: Low Level Audio API https://bugs.chromium.org/p/chromium/issues/detail?id=897326 where chrome.audio is referenced though is now deprecated.

This new model ensures a dedicated scope and an RT thread when allowed for the optimum WASM-powered audio processing

Comparable chrome.* APIs: chrome.audio
...

I logged the Chromium headless output to get a glipse of what is actually occurring.

I had to start a local server to use fetch() for ArrayBuffer to set at AudioBufferSourceNode to test AudioContext in headless, along with HTMLMediaElement due to Issue 810400: Fetch API does not respect --allow-file-access-from-files (even though XHR does) https://bugs.chromium.org/p/chromium/issues/detail?id=810400, where I was just using file: protocol for HTMLMediaElement

<audio>

...
[0626/192132.694631:VERBOSE1:media_stream_manager.cc(705)] MSM::InitializeMaybeAsync([this=0x1be0002cdb00])
[0626/192132.694705:VERBOSE1:media_stream_manager.cc(705)] MDM::MediaDevicesManager()
[0626/192132.694741:VERBOSE1:media_stream_manager.cc(705)] MSM::MediaStreamManager([this=0x1be0002cdb00]))
...

[0626/192132.735603:VERBOSE1:file_url_loader_factory.cc(455)] FileURLLoader::Start: file:///home/ubuntu-studio/localscripts/sw-nm-headless-audio/audio.html
[0626/192132.749611:VERBOSE1:sandbox_linux.cc(69)] Activated seccomp-bpf sandbox for process type: gpu-process.
[0626/192132.752670:VERBOSE1:device_data_manager_x11.cc(216)] X Input extension not available
[0626/192132.768283:VERBOSE1:configured_proxy_resolution_service.cc(852)] PAC support disabled because there is no system implementation
[0626/192132.768710:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::Core() [process_id=4, frame_id=1]
[0626/192132.769386:VERBOSE1:configured_proxy_resolution_service.cc(852)] PAC support disabled because there is no system implementation
[0626/192132.769986:VERBOSE1:document.cc(3704)] Document::DispatchUnloadEvents() URL = <null>
[0626/192132.770205:VERBOSE1:document.cc(3784)] Actually dispatching an UnloadEvent: URL = <null>
[0626/192132.774414:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::Core() [process_id=4, frame_id=1]
[0626/192132.806575:VERBOSE1:gles2_cmd_decoder.cc(3835)] GL_OES_packed_depth_stencil supported.
[0626/192132.809908:VERBOSE1:file_url_loader_factory.cc(455)] FileURLLoader::Start: file:///home/ubuntu-studio/localscripts/sw-nm-headless-audio/ImperialMarch60.wav
[0626/192132.818949:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::RequestDeviceAuthorization({device_id=}) [process_id=4, frame_id=1]
[0626/192132.876218:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted({status=OK}, {params=format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: }, {device_id=default}) [process_id=4, frame_id=1]
[0626/192132.876274:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted => (authorization time=57 ms) [process_id=4, frame_id=1]
[0626/192132.880112:VERBOSE1:media_stream_manager.cc(705)] AMB::MakeAudioOutputStream({device_id=}, {params=[format: PCM_LOW_LATENCY, channel_layout: 2, channels: 1, sample_rate: 44100, frames_per_buffer: 1024, effects: 128, mic_positions: ]})
[0626/192132.880181:VERBOSE1:media_stream_manager.cc(705)] PAOS::PulseAudioOutputStream({device_id=default}, {params=[format: PCM_LOW_LATENCY, channel_layout: 2, channels: 1, sample_rate: 44100, frames_per_buffer: 1024, effects: 128, mic_positions: ]}) [this=0x3df80028bb40]
[0626/192132.880229:VERBOSE1:media_stream_manager.cc(705)] AMB::MakeAudioOutputStream => (number of streams=1)
[0626/192132.880262:VERBOSE1:media_stream_manager.cc(705)] PAOS::Open() [this=0x3df80028bb40]
[0626/192132.883929:VERBOSE1:media_stream_manager.cc(705)] audio::OS::Ctor({audio_manager_name=PulseAudio}, {device_id=default}, {params=[format: PCM_LOW_LATENCY, channel_layout: 2, channels: 1, sample_rate: 44100, frames_per_buffer: 1024, effects: 384, mic_positions: ]}) [controller=0x3DF8003D5160]
[0626/192132.884046:VERBOSE1:media_stream_manager.cc(705)] AOC::CreateStream([state=empty]) [this=0x3DF8003D5160]
[0626/192132.884078:VERBOSE1:media_stream_manager.cc(705)] AOC::RecreateStream({reason=INITIAL_STREAM}, {params=[format: PCM_LOW_LATENCY, channel_layout: 2, channels: 1, sample_rate: 44100, frames_per_buffer: 1024, effects: 384, mic_positions: ]} [state=empty]) [this=0x3DF8003D5160]
[0626/192132.884119:VERBOSE1:media_stream_manager.cc(705)] AOC::CreateStream => (state=created) [this=0x3DF8003D5160]
[0626/192132.884159:VERBOSE1:media_stream_manager.cc(705)] audio::OS::CreateAudioPipe() [controller=0x3DF8003D5160]
[0626/192132.885039:VERBOSE1:media_stream_manager.cc(705)] PAOS::Start() [this=0x3df80028bb40]
[0626/192132.885554:VERBOSE1:media_stream_manager.cc(705)] audio::OS::Play() [controller=0x3DF8003D5160]
[0626/192132.885598:VERBOSE1:media_stream_manager.cc(705)] AOC::Play([state=created]) [this=0x3DF8003D5160]
[0626/192132.885643:VERBOSE1:media_stream_manager.cc(705)] AOC::StartStream => (state=playing) [this=0x3DF8003D5160]
[0626/192133.057425:WARNING:exported_object.cc(263)] Unknown method: message_type: MESSAGE_METHOD_CALL
destination: org.mpris.MediaPlayer2.chromium.instance47698
path: /org/mpris/MediaPlayer2
interface: org.mpris.MediaPlayer2.Playlists
member: GetPlaylists
sender: :1.27
signature: uusb
serial: 6486

uint32_t 0
uint32_t 5
string "Played"
bool true

[0626/192137.885241:VERBOSE1:media_stream_manager.cc(705)] AOC::WedgeCheck => (stream is alive) [this=0x3DF8003D5160]

AudioContext

...
[0626/223935.909046:VERBOSE1:media_stream_manager.cc(705)] MSM::InitializeMaybeAsync([this=0x15e6002cdc80])
[0626/223935.909096:VERBOSE1:media_stream_manager.cc(705)] MDM::MediaDevicesManager()
[0626/223935.909122:VERBOSE1:media_stream_manager.cc(705)] MSM::MediaStreamManager([this=0x15e6002cdc80]))
...

[0626/223935.951525:VERBOSE1:sandbox_linux.cc(69)] Activated seccomp-bpf sandbox for process type: gpu-process.
[0626/223935.957686:VERBOSE1:device_data_manager_x11.cc(216)] X Input extension not available
[0626/223935.990681:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::Core() [process_id=4, frame_id=1]
[0626/223935.993532:VERBOSE1:configured_proxy_resolution_service.cc(852)] PAC support disabled because there is no system implementation
[0626/223935.994321:VERBOSE1:configured_proxy_resolution_service.cc(852)] PAC support disabled because there is no system implementation
[0626/223935.996156:VERBOSE1:network_delegate.cc(34)] NetworkDelegate::NotifyBeforeURLRequest: http://localhost:8008/audio.html
[0626/223936.003610:VERBOSE1:document.cc(3704)] Document::DispatchUnloadEvents() URL = <null>
[0626/223936.003879:VERBOSE1:document.cc(3784)] Actually dispatching an UnloadEvent: URL = <null>
[0626/223936.007576:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::Core() [process_id=4, frame_id=1]
[0626/223936.010035:VERBOSE1:network_delegate.cc(34)] NetworkDelegate::NotifyBeforeURLRequest: http://localhost:8008/service_worker_native_messaging_headless_audio.js
[0626/223936.022485:VERBOSE1:network_delegate.cc(34)] NetworkDelegate::NotifyBeforeURLRequest: http://localhost:8008/ImperialMarch60.wav
[0626/223936.049571:VERBOSE1:webrtc_logging.cc(32)] [WA]AC::AudioContext({latency_hint=exact}, {seconds=0.000}) [state=suspended]
[0626/223936.049783:VERBOSE1:webrtc_logging.cc(32)] [WA]AH::AudioHandler({sample_rate=0}) [type=AudioDestinationNode, this=0x187600315B00]
[0626/223936.049978:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::AudioDestination({output_channels=2}) [state=stopped]
[0626/223936.050049:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::AudioDestination => (FIFO size=12288 bytes) [state=stopped]
[0626/223936.050148:VERBOSE1:webrtc_logging.cc(32)] [WA]RWADI::RendererWebAudioDeviceImpl
[0626/223936.050499:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::RequestDeviceAuthorization({device_id=}) [process_id=4, frame_id=1]
[0626/223936.113349:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted({status=OK}, {params=format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: }, {device_id=default}) [process_id=4, frame_id=1]
[0626/223936.113412:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted => (authorization time=62 ms) [process_id=4, frame_id=1]
[0626/223936.113736:VERBOSE1:webrtc_logging.cc(32)] [WA]RWADI::RendererWebAudioDeviceImpl => (hardware_params=[format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: ])
[0626/223936.113884:VERBOSE1:webrtc_logging.cc(32)] [WA]RWADI::RendererWebAudioDeviceImpl => (sink_params=[format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: ])
[0626/223936.113966:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::AudioDestination => (device callback buffer size=512 frames) [state=stopped]
[0626/223936.114045:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::AudioDestination => (device sample rate=44100 Hz) [state=stopped]
[0626/223936.114182:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::AudioDestination => (no resampling: context sample rate set to 44100 Hz) [state=stopped]
[0626/223936.114432:VERBOSE1:webrtc_logging.cc(32)] [WA]AC::AudioContext => (base latency=0.012 seconds)) [state=suspended]
[0626/223936.114504:VERBOSE1:webrtc_logging.cc(32)] [WA]AC::StartRendering [state=suspended]
[0626/223936.114578:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::Start [state=stopped]
[0626/223936.114644:VERBOSE1:webrtc_logging.cc(32)] [WA]RWADI::Start
[0626/223936.115069:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::RequestDeviceAuthorization({device_id=}) [process_id=4, frame_id=1]
[0626/223936.116746:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted({status=OK}, {params=format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: }, {device_id=default}) [process_id=4, frame_id=1]
[0626/223936.116826:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted => (authorization time=1 ms) [process_id=4, frame_id=1]
[0626/223936.118530:VERBOSE1:media_stream_manager.cc(705)] AMB::MakeAudioOutputStream({device_id=}, {params=[format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: ]})
[0626/223936.118591:VERBOSE1:media_stream_manager.cc(705)] PAOS::PulseAudioOutputStream({device_id=default}, {params=[format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: ]}) [this=0x28780028b3c0]
[0626/223936.118627:VERBOSE1:media_stream_manager.cc(705)] AMB::MakeAudioOutputStream => (number of streams=1)
[0626/223936.118661:VERBOSE1:media_stream_manager.cc(705)] PAOS::Open() [this=0x28780028b3c0]
[0626/223936.122206:VERBOSE1:media_stream_manager.cc(705)] audio::OS::Ctor({audio_manager_name=PulseAudio}, {device_id=default}, {params=[format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: ]}) [controller=0x2878003D5160]
[0626/223936.122309:VERBOSE1:media_stream_manager.cc(705)] AOC::CreateStream([state=empty]) [this=0x2878003D5160]
[0626/223936.122339:VERBOSE1:media_stream_manager.cc(705)] AOC::RecreateStream({reason=INITIAL_STREAM}, {params=[format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: ]} [state=empty]) [this=0x2878003D5160]
[0626/223936.122368:VERBOSE1:media_stream_manager.cc(705)] AOC::CreateStream => (state=created) [this=0x2878003D5160]
[0626/223936.122397:VERBOSE1:media_stream_manager.cc(705)] audio::OS::CreateAudioPipe() [controller=0x2878003D5160]
[0626/223936.123263:VERBOSE1:webrtc_logging.cc(32)] [WA]AD::RequestRender => (rendering is now alive) [state=running]
[0626/223936.123471:VERBOSE1:media_stream_manager.cc(705)] PAOS::Start() [this=0x28780028b3c0]
[0626/223936.123468:VERBOSE1:webrtc_logging.cc(32)] [WA]RWADI::Render => (rendering is alive [frames=512])
[0626/223936.124227:VERBOSE1:media_stream_manager.cc(705)] audio::OS::Play() [controller=0x2878003D5160]
[0626/223936.124281:VERBOSE1:media_stream_manager.cc(705)] AOC::Play([state=created]) [this=0x2878003D5160]
[0626/223936.124310:VERBOSE1:media_stream_manager.cc(705)] AOC::StartStream => (state=playing) [this=0x2878003D5160]
[0626/223936.304475:VERBOSE1:webrtc_logging.cc(32)] [WA]AH::AudioHandler({sample_rate=44100}) [type=AudioBufferSourceNode, this=0x1876002D0980]
[0626/223936.304892:VERBOSE1:webrtc_logging.cc(32)] [WA]AN::connect({output=[index:0, type:AudioBufferSourceNode, handler:0x1876002D0980]} --> {input=[index:0, type:AudioDestinationNode, handler:0x187600315B00]})
[0626/223936.310042:VERBOSE1:webrtc_logging.cc(32)] [WA]AH::ProcessIfNecessary => (processing is alive [frames=128]) [type=AudioBufferSourceNode, this=0x1876002D0980]
[0626/223941.123684:VERBOSE1:media_stream_manager.cc(705)] AOC::WedgeCheck => (stream is alive) [this=0x2878003D5160]
[0626/223951.136494:VERBOSE1:media_stream_manager.cc(705)] AOC::OnMoreData => (average audio level=-18.02 dBFS) [this=0x2878003D5160]
[0626/224006.136503:VERBOSE1:media_stream_manager.cc(705)] AOC::OnMoreData => (average audio level=-20.08 dBFS) [this=0x2878003D5160]
[0626/224021.141085:VERBOSE1:media_stream_manager.cc(705)] AOC::OnMoreData => (average audio level=-18.74 dBFS) [this=0x2878003D5160]
[0626/224036.141692:VERBOSE1:media_stream_manager.cc(705)] AOC::OnMoreData => (average audio level=-37.86 dBFS) [this=0x2878003D5160]
[0626/224051.147107:VERBOSE1:media_stream_manager.cc(705)] AOC::OnMoreData => (average audio level=-inf dBFS) [this=0x2878003D5160]

Some commonality

RFAOSF::RequestDeviceAuthorization({device_id=}) [process_id=4, frame_id=1]
[0626/192132.876218:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted({status=OK}, {params=format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: }, {device_id=default}) [process_id=4, frame_id=1]
[0626/192132.876274:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted => (authorization time=57 ms) [process_id=4, frame_id=1]
[0626/192132.880112:VERBOSE1:media_stream_manager.cc(705)]

RFAOSF::RequestDeviceAuthorization({device_id=}) [process_id=4, frame_id=1]
[0626/223936.116746:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted({status=OK}, {params=format: PCM_LOW_LATENCY, channel_layout: 3, channels: 2, sample_rate: 44100, frames_per_buffer: 512, effects: 0, mic_positions: }, {device_id=default}) [process_id=4, frame_id=1]
[0626/223936.116826:VERBOSE1:media_stream_manager.cc(705)] RFAOSF::AuthorizationCompleted => (authorization time=1 ms) [process_id=4, frame_id=1]

Therefore we should be able to stop the implementation of "media_stream_manager" on Chromium and equivalent at other implementations by "process_id" and "frame_id".

rtoy commented 3 years ago

Teleconf: As we already agreed on supporting a Worker, we'll work on that first, and ServiceWorker at a lower priority while we gather info on use cases and implications of supporting that.

guest271314 commented 3 years ago

Note: As longer as there is a 5 minute restriction for ServiceWorker that makes the context become "inactive" this feature will not be particularly useful given alternative approaches.

Due to Chromium/Chrome MV3 extenstion ServiceWorker implementation capturing system audio output, live media streams, processing data that exceeds 5 minutes are not possible without workarounds, which when I last tested, does not achieve the expected result.

I instead created a Window that is not displayed to capture system audio and specific devices without using a ServiceWorker at all https://bugs.chromium.org/p/chromium/issues/detail?id=1189678#c50.

JohnWeisz commented 3 years ago

I'd like to add in another use case here: performance in complex production apps.

If AudioContext is available in worker threads, it means that a worker thread can be dedicated specifically to schedule events on the AudioContext (param automation, creating/connecting nodes), without any interference caused by potential main thread load, most commonly caused by UI/DOM (which can be optimized by itself of course).

GeorgeTailor commented 2 years ago

how is this different from #2423 ?

guest271314 commented 2 years ago

Primarily this issue was filed for Chromium Manifest Version 3 ServiceWorker ( additionally, fetch() and WebTransport are not defined in AudioWorkletGlobalScope) to illuminate the fact that ServiceWorkers become inactive after 5 minutes without some workaround, while ServiceWorker has onfetch defined, where we should be able to stream in/from the SeriveWorker.

GeorgeTailor commented 2 years ago

As per my understanding WorkerGlobalScope is an abstraction over both Web Worker and Service Worker. #2423 requests BaseAudioContext to be exposed to WorkerGlobalScope.

guest271314 commented 2 years ago

ServiceWorkerGlobalScope is not WorkerGlobalScope. I suggest reading at least https://bugs.chromium.org/p/chromium/issues/detail?id=1131236 and https://bugs.chromium.org/p/chromium/issues/detail?id=1152255 for details as to why this issue is different, if at all, from #2423.

The last time I checked Firefox did not support Transferable Streams or Media Capture Transform https://github.com/w3c/mediacapture-transform where we can stream at least raw data from a worker thread to a non-worker thread using Streams API (e.g., Test infinite Opus stream https://plnkr.co/edit/bK1BfoSgjFUDwkIV?preview). That does not deal with 5 minute restriction baked in to ServiceWorker, were ServiceWorkers capable of rendering sound, which is technically possible using headless.

This issue, again, seeks to illuminate the fact that even if/when audio rendering is specified for ServiceWorker specifically, unless the 5 minute restriction is removed from specification spirit, implementations, and workarounds are not used, the SeriveWorker will become inactive in 5 minutes. Web Audio API and Service Worker specification authors need to work that out. The only sound conclusion from perspective here is getting rid of the 5 minute restriction, though more work is involved than just that.

guest271314 commented 2 years ago

@GeorgeTailor

how is this different from #2423 ?

One difference between Worker and ServiceWorker is the capability to serve media response to clients in onfetch handler.

guest271314 commented 2 years ago

This issue, again, seeks to illuminate the fact that even if/when audio rendering is specified for ServiceWorker specifically, unless the 5 minute restriction is removed from specification spirit, implementations, and workarounds are not used, the SeriveWorker will become inactive in 5 minutes. Web Audio API and Service Worker specification authors need to work that out. The only sound conclusion from perspective here is getting rid of the 5 minute restriction, though more work is involved than just that.

We can consistently keep ServiceWorker persistent using serveral workarounds, including FetchEvent, messaging, et al.

hoch commented 2 years ago

2022 TPAC: The Worker support for BaseAudioContext is already planned. We'll keep this issue for the future discussion for the ServiceWorker support.

guest271314 commented 2 years ago

Current options/workarounds for streaming audio from a ServiceWorker

Create a dedicated window or Tab, use MediaSession with <audio> element
Stream the same media data to all Tabs, unmute currently active Tab, mute all other Tabs
Use an independent, third-party media player, e.g., mpv in Native Messaging host to stream media at the OS level I will leave it to the readers to discuss pros/cons of each workaround.

Ideally we will fetch media in the ServiceWorker, perhaps using BackgroundFetch, which is currently not exposed in extension ServiceWorker, and use MediaSession from the ServiceWorker to control the media being played at global media controls.

guest271314 commented 2 years ago

I think the design should be something like ServiceWorker creates an AudioContext that is streamed to an audio-only PictureInPicture window.

Currently PiP window needs at least 1 frame of a video track to be written to avoid an error. So we would need cooperation from the PiP folks to either create an AudioWindow roughly equivalent to an HTMLAudioElement with controls or get rid of the video track requirement for PiP - and to be able to create such a window from the ServiceWorker.

I noticed during experimentation that when a MediaElementAudioSourceNode is connected to a MediaStreamAudioDestinationNode then streamed to a PiP window with 1 video frame written thereto just to launch the window I can actually pause playback of the live stream. I need to test more to verify.

guest271314 commented 2 years ago

Another option is to make BaseAudioContext transferable.

padenot commented 1 year ago

2023 TPAC Audio WG Discussion:

The WG will not pursue this. ServiceWorker has nothing to do with audio playback and processing. The Web Worker support is planned (https://github.com/WebAudio/web-audio-api/issues/2423).

hoch commented 1 year ago

To add:

To make this idea work, the lifetime of ServiceWorker and how it interacts with BaseAudioContext needs to be clearly defined. The WG believes that this line of work is outside the scope of the current charter.

guest271314 commented 1 year ago

If you are planning on specifying this in a Worker you might as well specify this for a ServiceWorker.

We can keep the ServiceWorker active indefinitely for the use case of an Internet radio station, or let the ServiceWorker life cycle end in ~5 minutes.

It's a shame this is being closed.

Now the implementation in a ServiceWorker will be outside of your reach, completely controlled by users hacking up SeviceWorker and Web Audio API.

voxpelli commented 1 year ago

2023 TPAC Audio WG Discussion:

The WG will not pursue this. ServiceWorker has nothing to do with audio playback and processing. The Web Worker support is planned (https://github.com/WebAudio/web-audio-api/issues/2423).

@padenot @hoch It was left open before to track this request, any suggestion of where else a request for some kind of audio support in the background can be filed? Through ServiceWorker, SharedWorker or something else.

guest271314 commented 1 year ago

@voxpelli Technically we can use a browser extension to create an offscreen document where we play audio "in the background".

Something like

chrome.action.onClicked.addListener(async(tab) => {
  if (await chrome.offscreen.hasDocument()) {
    await chrome.offscreen.closeDocument();
  }
  chrome.offscreen.createDocument({
    url: 'index.html',
    reasons: ['TESTING'],
    justification: '',
  });
});

<script type="module" src="./index.js"></script>

(async _ => {
  let workletStream = new AudioWorkletStream({
    urls: [
      'house--64kbs-0-wav',
      'house--64kbs-1-wav',
      'house--64kbs-2-wav',
      'house--64kbs-3-wav',
    ],
    latencyHint: 0,
    workletOptions: {
      numberOfInputs: 1,
      numberOfOutputs: 2,
      channelCount: 2,
      processorOptions: {
        codec: 'audio/wav',
        offset: 0
      },
    },
  });
})();

Basically this AudioWorkletStream in an offscreen document, for now.

In the ServiceWorker we can use onfetch to intercept and manipulate any data requested with import in the AudioWorkletGlobalScope by checking the destination of the reuquest to make sure it's out 'audioworklet', or when an intermediary Worker is used to make fetch requests supplying data for the AudioWorklet. For this version the only thing left is creating a UI akin to Media Session API whuch includes artists, album, artwork, etc., see sw-extension-audio that can be controlled from any Web page as the stream continues potentially indefinitely in the background.

I will also begin working on delivering fata directly from the ServiceWorker to to the underlying speakers of headphones, without necessarily using AudioContext to do that.

All options are available to hack and exploit ServiceWorker and Web Audio API to do whatever whatever we want for this use case, without any specification restrictions.

guest271314 commented 1 year ago

Rudiments of a UI in the form of a an extension popup controllable from any tab utilizing BroadcastChannel between a popup, ServiceWorker and background HTML document.

controller.html

<!doctype html>
<html>

<head>
  <script src="./controller.js"></script>
</head>

<body>
  <button>Start</button><button>Suspend</button><button>Resume</button>
</body>

</html>

controller.js

onload = async () => {
  const bc = new BroadcastChannel("offscreen");
  const [start, suspend, resume] = document.querySelectorAll("button");
  start.onclick = async () =>
    // (await navigator.serviceWorker.ready).active
    bc.postMessage("start");
  suspend.onclick = async () =>
    // (await navigator.serviceWorker.ready).active
    bc.postMessage("suspend");
  resume.onclick = async () =>
    // (await navigator.serviceWorker.ready).active
    bc.postMessage("resume");
};

background.js (ServiceWorker)

const bc = new BroadcastChannel("offscreen");
bc.onmessage = async (e) => {
  if (e.data === "start") {
    if (await chrome.offscreen.hasDocument()) {
      await chrome.offscreen.closeDocument();
    }
    return await chrome.offscreen.createDocument({
      url: "index.html",
      reasons: ["TESTING"],
      justification: "",
    });
  }
  bc.postMessage(e.data);
};

oninstall = async (event) => {
  console.log(event);
  event.waitUntil(self.skipWaiting());
};

onactivate = async (event) => {
  console.log(event);
  event.waitUntil(self.clients.claim());
};

onfetch = async (e) => {
  console.log(e.request.url, e.request.destination);
};
// ...

index.js (offscreen document)

globalThis.bc = new BroadcastChannel("offscreen");
bc.onmessage = async (e) => {
  console.log(e.data);
  if (e.data === "resume") await workletStream.ac.resume();
  if (e.data === "suspend") await workletStream.ac.suspend();
};

globalThis.workletStream = new AudioWorkletStream({...});

Next we will experiment with stream audio from the ServiceWorker to the a local audio application or sound server, e.g., MPV, and PulseAudio.

hoch commented 1 year ago

@voxpelli The roadmap for this work is unclear to me because of the complexity of the interaction between ServiceWorker and AudioContext. I think we need at least two things to reopen and reprioritize this issue in the Audio WG:

Is the working group responsible for ServiceWorker interested in this work? This line of work will produce significant change/addition to both specs.
Do we have a strong demand from developers who have real-world use cases?

Also, closing this issue only means that the WG has other priorities. When the priority changes in the future, the group will definitely reopen this and invite more opinions.

@padenot Please feel free to add any other rationales you have in mind.

padenot commented 1 year ago

I generally agree with what @hoch said.

Audio playback on the web is fundamentally tied to an active document for a host of reasons, and this would be a fundamental change to how a User-Agent is expected to behave by its users.

To take a parallel, on mobile or desktop, in native, there's always something (an app, a command-line program, a widget, or sometimes the system itself in the case of a notification) that is the cause of the sound, and the user can understand what's going on and can easily interact with the thing that is making sound, for example to pause the audio playback.

I'm not saying it's impossible, but rather than there are significant challenges to overcome before this can be worked on.

It's also unclear to me what use-cases this would solve.

guest271314 commented 1 year ago

It's also unclear to me what use-cases this would solve.

Background Internet radio station. The idea is users supply the audio to an indefinite playback stream, controlled in the ServiceWorker.
Background speech synthesis and speech recognition. This basically already happens with network requests in some browser. We can do this in the background ServiceWorker using local files, where applicable.

We have a detached media stream not necessarily tied to any document with MediaStreamTrackGenerator and MediaStreamTrackProcessor.

This would allow us to pipe to the headphones or speakers in the ServiceWorker. Leaving the DOM to the DOM.

In a browser extension this means navigation to various Web sites, without needing to keep a document open, somewhere, just to play audio. With Media Session API the user can control the playback, change channels, etc. in a UI on the browser toolbar, without having to keep a dedicated Tab open, somewhere, rather offscreen document, or iframe, or window just to play or manipulate audo signals. Keeping those documents open costs, perhaps minimally, but for historic puposes only. The technology exists to implement this now.

In a non-extension ServiceWorker the control of the audio is delegated to the ServiceWorker context, so users who want to listen to a media stream in the background can. Potentially indefinitely, with the ServiceWorker fetching and queuing up streams for as long as the user is navigating the site.

I don't think anything needs to change besides doing whatever IDL and exposing AudioContext in the ServiceWorkerGlobalScope. Developers can do what they do from there.

If you think people are going to ask questions you can do something like navigator.permissions.request({name: 'service_worker_audio_context'}). "Grant AudioContext permission for ServiceWorker from origin 'protocol://address'".

We can have dual opt-in, from document and ServiceWorker. That would require some ServiceWorker involvement.

I don't think there are much challenges. Other than the will to experiment and break out of boxes.

padenot commented 1 year ago

Extensions have documents and ways to interact with the code that runs via widgets. MediaStreamTrackGenerator and MediaStreamTrackProcessor are always tied to a document. They can be instantiated in a Worker, but a Worker lifetime is tied to the life-time of its document.

Media Session API requires a document and the lifetime is tied to this document. All of this is a lifetime problem.

If you want to be able to get a tab out of the way, ask the browser vendor to provide a way to do so. This has nothing to do with audio playback from a service worker.

guest271314 commented 1 year ago

Well, a document os required to create a non-extension ServiceWorker, so that is not novel.

WebCodec's AudioData, AudioEncoder and AudioDecoder are defined in DedicatedWorkerGlobalScope.

I find it amazing that folks in the audio realm are so close minded ablout experientation. Nothing can go wrong here by providing a means to connect to speakers from the ServiceWorker.

For now I suppose I have to create a proof-of-concept by directly streaming audio from the ServiceWorker to Pulse Audio or Pipewire to demonstrate the Internet s not going to break just because AudioContext is exposed in a ServiceWorker and controlled on any open tab.

yume-chan commented 1 year ago

Not being able to use AudioContext in ServiceWorker gives me the impression that the standards team doesn't want web apps to have the same capabilities as traditional Desktop apps.

For example, in a web-based music streaming app, I can use BroadcastChannel, SharedWorker or ServiceWorker to make multiple tabs to communicate and coordinate with each other, like displaying the same now-playing information and controlling the playback from any tabs.

But the playback itself must happen in one of the tabs, if the user closes that tab, the playback stops. I can't resume it in another tab because starting an AudioContext requires user activation (even if I can, this make the code more complex, and might have a short pause between switching). This creates an imperfect experience for users: "Why can I close all these tabs without affecting the playback, but I can't close that one? On desktop version of [insert music app here], I can close all its windows and the music will continue playing".

I'm not talking about "retain service worker and play audio from it even if it has 0 clients/documents", only allowing tabs to be freely closed/reloaded (because service worker can live through reloading last tab) would be a huge improvement.

Similar story for Media Session API. I have used https://github.com/sinono3/souvlaki to integrate OS media control in an Electron app, for all of Windows, Linux and macOS, they don't need a window to have media session displayed and controlled.

WebAudio / web-audio-api

Expose AudioContext in Worker/ServiceWorker #2383