chaseleslie / canvas_capture

A WebExtension to record video from HTML canvas elements
GNU General Public License v3.0

Audio capture #9

Open JonnyTech opened 5 years ago

JonnyTech commented 5 years ago

Does an option exist to capture audio together with the canvas video? Or is there an easy way to do it? Otherwise, can I request it as a new feature, please?

JonnyTech commented 5 years ago

Seems like I am not the only person who would like to see this implemented; see the reviews on Mozilla's extensions page: https://addons.mozilla.org/en-GB/firefox/addon/canvas-capture/reviews/1053973/

chaseleslie commented 5 years ago

Currently, there is no option to capture audio. I don't know of an API that would allow capturing any arbitrary audio output on a page. There appears to be a mechanism to get the audio output from an Audio or Video element, but there's no guarantee these will be attached to the DOM (i.e., accessible by an injected Web Extension script). Since scripts added by extensions are sandboxed from page scripts, there is also no way to search for detached media elements (e.g., new Audio() or AudioContext).

I suppose either Web Extensions or HTML5/Web Audio will have to add this capability before this can be done. If it becomes possible, or someone has an idea on how to implement it, I will gladly add this feature.

JonnyTech commented 5 years ago

Thanks for your reply. Is it possible to add a simultaneous MediaRecorder to the script to produce a separate audio file? Details here: https://zhirzh.github.io/2017/09/02/mediarecorder/ and an example here: https://github.com/mdn/web-dictaphone

chaseleslie commented 5 years ago

In theory it is possible. However, this won't work reliably for a few reasons.

First, as I mentioned before, a Web Extension doesn't have access to the web page's scripts, so the extension can only query media elements that are present in the DOM. A page can have detached media elements, such as one created by calling const audioPlayer = new Audio(); (see here). The extension has no way of getting a reference to that media element unless it is added to the DOM, which isn't necessary for it to be played. Without a reference, the extension cannot call the element's captureStream() method to obtain a stream for a MediaRecorder.
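A minimal sketch of the DOM-only lookup described above, to show where it falls short. The function name collectAttachedMediaStreams and the doc parameter are hypothetical (they are not part of the extension), and the fallback to the prefixed mozCaptureStream() name is an assumption about Firefox builds that do not expose the unprefixed method.

```javascript
// Sketch only: gather streams from media elements that are attached to the DOM.
// `doc` is a parameter (normally `document`) so the helper can run outside a browser.
function collectAttachedMediaStreams(doc) {
  const streams = [];
  for (const el of doc.querySelectorAll("audio, video")) {
    // Feature-detect the standard method, then the prefixed name (assumption).
    const capture = el.captureStream || el.mozCaptureStream;
    if (typeof capture !== "function") {
      continue; // This element cannot be captured in this browser.
    }
    try {
      streams.push(capture.call(el));
    } catch (err) {
      // Capturing can fail (for example, EME-protected media); skip the element.
    }
  }
  return streams;
}

// A detached element such as `const audioPlayer = new Audio();` is never
// returned by querySelectorAll, so it cannot be reached this way.
```

In a content script this would be called as collectAttachedMediaStreams(document); each returned stream could then be handed to a MediaRecorder.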

Second, the current support in browsers for capturing a stream from media elements is lacking. Firefox only supports capturing a stream from Audio/Video elements if their source is a MediaStream, which would not allow capturing a stream of, say, an MP3 file or the audio track from a video (see here). The captureStream() method also doesn't work on media elements that use EME (aka DRM).

Third, the page may be generating audio on the fly using the Web Audio API (a game or music app might do this, for instance). I don't know of a way to intercept the stream in this scenario.

The second link you posted shows a demo of getting the audio from a user's microphone. While this is possible, I'm not sure if that is what you are interested in.

Hopefully Firefox will bring their implementation of HTMLMediaElement::captureStream() up to spec soon. It might be worth implementing an audio capture at that point, even if it can't capture audio from all possible sources. Or maybe there will be a way once the AudioWorklet API is up and running.

That being said, I do like the idea and it would make the extension more useful. It would be nice to be able to capture a WebGL game and its sound effects at once.

JonnyTech commented 5 years ago

Ah, I understand the complications now, thanks for clarifying.

When using the example, I am prompted to select a source:

[Image: audiosource]

Selecting the monitor captures any and all audio playing. This workaround would help me immensely with my archiving task.

I have tried adding some code to yours but cannot test it because I am unsure how to create a Firefox extension. Can you provide any tips?

chaseleslie commented 5 years ago

Very interesting, the monitor option does appear to capture audio playing in another tab or application.

What OS are you testing this on? I can capture audio on Ubuntu, but the demo is not working on Windows 10. The name "Monitor of XYZ" sounds suspiciously like pulseaudio. I wonder if other operating systems expose something like this.

As for creating extensions, there is extensive documentation available at MDN. You can temporarily load an extension by going to about:debugging (Click Load Temporary Add-on). Just go to the extension source code folder and select the manifest.json file.

In this extension in particular, I made some tools (see the file tools/USAGE.md) to help with the differences in platforms (Chrome/Firefox). These tools use Bash and Python scripts and assume a Linux-like environment. While developing I usually run the script tools/monitor.sh, which will update the files in the platform-specific directories anytime a change is made. So if testing on Firefox, the platform-specific source code will be located in the directory platform/firefox-dev. If you just want to test some quick changes, you should have no problem just loading the manifest.json file in the top-level extension directory (the platform-specific differences are minimal at this point).

I'll look into getting the getUserMedia() API to work on other operating systems. I don't want to implement this if it only works with certain sound servers (pulseaudio), but if it does work then this looks like the best way to record audio along with the canvas.
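A rough sketch of how audio from getUserMedia() could be recorded along with the canvas, assuming the browser grants an audio source. The names combineStreams and recordCanvasWithAudio are hypothetical and this is not the extension's actual code; the MediaStreamCtor parameter exists only so the helper can be exercised without a browser.

```javascript
// Sketch only: merge the canvas video track(s) with the audio track(s)
// into one stream that a single MediaRecorder can record.
function combineStreams(canvasStream, audioStream, MediaStreamCtor = globalThis.MediaStream) {
  const tracks = [
    ...canvasStream.getVideoTracks(),
    ...audioStream.getAudioTracks(),
  ];
  return new MediaStreamCtor(tracks);
}

// Browser-only wiring (hypothetical): start recording a canvas plus audio.
async function recordCanvasWithAudio(canvas, fps = 30) {
  // Prompts the user to pick an audio source (a microphone or, with
  // PulseAudio, a "Monitor of ..." source).
  const audioStream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const combined = combineStreams(canvas.captureStream(fps), audioStream);

  const chunks = [];
  const recorder = new MediaRecorder(combined);
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) {
      chunks.push(event.data); // Collect encoded data as it arrives.
    }
  };
  recorder.start();
  return { recorder, chunks };
}
```

Recording one combined stream keeps audio and video in a single file, rather than producing a separate audio file that would have to be synchronized afterwards.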

JonnyTech commented 5 years ago

Running Firefox on Debian; sorry, no Windows here. Thanks for the details, very helpful. Could the code detect the OS and offer audio options accordingly?

chaseleslie commented 5 years ago

Further testing on Win10 revealed that the call to getUserMedia fails unless a microphone is plugged in. So I guess Windows doesn't expose audio sinks as audio sources like pulse does. I don't have a Mac or BSD lying around to see how they fare.
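One way to see what a given platform exposes is to list its audio input devices. The sketch below assumes that monitor sources can be recognized by a label starting with "Monitor of", which is only a heuristic based on the names seen with PulseAudio; summarizeAudioInputs and probeAudioInputs are hypothetical names. Browsers generally leave device labels empty until the user has granted media permission, so the label check is only meaningful after that.

```javascript
// Sketch only: summarize a list of MediaDeviceInfo-like objects.
function summarizeAudioInputs(devices) {
  const inputs = devices.filter((device) => device.kind === "audioinput");
  return {
    // False on a machine with no microphone and no monitor source.
    hasAudioInput: inputs.length > 0,
    // Heuristic (assumption): PulseAudio names monitors "Monitor of ...".
    monitorLabels: inputs
      .map((device) => device.label || "")
      .filter((label) => label.startsWith("Monitor of")),
  };
}

// Browser-only wrapper around the standard enumerateDevices() call.
async function probeAudioInputs(mediaDevices = navigator.mediaDevices) {
  const devices = await mediaDevices.enumerateDevices();
  return summarizeAudioInputs(devices);
}
```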

OS detection is possible but probably not the right way of going about it. Code could just ask for permission to record audio (in response to a button click, perhaps), and only display that part of the UI if permission is granted. Or always show that part of the UI and keep it disabled unless permission is granted (this might be confusing though).
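The first approach could look roughly like the sketch below: ask for audio permission when a button is clicked, and reveal the audio part of the UI only if a stream is obtained. requestAudioStream, wireAudioToggle, button and audioPanel are hypothetical names for illustration, not existing parts of the extension.

```javascript
// Sketch only: resolve to a MediaStream if audio capture is permitted and
// available, or to null if the request fails for any reason.
async function requestAudioStream(mediaDevices) {
  try {
    return await mediaDevices.getUserMedia({ audio: true });
  } catch (err) {
    // Typically the user denied permission or no audio input device exists.
    return null;
  }
}

// Show the audio controls only after a click leads to a usable stream.
function wireAudioToggle(button, audioPanel, mediaDevices) {
  button.addEventListener("click", async () => {
    const stream = await requestAudioStream(mediaDevices);
    audioPanel.hidden = stream === null;
  });
}
```

In the extension's UI this might be wired as wireAudioToggle(someButton, somePanel, navigator.mediaDevices). No OS detection is needed, because a platform without a usable audio source simply never reveals the panel.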

JonnyTech commented 5 years ago

Ah, that is annoying. Are you able to produce a test build? I can test on a variety of devices during the week.