davedoesdev/webm-muxer.js

= WebM muxer for WebCodecs {nbsp}{nbsp}{nbsp} image:https://github.com/davedoesdev/webm-muxer.js/workflows/ci/badge.svg[CI status,link=https://github.com/davedoesdev/webm-muxer.js/actions]

== Description

https://www.w3.org/TR/webcodecs/[WebCodecs] encodes audio and video but leaves multiplexing the output into a container to the application. This project provides a way for your application to get its audio and video tracks into a WebM container.

Under the hood it uses https://github.com/webmproject/libwebm/[libwebm] and https://github.com/webmproject/webm-tools/[webm-tools] compiled to WebAssembly using https://emscripten.org/[Emscripten].

For a pure-TypeScript alternative to this project that doesn't use WebAssembly, see https://github.com/Vanilagy/webm-muxer[webm-muxer].

== Requirements

webm-muxer.js works on Chrome 95. The WebCodecs spec changes frequently so changes may be required to maintain support going forward.

== Licence

webm-muxer.js is licensed under the terms of the link:LICENCE[MIT licence].

== Demo

You can see a demo http://rawgit.davedoesdev.com/davedoesdev/webm-muxer.js/main/demo.html[here] (tested on Chrome 95 Linux with #enable-experimental-web-platform-features). The source code for the demo is available in link:demo.html[] and link:demo.js[].

When you click the Start button, you'll be asked by the browser to give permission to capture your camera and microphone. The data from each is then passed to two separate workers which encode the video into VP9 or AV1 and audio into Opus using the WebCodecs browser API.

The encoded video and audio from these workers are passed into a third worker, which muxes them into WebM format.

The WebM output from the third worker is then passed into a <video> element via https://developer.mozilla.org/en-US/docs/Web/API/MediaSource[`MediaSource`] so you can see it on the page.

When you click the Stop button, the camera and microphone are closed and the workers will exit once they've processed the last of the data.

This all happens in near realtime — the data is effectively pipelined between the workers and you don't have to wait until you press Stop to see the output.
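The `MediaSource` step can be sketched roughly as below. This is not the demo's actual code: `ChunkAppender` and `attachToVideo` are hypothetical names, and the MIME type is an assumption that must match the codecs you configure. The sketch relies only on the standard `SourceBuffer` behaviour that `appendBuffer()` must not be called while `updating` is true, so chunks are queued until `updateend` fires.

```javascript
// Minimal sketch: serialize appends to a SourceBuffer-like object.
// `sourceBuffer` must provide appendBuffer(), an `updating` flag and
// addEventListener('updateend', ...), as a real SourceBuffer does.
class ChunkAppender {
    constructor(sourceBuffer) {
        this.sourceBuffer = sourceBuffer;
        this.queue = [];
        sourceBuffer.addEventListener('updateend', () => this.flush());
    }

    // Queue a chunk (ArrayBuffer) and append it as soon as possible.
    push(chunk) {
        this.queue.push(chunk);
        this.flush();
    }

    // Append the next queued chunk unless an append is still in progress.
    flush() {
        if (!this.sourceBuffer.updating && this.queue.length > 0) {
            this.sourceBuffer.appendBuffer(this.queue.shift());
        }
    }
}

// Browser-only wiring (hypothetical helper). The default MIME type is an
// assumption; change it to match your video and audio codecs.
function attachToVideo(videoElement, mimeType = 'video/webm; codecs="vp9,opus"') {
    const mediaSource = new MediaSource();
    videoElement.src = URL.createObjectURL(mediaSource);
    return new Promise(resolve => {
        mediaSource.addEventListener('sourceopen', () => {
            resolve(new ChunkAppender(mediaSource.addSourceBuffer(mimeType)));
        }, { once: true });
    });
}
```

With something like this in place, each `muxed-data` chunk received from the muxing worker would be handed to `push()`.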

You also have the option to record the WebM file to your local disk. If you check Record before clicking Start, you'll be prompted for a file to save the data into (default camera.webm). The data is written to disk as it is produced and then renamed to the file you chose once you click Stop. If you check In-memory too, the data will instead be buffered in memory first and then saved to camera.webm once you click Stop.

Finally, when you record, you can check PCM to have the raw audio data from your microphone passed into the WebM muxer rather than encoding it to Opus. This option is only available when recording because the <video> element doesn't support PCM playback from WebM files — you won't be able to monitor the video in this case.

== Integration into your application

In your application, there are two JavaScript files which you should run in Web Workers:

link:encoder-worker.js[]:: Takes output from https://w3c.github.io/mediacapture-transform/#track-processor[`MediaStreamTrackProcessor`] and encodes it using WebCodecs https://www.w3.org/TR/webcodecs/#videoencoder-interface[VideoEncoder] or https://www.w3.org/TR/webcodecs/#audioencoder-interface[AudioEncoder]. You should run this in up to two Workers, one for video and one for audio. If you have only video or audio, run `encoder-worker.js` in one worker.

link:webm-worker.js[]:: Takes output from encoder-worker.js and muxes it into WebM container format.

You'll also need to copy link:webm-muxer.js[] and link:webm-muxer.wasm[] to your application because webm-worker.js uses them.

Your application should in general follow the procedure described below.

. Create your https://www.w3.org/TR/mediacapture-streams/#mediastreamtrack[`MediaStreamTrack`]s (e.g. using https://www.w3.org/TR/mediacapture-streams/#dom-mediadevices-getusermedia[`getUserMedia()`]). You can have both video and audio tracks or just one.

. Create up to two https://w3c.github.io/mediacapture-transform/#track-processor[`MediaStreamTrackProcessor`]s, one for each of your tracks.

. Create a Web Worker using webm-worker.js.

. Create up to two Web Workers using encoder-worker.js, one for each of your tracks.

. When your application receives a message from the Worker running webm-worker.js:

* If the message's type property is start-stream then:

** If you have a video track, send a message to the Worker running encoder-worker.js for video with the following properties:
+
[horizontal]
type:: "start"
readable:: The readable property of the MediaStreamTrackProcessor for the video. You'll need to transfer this to the worker.
key_frame_interval:: How often to generate a key frame, in seconds. Use 0 for no key frames.
count_frames:: Use the frame count rather than the timestamp to decide when to generate a key frame.
config:: The https://www.w3.org/TR/webcodecs/#dictdef-videoencoderconfig[`VideoEncoderConfig`] for encoding the video.

** If you have an audio track, send a message to the Worker running encoder-worker.js for audio with the following properties:
+
[horizontal]
type:: "start"
audio:: true
readable:: The readable property of the MediaStreamTrackProcessor for the audio. You'll need to transfer this to the worker.
config:: The https://www.w3.org/TR/webcodecs/#dictdef-audioencoderconfig[`AudioEncoderConfig`] for encoding the audio.

* If the message's type property is muxed-data then:

** The message's data property contains the next chunk of the WebM output as an https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer[`ArrayBuffer`], which your application can use as it likes.

* If the message's type property is error then a muxing error occurred and the detail property contains the error description.

* If the message's type property is exit then the muxer has finished (all the tracks have finished, the muxer has flushed its buffers and sent back all the muxed data).

* [[stats]] If the message's type property is stats then the data property contains an object with the following property:
+
[horizontal]
memory:: Size of the Emscripten/WebAssembly heap. This may grow for long-lived sessions.

. When your application receives a message from one of the Workers running encoder-worker.js:

* If the message's type property is error then an encoding error occurred and the detail property contains the error description.

* If the message's type property is exit then the encoder has finished (its track ended).

* Otherwise send the message on to the Worker running webm-worker.js. You should transfer the data property.

. Send a message to the Worker running webm-worker.js with the following properties:
+
[horizontal]
type:: "start"
webm_metadata:: An object with the following properties:
+
[horizontal]
max_cluster_duration::: Desired length in nanoseconds of each WebM output chunk. Use a BigInt to specify this.
video::: If you have a video track, an object with the following properties:
+
[horizontal]
width:::: Width of the encoded video in pixels.
height:::: Height of the encoded video in pixels.
frame_rate:::: Number of frames per second in the video. This property is optional.
codec_id:::: WebM codec ID to describe the video encoding method, e.g. "V_VP9", "V_AV1" or "V_MPEG4/ISO/AVC". See the https://www.matroska.org/technical/codec_specs.html[codec mappings page] for more values.
audio::: If you have an audio track, an object with the following properties:
+
[horizontal]
sample_rate:::: Number of audio samples per second in the encoded audio.
channels:::: Number of channels in the encoded audio.
bit_depth:::: Number of bits in each sample. This property is usually used only for PCM encoded audio.
codec_id:::: WebM codec ID to describe the audio encoding method, e.g. "A_OPUS" or "A_PCM/FLOAT/IEEE". See the https://www.matroska.org/technical/codec_specs.html[codec mappings page] for more values.
+
webm_options:: An object with the following properties:
+
[horizontal]
video_queue_limit:::: The number of video frames to buffer while waiting for audio with a later timestamp to arrive.
+
Defaults to Infinity, i.e. all data is muxed in timestamp order, which is suitable if you have continuous data. However, if you have intermittent audio or video, including delayed start of one with respect to the other, then you can try setting video_queue_limit to a small value.
+
For example, if your video is 30fps then setting video_queue_limit to 30 will buffer a maximum of one second of video while waiting for audio.
If audio subsequently arrives that has a timestamp earlier than the video, its timestamp is modified in order to maintain a monotonically increasing timestamp in the muxed output. This may result in the audio sounding slower.
+
In general, if your audio and video are continuous and start at the same time, leave video_queue_limit at the default. Otherwise, the lower you set it, the more accurate the first audio timestamp in the muxed output will be, but subsequent audio timestamps may be altered. The higher you set it, the less accurate the first audio timestamp will be but subsequent audio timestamps are less likely to be altered. This is because WebCodecs provides no way of synchronizing media streams; in fact, audio and video timestamps are completely unrelated to each other. So we have to base everything off initial arrival time in the muxer.
audio_queue_limit:::: The number of audio frames to buffer while waiting for video with a later timestamp to arrive.
+
Same as video_queue_limit but for audio.
use_audio_timestamps:::: Always use timestamps in the encoded audio data rather than calculate them from the duration of each audio chunk.
+
Defaults to false, i.e. the timestamp of an audio chunk is set to the sum of the durations of all the preceding audio chunks. This is suitable for continuous audio but if you have intermittent audio, set this to true.
+
Note that I've found the duration method to be more accurate than the timestamps WebCodecs generates.
+
webm_stats_interval:: If you specify this then the worker will repeatedly send a message with type property set to stats. The interval between each message will be the number of milliseconds specified. See <<stats>> for details of stats messages.

. To stop muxing cleanly, wait for exit messages from all the Workers running encoder-worker.js and then send a message to the Worker running webm-worker.js with the following property:
+
[horizontal]
type:: "end"
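As a rough sketch of the procedure above, the messages could be built and routed as follows. The helper names (`buildMuxerStartMessage`, `buildEncoderStartMessage`, `wireWorkers`) and their argument shapes are made up for illustration and are not part of this project; only the message property names come from the steps above. `wireWorkers` expects Worker-like objects with `postMessage` and `onmessage`, and it assumes that sending an empty `webm_options` object leaves the documented defaults in place.

```javascript
// Hypothetical helpers: only the message property names come from the steps above.

// Build the "start" message for the Worker running webm-worker.js.
function buildMuxerStartMessage({ video, audio, clusterSeconds = 1, options = {} }) {
    const webm_metadata = {
        // Nanoseconds per WebM output chunk, as a BigInt
        max_cluster_duration: BigInt(Math.round(clusterSeconds * 1e9))
    };
    if (video) { webm_metadata.video = { ...video }; }
    if (audio) { webm_metadata.audio = { ...audio }; }
    return { type: 'start', webm_metadata, webm_options: { ...options } };
}

// Build the "start" message for a Worker running encoder-worker.js.
function buildEncoderStartMessage(readable, config,
    { audio = false, key_frame_interval = 1, count_frames = false } = {}) {
    return audio ?
        { type: 'start', audio: true, readable, config } :
        { type: 'start', readable, key_frame_interval, count_frames, config };
}

// Route messages between Worker-like objects.
// `encoders` is an array of { worker, startMessage }.
function wireWorkers(muxer, encoders, onChunk, onError = console.error) {
    let running = encoders.length;
    for (const { worker } of encoders) {
        worker.onmessage = ({ data: msg }) => {
            if (msg.type === 'error') {
                onError(msg.detail);
            } else if (msg.type === 'exit') {
                // Once every encoder has exited, stop muxing cleanly
                if (--running === 0) { muxer.postMessage({ type: 'end' }); }
            } else {
                // Forward encoded data, transferring the data property
                muxer.postMessage(msg, [msg.data]);
            }
        };
    }
    muxer.onmessage = ({ data: msg }) => {
        if (msg.type === 'start-stream') {
            for (const { worker, startMessage } of encoders) {
                // The readable stream is transferred to the encoder worker
                worker.postMessage(startMessage, [startMessage.readable]);
            }
        } else if (msg.type === 'muxed-data') {
            onChunk(msg.data);
        } else if (msg.type === 'error') {
            onError(msg.detail);
        }
    };
}
```

For a 640x480 VP9 track you might call `buildMuxerStartMessage({ video: { width: 640, height: 480, codec_id: 'V_VP9' } })`, call `wireWorkers` with your Workers, and then post the resulting message to the Worker running webm-worker.js.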

== Output

Per above, your application will receive chunked WebM output in multiple type: "muxed-data" messages from the Worker running webm-worker.js.

These are suitable for live streaming but if you concatenate them, for example to record them to a file, please be aware that the result will not be seekable.
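For illustration, plain concatenation might look like the sketch below. `concatChunks` is a hypothetical helper, not part of this project; it assumes each chunk is the `ArrayBuffer` from a `muxed-data` message. The result should play from the start but, as noted, will not be seekable.

```javascript
// Join muxed-data chunks (ArrayBuffers) into a single Uint8Array.
function concatChunks(chunks) {
    const total = chunks.reduce((n, chunk) => n + chunk.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
        // Copy each chunk's bytes to its position in the output
        out.set(new Uint8Array(chunk), offset);
        offset += chunk.byteLength;
    }
    return out;
}
```

In a browser, `new Blob(chunks, { type: 'video/webm' })` gives an equivalent result without copying the data yourself.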

You can use link:webm-writer.js[] to make the WebM data seekable. It exports a class, WebMWriter, which uses one of two methods to index muxed data:

Index as it goes:: Writes the data to disk as it's produced, using the https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API[File System Access API]. Once the data stops, appends the cues, seeks back to the start of the file and rewrites the header. To use this method:

. Construct a WebMWriter object. The constructor takes an optional options object with a single property, metadata_reserve_size. This is the number of extra bytes to leave at the start of the file for the header so it can be fixed up after writing stops. The default is 1024, which is enough to rewrite the header. WebMWriter will try to put the cues into this space too if they're small enough, otherwise they're appended to the end of the file, after the track data. You can increase metadata_reserve_size to leave more space for the cues at the start of the file, but remember the longer the recording, the larger the cues section will be.

. Call the async start method. You must pass a filename argument to this function, otherwise the data is buffered in memory (see below). The user is prompted for the file to save the data into; the argument passed to start is used as the suggested name in the file picker.

. Call the async write method for each type: "muxed-data" message, passing it the data property of the message.

. Call the async finish method. Once this returns (after awaiting), the seekable WebM file will be ready in the file the user chose. Note the name property of the WebMWriter object will contain the filename (but not the path). The handle property will contain the https://developer.mozilla.org/en-US/docs/Web/API/FileSystemFileHandle[`FileSystemFileHandle`] to the file. You can use this to read it back in again if you need to. `finish` returns `true` if the cues were inserted at the start of the file or `false` if they were appended at the end.

Buffer in memory:: Buffers the data in memory and then rewrites the header and cues. The cues are always inserted at the start, before the track data. To use this method:

. Construct a WebMWriter object.

. Call the async start method without a filename argument.

. Call the async write method for each type: "muxed-data" message, passing it the data property of the message.

. Call the async finish method. This returns (after awaiting) an array of https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer[`ArrayBuffer`]s or https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray[typed arrays] containing the seekable WebM recording split into contiguous chunks.

After finish returns in both methods, the size property of the WebMWriter object will contain the size of the file in bytes and the duration property will contain the length of the recording in milliseconds.
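The call sequence shared by both methods could be wrapped up as in this sketch. `recordChunks` is a hypothetical helper; it takes the writer as an argument so that it only depends on the `start`, `write` and `finish` methods and the `size` and `duration` properties described above. It also assumes that passing `undefined` as the filename behaves the same as omitting it.

```javascript
// `writer` is expected to behave like the WebMWriter described above.
// `chunks` is any iterable or async iterable of muxed-data ArrayBuffers.
async function recordChunks(writer, chunks, filename) {
    // With a filename the data goes to disk; without one it is buffered in memory
    await writer.start(filename);
    for await (const chunk of chunks) {
        await writer.write(chunk);
    }
    // true/false when writing to disk, an array of buffers when in memory
    const result = await writer.finish();
    return { result, size: writer.size, duration: writer.duration };
}
```

In a browser you would pass a `WebMWriter` instance (for example one constructed with a larger `metadata_reserve_size`) and feed it the `data` property of each `muxed-data` message as it arrives.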

See link:demo.js[the demo] for an example of how to use WebMWriter.

WebMWriter uses link:EBML.js[] to do the heavy lifting. EBML.js is copied from https://github.com/davedoesdev/ts-ebml/blob/update-deps/dist/EBML.js[my fork] of https://github.com/legokichi/ts-ebml[ts-ebml].
