= WebM muxer for WebCodecs {nbsp}{nbsp}{nbsp} image:https://github.com/davedoesdev/webm-muxer.js/workflows/ci/badge.svg[CI status,link=https://github.com/davedoesdev/webm-muxer.js/actions]
== Description
https://www.w3.org/TR/webcodecs/[WebCodecs] encodes audio and video but leaves multiplexing the output into a container to the application. This project provides a way for your application to get its audio and video tracks into a WebM container.
Under the hood it uses https://github.com/webmproject/libwebm/[libwebm] and https://github.com/webmproject/webm-tools/[webmtools] compiled to Web Assembly using https://emscripten.org/[Emscripten].
For a pure-TypeScript alternative to this project that doesn't use Web Assembly, see https://github.com/Vanilagy/webm-muxer[webm-muxer].
== Requirements
webm-muxer.js works on Chrome 95. The WebCodecs spec changes frequently so changes may be required to maintain support going forward.
== Licence
webm-muxer.js is licensed under the terms of the link:LICENCE[MIT licence].
== Demo
You can see a demo http://rawgit.davedoesdev.com/davedoesdev/webm-muxer.js/main/demo.html[here]
(tested on Chrome 95 Linux with #enable-experimental-web-platform-features
).
The source code for the demo is available in link:demo.html[] and link:demo.js[].
When you click the Start button, you'll be asked by the browser to give permission to capture your camera and microphone. The data from each is then passed to two separate workers which encode the video into VP9 or AV1 and audio into Opus using the WebCodecs browser API.
The encoded video and audio from each worker is passed into a third worker which muxes it into WebM format.
The WebM output from the third worker is then passed into a <video>
element via
https://developer.mozilla.org/en-US/docs/Web/API/MediaSource[`MediaSource`] so you can see
it on the page.
When you click the Stop button, the camera and microphone are closed and the workers will exit once they've processed the last of the data.
This all happens in near realtime — the data is effectively pipelined between the workers and you don't have to wait until you press Stop to see the output.
You also have the option to record the WebM file to your local disk.
If you check Record before clicking Start, you'll be prompted for a file to save
the data into (default camera.webm
). The data is written to disk as it is produced and
then renamed to the file you chose once you click Stop. If you check In-memory too,
the data will instead be buffered in memory first and then saved to camera.webm
once you click Stop.
Finally, when you record, you can check PCM to have the raw audio data from your microphone
passed into the WebM muxer rather than encoding it to Opus. This option is only available when recording
because the <video>
element doesn't support PCM playback from WebM files — you won't
be able to monitor the video in this case.
== Integration into your application
In your application, there are two Javascript files which you should run in Web Workers:
link:encoder-worker.js[]:: Takes output from https://w3c.github.io/mediacapture-transform/#track-processor[`MediaStreamTrackProcessor] and encodes it using WebCodecs https://www.w3.org/TR/webcodecs/#videoencoder-interface[
VideoEncoder] or https://www.w3.org/TR/webcodecs/#audioencoder-interface[
AudioEncoder]. You should run this in up to two Workers, one for video and one for audio. If you have only video or audio, run
encoder-worker.js` in one worker.
link:webm-worker.js[]:: Takes output from encoder-worker.js
and muxes it into WebM container format.
You'll also need to copy link:webm-muxer.js[] and link:webm-muxer.wasm[] to your application because webm-worker.js
uses them.
Your application should in general follow the procedure described below.
. Create your https://www.w3.org/TR/mediacapture-streams/#mediastreamtrack[`MediaStreamTrack`]s (e.g. using https://www.w3.org/TR/mediacapture-streams/#dom-mediadevices-getusermedia[`getUserMedia()`]). You can have both video and audio tracks or just one.
. Create up to two https://w3c.github.io/mediacapture-transform/#track-processor[`MediaStreamTrackProcessor`]s, one for each of your tracks.
. Create a Web Worker using webm-worker.js
.
. Create up to two Web Workers using encoder-worker.js
, one for each of your tracks.
. When your application receives a message from the Worker running webm-worker.js
:
type
property is start-stream
then:** If you have a video track, send a message to the Worker running encoder-worker.js
for video with the following properties:
[horizontal]
type
:: "start"
readable
:: The readable
property of the MediaStreamTrackProcessor
for the video. You'll need to transfer this to the worker.
key_frame_interval
:: How often to generate a key frame, in seconds. Use 0
for no key frames.
count_frames
:: Use frame count rather than timestamp to determine key frame.
config
:: The https://www.w3.org/TR/webcodecs/#dictdef-videoencoderconfig[`VideoEncoderConfig`] for encoding the video.
** If you have an audio track, send a message to the Worker running encoder-worker.js
for audio with the following properties:
[horizontal]
type
:: "start"
audio
:: true
readable
:: The readable
property of the MediaStreamTrackProcessor
for the audio. You'll need to transfer this to the worker.
config
:: The https://www.w3.org/TR/webcodecs/#dictdef-audioencoderconfig[`AudioEncoderConfig`] for encoding the audio.
type
property is muxed-data
then:** The message's data
property contains the next chunk of the WebM output as an
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer[`ArrayBuffer`],
which your application can use as it likes.
If the message's type
property is error
then a muxing error occurred and the detail
property contains the error description.
If the message's type
property is exit
then the muxer has finished (all the tracks have finished,
the muxer has flushed its buffers and sent back all the muxed data).
[[stats]] If the message's type
property is stats
then the data
property contains an object with
the following property:
[horizontal]
memory
:: Size of the Emscripten/Web Assembly heap. This may grow for long-lived sessions.
. When your application receives a message from one of the Workers running encoder-worker.js
:
If the message's type
property is error
then an encoding error occurred and the detail
property contains the error description.
If the message's type
property is exit
then the encoder has finished (its track ended).
Otherwise send the message onto the Worker running webm-worker.js
. You should transfer the data
property.
. Send a message to the Worker running webm-worker.js
with the following properties:
[horizontal]
type
:: "start"
webm_metadata
:: An object with the following properties:
+
[horizontal]
max_cluster_duration
::: Desired length in nanoseconds of each WebM output chunk. Use a BigInt
to specify this.
video
::: If you have a video track, an object with the following properties:
+
[horizontal]
width
:::: Width of the encoded video in pixels.
height
:::: Height of the encoded video in pixels.
frame_rate
:::: Number of frames per second in the video. This property is optional.
codec_id
:::: WebM codec ID to describe the video encoding method, e.g. "V_VP9"
, "V_AV1"
or "V_MPEG4/ISO/AVC"
. See the https://www.matroska.org/technical/codec_specs.html[codec mappings page] for more values.
audio
::: If you have an audio track, an object with the following properties:
+
[horizontal]
sample_rate
:::: Number of audio samples per second in the encoded audio.
channels
:::: Number of channels in the encoded audio.
bit_depth
:::: Number of bits in each sample. This property is usually used only for PCM encoded audio.
codec_id
:::: WebM codec ID to describe the audio encoding method, e.g. "A_OPUS"
or "A_PCM/FLOAT/IEEE"
. See the https://www.matroska.org/technical/codec_specs.html[codec mappings page] for more values.
+
webm_options
:: An object with the following properties:
+
[horizontal]
video_queue_limit
:::: The number of video frames to buffer while waiting for audio with a later timestamp to arrive.
+
Defaults to Infinity
, i.e. all data is muxed in timestamp order, which is suitable if you
have continuous data. However, if you have intermittent audio or video, including delayed start
of one with respect to the other, then you can try setting video_queue_limit
to a small value.
+
For example, if your video is 30fps then setting video_queue_limit
to 30
will buffer a
maximum of one second of video while waiting for audio. If audio subsequently arrives that
has a timestamp earlier than the video, its timestamp is modified in order to maintain
a monotonically increasing timestamp in the muxed output. This may result in the audio sounding
slower.
+
In general, if your audio and video is continuous and start at the same time, leave video_queue_limit
at the default. Otherwise, the lower you set it, the more accurate the first audio timestamp in the muxed
output will be, but subsequent audio timestamps may be altered. The higher you set it, the less accurate
the first audio timestamp will be but subsequent audio timestamps are less likely to be altered.
This is because WebCodecs provides no way of synchronizing media streams — in fact audio and video
timestamps are completely unrelated to each other. So we have to base everything off initial arrival
time in the muxer.
audio_queue_limit
:::: The number of audio frames to buffer while waiting for video with a later timestamp to arrive.
+
Same as video_queue_limit
but for audio.
use_audio_timestamps
:::: Always use timestamps in the encoded audio data rather than calculate them from
the duration of each audio chunk.
+
Defaults to false
, i.e. the timestamp of an audio chunk is set to sum of the durations of all the preceding
audio chunks. This is suitable for continuous audio but if you have intermittent audio, set this to false
.
+
Note that I've found the duration method to be more accurate than the timestamps WebCodecs generates.
+
webm_stats_interval
:: If you specify this then the worker will repeately send a message with type
property set to stats
. The interval between each message will be the number of milliseconds specified.
See <stats
messages.
. To stop muxing cleanly, wait for exit messages from all the Workers running encoder-worker.js
and then send a message to the Worker running webm-worker.js
with the following property:
[horizontal]
type
:: "end"
== Output
Per above, your application will receive chunked WebM output in multiple type: "muxed-data"
messages from the Worker running webm-worker
.
These are suitable for live streaming but if you concatenate them, for example to record them to a file, please be aware that the result will not be seekable.
You can use link:webm-writer.js[] to make the WebM data seekable. It exports a class, WebMWriter
,
which uses one of two methods to index muxed data:
Index as it goes:: Writes the data to disk as it's produced, using the https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API[File System Access API]. Once the data stops, appends the cues, seeks back to the start of the file and rewrites the header. To use this method:
. Construct a WebMWriter
object. The constructor takes an optional options object with a single property,
metadata_reserve_size
. This is the number of extra bytes to leave at the start of the file for the header
so it can be fixed up after writing stops. The default is 1024, which is enough to rewrite the header.
WebMWriter
will try to put the cues into this space too if they're small enough, otherwise they're appended
to the end of the file, after the track data. You can increase metadata_reserve_size
to leave more space
for the cues at the start of the file, but remember the longer the recording, the larger the cues section will be.
. Call the async start
method. You must pass a filename argument to this function, otherwise the data is buffered in memory
(see below). The user is prompted to for the file to save the data into -- the argument passed to start
is used as
the suggested name in the file picker.
. Call the async write
method for each type: "muxed-data"
message, passing it the data
property of the message.
. Call the async finish
method. Once this returns (after awaiting), the seekable WebM file will be ready in the
file the user chose. Note the name
property of the WebMWriter
object will contain the filename (but not the path).
The handle
property will contain the https://developer.mozilla.org/en-US/docs/Web/API/FileSystemFileHandle[`FileSystemFileHandle] to the file. You can use this to read it back in again if you need to.
finishreturns
trueif the cues were inserted at the start of the file or
false` if they were appended at the end.
Buffer in memory:: Buffers the data in memory and then rewrites the header and cues. The cues are always inserted at the start, before the track data. To use this method:
. Construct a WebMWriter
object.
. Call the async start
method.
. Call the async write
method for each type: "muxed-data"
message, passing it the data
property of the message.
. Call the async finish
method. This returns (after awaiting) an array of
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer[`ArrayBuffer`]s or
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray[typed arrays] containing
the seekable WebM recording split into contiguous chunks.
After finish
returns in both methods, the size
property of the WebMWriter
object will contain the size of the file
in bytes and the duration
property will contain the length of the recording in milliseconds.
See link:demo.js[the demo] for an example of how to use WebMWriter
.
WebMWriter
uses link:EBML.js[] to do the heavy lifting. EBML.js
is copied from https://github.com/davedoesdev/ts-ebml/blob/update-deps/dist/EBML.js[my fork] of https://github.com/legokichi/ts-ebml[ts-ebml].