By: Yash Khandelwal & Greg Whitworth
Editing video in the browser is currently a complex task: there is no straightforward way to decode an encoded video file into a raw stream on which common editing operations, such as trimming or concatenation, can be performed. Web developers typically handle client-side editing in one of three ways:
Each of these approaches has its pros and cons. The first requires either knowing in advance which video formats the application will work with, or bundling a full library such as ffmpeg, compiled to WASM, to handle multiple codecs. This can result in large downloads (at times up to 7 MB zipped) just to enable client-side editing. It does, however, cover both trimming and concatenation.
The second approach likewise handles the use cases above, without requiring the client to download larger files. Its drawbacks are that the server-side solution can create bottlenecks in the editing queue and incurs the cost of dedicated servers for performing the edits. It can also produce numerous redundant edits, since every save adds another edit to the queue. The result is increased server-side cost and slow turnaround for the end user.
The final approach avoids both downloading a large file and sending the video to a server, but it requires that editing occur at 1x speed. For example, if you have a 60 minute video and want to trim it to 20 minutes, you'll need to wait 20 minutes for the new blob to be created. With our early prototypes, this same work can be done in less than 3 seconds.
This API is a starting point for video editing on the client: it enables the capabilities listed above, without the overhead described, for the most common web-based video editing scenarios.
We have worked with Flipgrid to validate that this approach tackles their video editing needs and significantly improves their user experience.
We're proposing a MediaBlob that extends the regular Blob, and a MediaBlobOperation that is used to batch the proposed media editing operations. Initial feedback from customers who need this technology indicated that concatenation and trimming were the required capabilities, so that is where we started.
[Exposed=(Window,Worker), Serializable]
interface MediaBlob : Blob {
  constructor(Blob blob);
  readonly attribute long long duration;
};
When the MediaBlob constructor is invoked, the User Agent MUST run the following steps:
let mediaBlob = new MediaBlob(blob); // blob is a Blob object for a valid media file
console.log(mediaBlob.duration); // Outputs 480000 (8 minutes, in milliseconds)
[Exposed=(Window,Worker), Serializable]
interface MediaBlobOperation {
  constructor(MediaBlob mediaBlob);
  void trim(long long startTime, long long endTime);
  void split(long long time);
  void concat(sequence<MediaBlob> blobs);
  Promise<sequence<MediaBlob>> finalize(optional DOMString mimeType);
};
When the MediaBlobOperation constructor is invoked, the User Agent MUST run the following steps:
The MediaBlobOperation methods trim, concat and split do not modify the MediaBlob when invoked. They are recorded and executed only when finalize is called. Batching the operations in this way saves memory and improves efficiency. Because split produces two MediaBlobs, it should always be the last method called before finalize.
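The deferred execution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: `BatchedOperationSketch` and `applyOperations` are hypothetical names, a clip is modeled as a plain object holding only a `duration` in milliseconds, and `concat` takes a single clip as in the examples in this document.

```javascript
// Hypothetical model: a clip is just { duration } in milliseconds.
// No real media is decoded; only the bookkeeping of batching is shown.
function applyOperations(source, pending) {
  let results = [{ duration: source.duration }];
  for (const op of pending) {
    // After a split there are two clips, so nothing may follow it.
    if (results.length !== 1) {
      throw new Error('split must be the last operation');
    }
    const current = results[0];
    if (op.name === 'trim') {
      results = [{ duration: op.endTime - op.startTime }];
    } else if (op.name === 'concat') {
      results = [{ duration: current.duration + op.otherClip.duration }];
    } else if (op.name === 'split') {
      results = [
        { duration: op.time },
        { duration: current.duration - op.time },
      ];
    }
  }
  return results;
}

class BatchedOperationSketch {
  constructor(clip) {
    this.source = clip; // never mutated by trim, concat or split
    this.pending = [];  // operations recorded in call order
  }

  trim(startTime, endTime) {
    this.pending.push({ name: 'trim', startTime, endTime });
  }

  concat(otherClip) {
    this.pending.push({ name: 'concat', otherClip });
  }

  split(time) {
    this.pending.push({ name: 'split', time });
  }

  // Runs every recorded operation in order; a thrown error rejects the promise.
  finalize() {
    return new Promise((resolve) => {
      resolve(applyOperations(this.source, this.pending));
    });
  }
}

const source = { duration: 480000 };
const sketch = new BatchedOperationSketch(source);
sketch.trim(240000, 360000); // recorded only
sketch.split(60000);         // recorded only
// source.duration is still 480000 here; nothing runs until finalize().
sketch.finalize().then((clips) => {
  console.log(clips.map((clip) => clip.duration)); // two clips of 60000 ms each
});
```

Keeping the operations in a plain array means the source is only read once, at finalize time, which is the property the batching design relies on.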
The trim method is utilized to create the segment of time that the author would like to keep; the remaining content on either end, if any, is removed.
startTime
: The starting time position in milliseconds. Required.
endTime
: The ending time position in milliseconds. Required.

The User Agent will execute the following when finalize is called.
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(240000, 360000);
mbo.finalize().then(function(mediaBlobs) {
  // mediaBlobs[0] will be the trimmed blob of 2 min duration
});
The split method allows the author to split a blob into two separate MediaBlobs at a given time. Due to the nature of this operation, it should be the last operation before calling finalize().
time
: The time, in milliseconds, at which the blob is to be split into two separate MediaBlobs.

The User Agent will execute the following when finalize is called.
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.split(2000);
mbo.finalize().then(function(mediaBlobs) {
  // mediaBlobs will be an array of two MediaBlobs split at 2 seconds
});
The concat method allows you to take two MediaBlobs and concatenate them.
blob
: The MediaBlob to concatenate with the current MediaBlob.

The User Agent will execute the following when finalize is called.
let mbo = new MediaBlobOperation(new MediaBlob(blob1));
mbo.concat(new MediaBlob(blob2));
mbo.finalize().then(function(mediaBlobs) {
  // mediaBlobs[0] will be a concatenated MediaBlob of blob1 and blob2
});
The finalize method executes all the tracked operations and returns a promise that resolves with an array of MediaBlob objects, encoded according to the mimeType value.
mimeType
: DOMString representation of the MIME type [RFC2046] of the expected output.

// let the mimeType of the blob be 'video/webm; codecs=vp8,opus;'
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.finalize('video/mp4; codecs=h264,aac;').then(function(mediaBlobs) {
  // mediaBlobs[0] will be a MediaBlob object encoded with H.264 video codec and AAC audio codec
});
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(4000, 360000);
mbo.concat(new MediaBlob(blob2));
mbo.finalize().then(function(mediaBlobs) {
  // mediaBlobs[0] will be a concatenated MediaBlob of blob (which will be trimmed) and blob2
});
When finalize() is called, the User Agent will perform these basic checks for the operations that are batched. This error checking should be done before executing any of the operations.
For trim()
For split()
For concat()
The DOMException.message must contain:
Example:
let mbo = new MediaBlobOperation(new MediaBlob(blob));
mbo.trim(0, 5000); // Trim from 0 to 5 secs
mbo.split(7000); // Split the MediaBlob at 7 secs
mbo.finalize()
  .then(function(mediaBlobs) { })
  .catch((error) => {
    // sample error.message: "Split called on sequence 2: The time provided is greater than the duration of the MediaBlob."
  });
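The idea of checking the whole batch before executing anything can be sketched as below. The function name `validateOperations`, the operation record shape, and the specific rules are assumptions for illustration: only the split message is taken from the sample above, while the trim rule and its message wording are invented here and may differ from what the User Agent actually checks.

```javascript
// Hypothetical pre-flight check over a batch of recorded operations.
// Returns an error message for the first failing operation, or null if none fail.
function validateOperations(duration, operations) {
  let current = duration; // duration the media would have at this point in the batch
  for (let i = 0; i < operations.length; i++) {
    const op = operations[i];
    const sequence = i + 1; // 1-based position, as in the sample error message
    if (op.name === 'trim') {
      // Assumed rule: the kept range must lie inside the current duration.
      if (op.startTime < 0 || op.startTime >= op.endTime || op.endTime > current) {
        return `Trim called on sequence ${sequence}: The time range provided is not within the duration of the MediaBlob.`;
      }
      current = op.endTime - op.startTime;
    } else if (op.name === 'split') {
      if (op.time > current) {
        return `Split called on sequence ${sequence}: The time provided is greater than the duration of the MediaBlob.`;
      }
    } else if (op.name === 'concat') {
      // Assumed record shape: the appended media's duration in milliseconds.
      current += op.duration;
    }
  }
  return null;
}

// Mirrors the example above, assuming a 10 second source.
const message = validateOperations(10000, [
  { name: 'trim', startTime: 0, endTime: 5000 },
  { name: 'split', time: 7000 },
]);
console.log(message);
```

Tracking the running duration is what lets the split at 7 seconds fail here: after the trim, the media is only 5 seconds long, even though the source was longer.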
The finalize method can take a DOMString naming the MIME type the author wants the returned MediaBlobs to have. To determine if the MIME type is supported, do the following:
mimeType specifies the media type and container format for the recording via a type/subtype combination, with the codecs and/or profiles parameters [RFC6381] specified where ambiguity might arise. Individual codecs might have further optional specific parameters.
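To make the type/subtype and codecs structure concrete, here is a simplified sketch that splits a mimeType string into its parts. `parseMimeType` is a hypothetical helper, not part of the proposal, and it deliberately ignores many edge cases of RFC 2046 and RFC 6381 (for example, escaped characters inside quoted values). Whether a given combination is actually supported remains up to the User Agent.

```javascript
// Hypothetical helper: splits 'type/subtype; codecs=a,b' into its components.
function parseMimeType(mimeType) {
  const [essence, ...params] = mimeType.split(';').map((part) => part.trim());
  const [type, subtype] = essence.split('/');
  const parsed = { type, subtype, codecs: [] };
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue; // skips empty segments such as a trailing ';'
    const name = param.slice(0, eq).trim().toLowerCase();
    // Strip one pair of surrounding double quotes, if present.
    const value = param.slice(eq + 1).trim().replace(/^"|"$/g, '');
    if (name === 'codecs') {
      parsed.codecs = value.split(',').map((codec) => codec.trim());
    }
  }
  return parsed;
}

const parsedType = parseMimeType('video/mp4; codecs=h264,aac;');
console.log(parsedType.type, parsedType.subtype, parsedType.codecs);
```

For the string used in the finalize example, this yields the type `video`, the subtype `mp4`, and the codec list `h264`, `aac`.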