2023-10-26
Spec: https://wicg.github.io/document-picture-in-picture/
There currently exists a Web API for putting an HTMLVideoElement into a
Picture-in-Picture window (HTMLVideoElement.requestPictureInPicture()). This
limits a website's ability to provide a custom picture-in-picture (PiP)
experience. We want to expand upon that functionality by giving websites the
ability to open a picture-in-picture (i.e., always-on-top) window with a blank
document that can be populated with arbitrary HTMLElements instead of only a
single HTMLVideoElement.
This new window will be much like a blank same-origin window opened via the
existing window.open() API, with some differences:
HTMLVideoElement.requestPictureInPicture() API.
window.history or window.location calls that change to a new document will close the PiP window).
HTMLVideoElement.requestPictureInPicture() API.
HTMLElements in an always-on-top window.
requestPictureInPicture() on any element would be the simplest way, for reasons described below, this isn't feasible.
While the existing Picture-in-Picture API for HTMLVideoElement allows a website to provide a Picture-in-Picture video experience, it is very limited in what inputs the window can take and the look-and-feel of those inputs. With a full Document in Picture-in-Picture, the website can provide custom controls and inputs (e.g., captions, playlists, time scrubber, liking/disliking videos, etc.) to improve the user's PiP video experience.
It is common for users to leave the tab during a video conferencing session for various reasons (e.g. presenting another tab to the call or multitasking) while still wishing to see the call, so it's a prime use case for Picture-in-Picture. As above, the current experience a video conferencing website can provide via the HTMLVideoElement PiP API is limited in style and input. With a full Document in Picture-in-Picture, the website can easily combine multiple video streams into a single PiP window and provide custom controls like sending a message, muting another user, raising a hand, etc.
The Pomodoro technique is a time management method that uses a kitchen timer to break work into intervals, typically 25 minutes in length, separated by short breaks. Pomodoro timer apps on desktop and mobile can use the PiP feature to keep the current timer on screen as a floating window, whether the user is sitting at a desk or on the go.
<body>
  <div id="player-container">
    <div id="player">
      <video id="video" src="https://github.com/WICG/document-picture-in-picture/raw/main/foo.webm"></video>
      <!-- More player elements here. -->
    </div>
  </div>
  <input type="button" onclick="enterPiP();" value="Enter PiP" />
</body>
// Handle to the picture-in-picture window.
let pipWindow = null;

async function enterPiP() {
  const player = document.querySelector("#player");
  const pipOptions = {
    width: player.clientWidth,
    height: player.clientHeight,
  };
  pipWindow = await documentPictureInPicture.requestWindow(pipOptions);

  // Style remaining container to imply the player is in PiP.
  const playerContainer = document.querySelector("#player-container");
  playerContainer.classList.add("pip-mode");

  // Add player to the PiP window.
  pipWindow.document.body.append(player);

  // Listen for the PiP closing event to put the video back.
  pipWindow.addEventListener("pagehide", onLeavePiP.bind(pipWindow), {
    once: true,
  });
}

// Called when the PiP window has closed.
function onLeavePiP() {
  if (this !== pipWindow) {
    return;
  }

  // Remove PiP styling from the container.
  const playerContainer = document.querySelector("#player-container");
  playerContainer.classList.remove("pip-mode");

  // Add the player back to the main window.
  const pipPlayer = pipWindow.document.querySelector("#player");
  playerContainer.append(pipPlayer);

  pipWindow = null;
}
const pipVideo = pipWindow.document.querySelector("#video");
pipVideo.loop = true;
As part of creating an improved picture-in-picture experience, websites will often want custom buttons and controls that respond to user input events such as clicks.
const pipVideo = pipWindow.document.querySelector("#video");
const pipMuteButton = pipWindow.document.createElement("button");
pipMuteButton.textContent = "Toggle mute";
pipMuteButton.addEventListener("click", () => {
  pipVideo.muted = !pipVideo.muted;
});
pipWindow.document.body.append(pipMuteButton);
The website may decide to close the DocumentPictureInPicture window without
the user explicitly clicking on the window's close button. It can do this by
calling the close() method on the Window object:
// This will close the PiP window and trigger our existing onLeavePiP()
// listener.
pipWindow.close();
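A common use of close() is a single control that opens the PiP window when none is open and closes it otherwise. The sketch below is illustrative only: createPiPToggle and its requestWindow parameter are hypothetical names, and requestWindow is assumed to be a caller-supplied function that resolves to the new window, for example () => documentPictureInPicture.requestWindow().

```javascript
// Hypothetical helper: returns a toggle function that tracks one PiP window.
// `requestWindow` is an assumption: a caller-supplied function that resolves
// to a Window, e.g. () => documentPictureInPicture.requestWindow().
function createPiPToggle(requestWindow) {
  let current = null;

  return async function toggle() {
    if (current && !current.closed) {
      // close() fires "pagehide" on the PiP window, so any cleanup
      // listeners registered on it still run.
      current.close();
      current = null;
      return null;
    }

    const opened = await requestWindow();
    current = opened;

    // Forget the handle if the window is closed by the user instead.
    opened.addEventListener(
      "pagehide",
      () => {
        if (current === opened) {
          current = null;
        }
      },
      { once: true }
    );
    return opened;
  };
}
```

Because opening a PiP window requires a user gesture, the returned toggle function would typically be called from a click handler.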
When the PiP window is closed for any reason (either because the website
initiated it or the user closed it), the website will often want to get the
elements back out of the PiP window. The website can perform this in an event
handler for the pagehide event on the Window object. This is shown in the
onLeavePiP() handler in the example code above and is copied below:
// Called when the PiP window has closed.
function onLeavePiP() {
  if (this !== pipWindow) {
    return;
  }

  // Remove PiP styling from the container.
  const playerContainer = document.querySelector("#player-container");
  playerContainer.classList.remove("pip-mode");

  // Add the player back to the main window.
  const pipPlayer = pipWindow.document.querySelector("#player");
  playerContainer.append(pipPlayer);

  pipWindow = null;
}
The document picture-in-picture window supports the resizeTo() and resizeBy() APIs, but only with a user gesture on the PiP window:
const expandButton = pipWindow.document.createElement('button');
expandButton.textContent = 'Expand PiP Window';
expandButton.addEventListener('click', () => {
  // Expand the PiP window's width by 20px and height by 30px.
  pipWindow.resizeBy(20, 30);
});
pipWindow.document.body.append(expandButton);
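The same pattern applies to resizeTo(). As a minimal sketch, a button inside the PiP window could snap it to a fixed aspect ratio. The helper names sizeForAspectRatio and addAspectRatioButton are hypothetical, and the 16:9 ratio is only an example value.

```javascript
// Hypothetical helper: compute a size with the given aspect ratio
// (width divided by height) for a given width.
function sizeForAspectRatio(width, aspectRatio) {
  return {
    width: Math.round(width),
    height: Math.round(width / aspectRatio),
  };
}

// Hypothetical helper: add a button to the PiP window that resizes it to
// 16:9 while keeping its current width. The resize happens inside a click
// handler on the PiP window, so it runs with a user gesture.
function addAspectRatioButton(pipWindow) {
  const button = pipWindow.document.createElement("button");
  button.textContent = "Snap to 16:9";
  button.addEventListener("click", () => {
    const { width, height } = sizeForAspectRatio(pipWindow.outerWidth, 16 / 9);
    pipWindow.resizeTo(width, height);
  });
  pipWindow.document.body.append(button);
  return button;
}
```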
Why not extend the HTMLVideoElement.requestPictureInPicture() idea to allow it to be called on any HTMLElement?
Any API where the UA is taking elements out of the page and then reinserting them ends up with tricky questions on what to show in the current document when those elements are gone (do elements shift around? Is there a placeholder? What magic needs to happen when things resize? etc.). By leaving it up to websites to move their own elements, the API contract between the UA and website is much clearer and simpler to understand.
Since this is similar to window.open(), why not just add an alwaysOnTop flag to window.open()?
The main reason we decided to have a completely separate API is to make it
easier for websites to detect it (since in most cases, falling back to a
standard window would be undesirable and websites would rather use
HTMLVideoElement PiP instead). It also works differently enough from
window.open() (e.g., never outliving the opener) that having it separate
makes sense.
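With a separate API, detection can be a simple property check. The following is a rough sketch rather than a prescribed pattern: detectPiPMode and enterBestAvailablePiP are hypothetical names, and the win parameter exists only so the check can be exercised against a stand-in object.

```javascript
// Hypothetical helper: report which picture-in-picture mode is available.
// `win` defaults to the global object (window in a browser).
function detectPiPMode(win = globalThis) {
  if ("documentPictureInPicture" in win) {
    return "document";
  }
  const doc = win.document;
  if (doc && doc.pictureInPictureEnabled) {
    return "video";
  }
  return "none";
}

// Hypothetical helper: prefer Document PiP and fall back to
// HTMLVideoElement PiP. `enterDocumentPiP` is a caller-supplied function
// such as enterPiP() from the example code.
async function enterBestAvailablePiP(enterDocumentPiP, video, win = globalThis) {
  switch (detectPiPMode(win)) {
    case "document":
      return enterDocumentPiP();
    case "video":
      return video.requestPictureInPicture();
    default:
      return null;
  }
}
```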
Giving websites less control over the size/position of the window will help
prevent, e.g., phishing attacks where a website pops a small always-on-top
window over an input element to steal your password.
Surface Element was a proposal where the website would wrap PiP-able content in advance with a new type of iframe-like element that could be pulled out into a separate window when requested. This had some downsides including always requiring the overhead of a separate document (even in the most common case of never entering picture-in-picture).
We also considered a similar approach to the one in this document, but with no
input allowed in the DOM (only allowlisted controls from a predetermined list in
a similar fashion to the existing HTMLVideoElement PiP). One issue with this
approach is that it really didn't help websites do much more than they already
can today, since a website can draw anything in a canvas element and PiP a video
with the canvas as a source. Having HTMLElements that can actually be
interacted with is what makes the Document Picture-in-Picture feature worth
implementing.
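For comparison, the canvas workaround mentioned above might look roughly like the following. This is a sketch under assumptions: pipCanvasAsVideo is a hypothetical name, the canvas is assumed to be actively drawn to by the page, and the call is assumed to happen in response to a user gesture. The result is a video-only window, so none of the drawn content is interactive.

```javascript
// Hypothetical helper: show a canvas in the existing video PiP window.
// `doc` defaults to the page's document; it is a parameter only so the
// function can be exercised with stand-in objects.
async function pipCanvasAsVideo(canvas, doc = document) {
  const video = doc.createElement("video");
  video.muted = true;

  // captureStream() exposes the canvas contents as a MediaStream.
  video.srcObject = canvas.captureStream();

  // The video needs loaded metadata before picture-in-picture can be
  // requested; waiting for playback to start is a simple way to ensure it.
  await video.play();

  return video.requestPictureInPicture();
}
```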
Many thanks to Frank Liberato, Mark Foltz, Klaus Weidner, François Beaufort, Charlie Reis, Joe DeBlasio, Domenic Denicola, and Yiren Wang for their comments and contributions to this document and to the discussions that have informed it. Special thanks to Mikaela Watson and Glen Anderson for the initial wireframes.