aws / amazon-chime-sdk-ios

An iOS client library for integrating multi-party communications powered by the Amazon Chime service.
https://aws.amazon.com/chime/chime-sdk/
Apache License 2.0

Picture in Picture feature #654

Open haroldh17 opened 4 months ago

haroldh17 commented 4 months ago

Describe the bug: I'm struggling to enable Picture in Picture (PiP) mode for video when the user navigates away from my application. Is this feasible, and if so, how can it be implemented?

To Reproduce: Steps to reproduce the behavior:

  1. Start a session.
  2. Navigate outside of the app.

Expected behavior: A minimized video of the callee should appear in a corner of the screen when the user navigates away to another app, and tapping it should bring the user back to the app. FaceTime and WhatsApp on iOS offer this feature: Multitask with Picture in Picture on iPhone

Please help.

dineshiOSDev commented 3 months ago

PiP is not supported yet. I ran into the same problem, and the Chime support team replied that Chime does not support it as of now. Instead, try adding a floating view so that you get a PiP-like effect inside the application.
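For anyone trying the floating-view workaround, here is a minimal sketch of a draggable container that stays inside its superview. The class name `FloatingVideoContainer` and its helper are hypothetical and not part of the Chime SDK; the idea is only that you add your video render view as a subview and keep the container on top of your app's UI.

```swift
import UIKit

/// Hypothetical draggable container for hosting a video view inside the app.
/// This is an in-app overlay only; it disappears when the app is backgrounded.
final class FloatingVideoContainer: UIView {
    override init(frame: CGRect) {
        super.init(frame: frame)
        layer.cornerRadius = 12
        clipsToBounds = true
        addGestureRecognizer(
            UIPanGestureRecognizer(target: self, action: #selector(handlePan(_:))))
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) is not supported in this sketch")
    }

    @objc private func handlePan(_ gesture: UIPanGestureRecognizer) {
        guard let superview = superview else { return }
        let translation = gesture.translation(in: superview)
        let proposed = CGPoint(x: center.x + translation.x,
                               y: center.y + translation.y)
        center = Self.clampedCenter(proposed, size: bounds.size, in: superview.bounds)
        // Reset so the next callback reports an incremental translation.
        gesture.setTranslation(.zero, in: superview)
    }

    /// Keeps a view of the given size fully inside the container rectangle.
    static func clampedCenter(_ proposed: CGPoint, size: CGSize, in container: CGRect) -> CGPoint {
        let halfWidth = size.width / 2
        let halfHeight = size.height / 2
        return CGPoint(
            x: min(max(proposed.x, container.minX + halfWidth), container.maxX - halfWidth),
            y: min(max(proposed.y, container.minY + halfHeight), container.maxY - halfHeight))
    }
}
```

You would typically add the container to the key window or root view and put the remote video view inside it, so it floats over whatever screen the user navigates to within the app.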

hvsw commented 1 month ago

You can implement PiP with AVPictureInPictureController, but so far I found a bunch of rough edges.


Using AVPictureInPictureVideoCallViewController as described in Adopting Picture in Picture for video calls works. However:

The PiP window doesn’t receive touch events when you use AVPictureInPictureVideoCallViewController, so you can’t customize the window’s user interface by adding buttons.

So you have no customization options, such as a close button for leaving the call.
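For reference, a rough sketch of that setup, following the pattern in Apple's article. The function name and its parameters are hypothetical; `pipContentView` stands for whatever view renders the remote video (Apple's sample uses a view backed by AVSampleBufferDisplayLayer). It also assumes the app has the Picture in Picture background mode enabled.

```swift
import AVKit
import UIKit

/// Hypothetical helper: builds a PiP controller for a video call.
/// The caller must keep a strong reference to the returned controller.
@available(iOS 15.0, *)
func makeVideoCallPiPController(
    sourceView: UIView,        // the in-app view currently showing the call
    pipContentView: UIView,    // the view to show inside the PiP window
    delegate: AVPictureInPictureControllerDelegate?
) -> AVPictureInPictureController? {
    guard AVPictureInPictureController.isPictureInPictureSupported() else {
        return nil
    }

    let contentViewController = AVPictureInPictureVideoCallViewController()
    // Only the aspect ratio matters here; the system decides the actual size.
    contentViewController.preferredContentSize = CGSize(width: 1080, height: 1920)
    pipContentView.frame = contentViewController.view.bounds
    pipContentView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
    contentViewController.view.addSubview(pipContentView)

    let contentSource = AVPictureInPictureController.ContentSource(
        activeVideoCallSourceView: sourceView,
        contentViewController: contentViewController)

    let controller = AVPictureInPictureController(contentSource: contentSource)
    // Start PiP automatically when the user leaves the app during a call.
    controller.canStartPictureInPictureAutomaticallyFromInline = true
    controller.delegate = delegate
    return controller
}
```

Any buttons you add to `pipContentView` will be visible but will not respond to taps, which is the limitation described above.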


You can instead use AVPictureInPictureController.ContentSource(sampleBufferDisplayLayer:playbackDelegate:). See UIPiPView for some tips on how to make a sample buffer. With this sampling method, CPU usage is fairly high on an iPhone 15 Pro: 85% (out of 600%). I think creating a view class that conforms to VideoRenderView and binding it directly with AmazonChimeSDK.AudioVideoFacade.bindVideoView(videoView:tileId:), instead of sampling, should improve performance, but I was not able to test this (for some reason PiP does not start).
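An untested sketch of that direct-binding idea, in case it helps someone pick it up. It assumes that VideoRenderView delivers frames through `onVideoFrameReceived(frame:)` and that the frame's buffer can be cast to `VideoFramePixelBuffer` to get a CVPixelBuffer, as the SDK's default render view appears to do; please verify both against the SDK version you use. The names `SampleBufferVideoView`, `LivePiPPlaybackDelegate`, and `makeSampleBufferPiPController` are hypothetical. Frame rotation is ignored here.

```swift
import AVKit
import AmazonChimeSDK
import UIKit

/// Hypothetical view backed by AVSampleBufferDisplayLayer that takes Chime frames directly.
final class SampleBufferVideoView: UIView, VideoRenderView {
    override class var layerClass: AnyClass { AVSampleBufferDisplayLayer.self }

    var sampleBufferLayer: AVSampleBufferDisplayLayer {
        layer as! AVSampleBufferDisplayLayer // safe: layerClass is overridden above
    }

    // Assumption: this is the VideoSink requirement inherited by VideoRenderView.
    func onVideoFrameReceived(frame: VideoFrame) {
        guard let pixelBuffer = (frame.buffer as? VideoFramePixelBuffer)?.pixelBuffer,
              let sampleBuffer = Self.makeSampleBuffer(from: pixelBuffer) else { return }
        DispatchQueue.main.async { [weak self] in
            guard let layer = self?.sampleBufferLayer else { return }
            if layer.status == .failed { layer.flush() } // recover from a failed state
            layer.enqueue(sampleBuffer)
        }
    }

    /// Wraps a pixel buffer in a CMSampleBuffer stamped with the current host time.
    static func makeSampleBuffer(from pixelBuffer: CVPixelBuffer) -> CMSampleBuffer? {
        var formatDescription: CMVideoFormatDescription?
        CMVideoFormatDescriptionCreateForImageBuffer(
            allocator: kCFAllocatorDefault,
            imageBuffer: pixelBuffer,
            formatDescriptionOut: &formatDescription)
        guard let format = formatDescription else { return nil }

        var timing = CMSampleTimingInfo(
            duration: .invalid,
            presentationTimeStamp: CMClockGetTime(CMClockGetHostTimeClock()),
            decodeTimeStamp: .invalid)
        var sampleBuffer: CMSampleBuffer?
        CMSampleBufferCreateReadyWithImageBuffer(
            allocator: kCFAllocatorDefault,
            imageBuffer: pixelBuffer,
            formatDescription: format,
            sampleTiming: &timing,
            sampleBufferOut: &sampleBuffer)
        return sampleBuffer
    }
}

/// Playback delegate for live content: never paused, unbounded time range.
@available(iOS 15.0, *)
final class LivePiPPlaybackDelegate: NSObject, AVPictureInPictureSampleBufferPlaybackDelegate {
    func pictureInPictureController(_ controller: AVPictureInPictureController,
                                    setPlaying playing: Bool) {}

    func pictureInPictureControllerTimeRangeForPlayback(
        _ controller: AVPictureInPictureController) -> CMTimeRange {
        CMTimeRange(start: .negativeInfinity, duration: .positiveInfinity)
    }

    func pictureInPictureControllerIsPlaybackPaused(
        _ controller: AVPictureInPictureController) -> Bool { false }

    func pictureInPictureController(_ controller: AVPictureInPictureController,
                                    didTransitionToRenderSize newRenderSize: CMVideoDimensions) {}

    func pictureInPictureController(_ controller: AVPictureInPictureController,
                                    skipByInterval skipInterval: CMTime,
                                    completion completionHandler: @escaping () -> Void) {
        completionHandler() // skipping is meaningless for a live call
    }
}

/// The caller must retain both the controller and the playback delegate.
@available(iOS 15.0, *)
func makeSampleBufferPiPController(
    for view: SampleBufferVideoView,
    playbackDelegate: AVPictureInPictureSampleBufferPlaybackDelegate
) -> AVPictureInPictureController {
    let source = AVPictureInPictureController.ContentSource(
        sampleBufferDisplayLayer: view.sampleBufferLayer,
        playbackDelegate: playbackDelegate)
    return AVPictureInPictureController(contentSource: source)
}
```

The view would then be passed to bindVideoView(videoView:tileId:) like any other render view. I can't say whether this fixes the "PiP is not starting" problem mentioned above; it only shows the shape of the approach.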

Consider a call with multiple video tiles where you want to show just the active speaker in the PiP window. If you use a collection view to show a grid of videos, sampling a single cell is complex to manage because of cell reuse. The easy way is to render the entire collection view, configured as you want, but as I mentioned the CPU usage is too high.
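One possible way around the cell-reuse problem, sketched here as an assumption rather than something I have running: keep a single dedicated PiP view outside the collection view and rebind it whenever the active speaker changes. The type `PiPTileSelector` below is hypothetical and contains only the bookkeeping; it does not call the Chime SDK itself.

```swift
/// Hypothetical bookkeeping for which video tile a dedicated PiP view should show.
struct PiPTileSelector {
    /// Tile currently bound to the PiP view, if any.
    private(set) var boundTileId: Int?
    /// Attendee ID to tile ID, for attendees that currently have video.
    var tileIdByAttendee: [String: Int] = [:]

    /// Picks the first active speaker that has a video tile.
    /// Returns the tile to unbind and the tile to bind, or nil if nothing changes.
    mutating func update(activeSpeakers: [String]) -> (unbind: Int?, bind: Int)? {
        let tiles = tileIdByAttendee
        guard let speaker = activeSpeakers.first(where: { tiles[$0] != nil }),
              let next = tiles[speaker],
              next != boundTileId else { return nil }
        let previous = boundTileId
        boundTileId = next
        return (unbind: previous, bind: next)
    }
}
```

When `update` returns a change, the app would unbind the previous tile and bind the new one to the PiP view through the SDK (for example with bindVideoView(videoView:tileId:)); how you obtain the active speaker list and the attendee-to-tile mapping depends on your app.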


What do other competitors do?

Other video service providers offer direct integration and helper classes so I'll share them here for reference:

Twilio provides a VideoTrackStoringSampleBufferVideoView to simplify PiP implementation: https://www.twilio.com/docs/video/changelog-twilio-video-ios-latest#580-february-28-2024

100ms provides guidance and an example project with a sample buffer view: https://www.100ms.live/docs/ios/v2/how-to-guides/set-up-video-conferencing/render-video/pip-mode https://github.com/100mslive/100ms-ios-sdk/tree/main/Example

How to display participant's video in PiP

AVPictureInPictureController requires source content to use AVSampleBufferDisplayLayer on its subview. You need to use HMSSampleBufferDisplayView instead. HMSSampleBufferDisplayView is a UIImageView that uses AVSampleBufferDisplayLayer for drawing.

Zoom provides seamless integration: https://developers.zoom.us/blog/meeting-sdk-ios-picture-in-picture/