CapSoftware / Cap

Open source Loom alternative. Effortless, instant screen sharing.
https://cap.so
GNU Affero General Public License v3.0

Create local video rendering pipeline #77

Closed richiemcilroy closed 1 week ago

richiemcilroy commented 3 weeks ago

We're iterating fast on a new local-first Cap app powered by Tauri V2 and SolidJS.

As part of the new app, both the screen recording and webcam recording are captured separately.

In the new Cap app, when you finish recording your video, you are presented with 4 options:

For this to work, we require a new local video rendering pipeline under the hood.

The video rendering pipeline will probably be a function which can be invoked from anywhere in the app. The params passed to it would be:

- screen recording file path
- webcam recording file path
- video settings:
  - webcam size
  - webcam position
  - webcam style (e.g. border radius, shadow)
  - video output size
  - video background / gradient
  - video padding size

The video rendering function should be performant and return the file path of the rendered video. It should output a regular mp4 video.
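A rough sketch of what such a function's inputs and outputs might look like in Rust (the names, types, and settings fields below are purely illustrative, not the actual Cap API):

```rust
use std::path::PathBuf;

// Illustrative settings shape mirroring the parameters listed above.
struct WebcamSettings {
    size: f32,            // webcam width as a fraction of the output width
    position: (f32, f32), // normalised x/y position within the frame
    border_radius: f32,
    shadow: bool,
}

struct RenderSettings {
    webcam: WebcamSettings,
    output_size: (u32, u32),
    background: String, // colour, gradient description, or image path
    padding: u32,
}

// Decode both inputs, composite frame by frame according to `settings`,
// encode to mp4, and return the path of the rendered file.
fn render_video(
    screen_recording: PathBuf,
    webcam_recording: PathBuf,
    settings: RenderSettings,
) -> Result<PathBuf, String> {
    todo!("decode, composite, encode")
}
```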

Will add an extra $250 to the bounty if you can build this in Rust. If not, JavaScript is fine using Web APIs.

The new desktop app is currently located in the `brendonovich/new-desktop-app` branch, in the `desktop-solid` folder. You can run the desktop-solid app directly with `nvm use 20`, `pnpm install`, and then `pnpm tauri dev`.

Feel free to DM me on X or Discord if you need more info!

richiemcilroy commented 3 weeks ago

/bounty $750

algora-pbc[bot] commented 3 weeks ago

💎 $750 bounty • Cap

Steps to solve:

  1. Start working: Comment /attempt #77 with your implementation plan
  2. Submit work: Create a pull request including /claim #77 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to CapSoftware/Cap!


| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @varshith257 | Aug 17, 2024, 2:35:39 PM | WIP |
| 🟢 @sivaprasad2000 | Aug 17, 2024, 8:36:27 PM | WIP |

varshith257 commented 3 weeks ago

I will go with Rust for this.

/attempt #77

richiemcilroy commented 3 weeks ago

nice @varshith257 - need any help from me? Feel free to DM me on Discord and we can talk through it

varshith257 commented 3 weeks ago

Sure! I've started on it and have a good approach in mind. If I get stuck anywhere I'll definitely reach out to you.

richiemcilroy commented 3 weeks ago

Ok cool - what are you thinking for the Rust approach? Any crates that can help, or will you be working directly with low-level graphics APIs etc.?

varshith257 commented 3 weeks ago

I am thinking of using the ffmpeg crate.

richiemcilroy commented 3 weeks ago

Will be interested to see how you standardise the pipeline for this. We use ffmpeg across Cap for a bunch of stuff (as you'll see in the code).
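For reference, a minimal sketch of the sidecar-style compositing this could start from, assuming an `ffmpeg` binary on the PATH; the scale and overlay values are hard-coded placeholders rather than anything from the Cap codebase:

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

// Overlay the webcam (scaled to 320px wide) 40px from the bottom-right
// corner of the screen recording and encode the result as mp4.
fn composite_with_ffmpeg(screen: &Path, webcam: &Path, out: &Path) -> std::io::Result<PathBuf> {
    let filter = "[1:v]scale=320:-1[cam];[0:v][cam]overlay=W-w-40:H-h-40";
    let status = Command::new("ffmpeg")
        .arg("-y")
        .arg("-i").arg(screen)
        .arg("-i").arg(webcam)
        .args(["-filter_complex", filter])
        .args(["-c:v", "libx264", "-pix_fmt", "yuv420p"])
        .arg(out)
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(std::io::ErrorKind::Other, "ffmpeg exited with an error"));
    }
    Ok(out.to_path_buf())
}
```

A single `overlay` filter covers basic picture-in-picture, but rounded corners, soft shadows, and animated transforms quickly push past what a static filtergraph handles comfortably.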

Brendonovich commented 3 weeks ago

For some clarification, this system will need to be able to composite multiple videos on top of one another, animate each video's transform, apply post processing effects like shadows and (maybe) shaders, and potentially render text. I'm not sure if ffmpeg is capable of this in a reasonably customisable way

sivaprasad2000 commented 3 weeks ago

Doesn't GStreamer let you do this much more easily?

/attempt #77

ItsEeleeya commented 3 weeks ago

> Doesn't GStreamer let you do this much more easily?

/attempt #77

I'm surprised I hadn't heard of that until now! There seems to be a Rust crate with bindings available, but the installation seems a bit involved compared to the ffmpeg sidecar, which needs no installation.

ItsEeleeya commented 3 weeks ago

> For some clarification, this system will need to be able to composite multiple videos on top of one another, animate each video's transform, apply post processing effects like shadows and (maybe) shaders, and potentially render text. I'm not sure if ffmpeg is capable of this in a reasonably customisable way

Could you please expand on the "… animate each video's transform" part? I'd like to give this an attempt. From what I understood, we need to put the webcam overlay on top of the screen recording with the provided styles, then, if specified, add padding around the video and fill the surrounding area with a background image or a custom gradient. Is this correct?

Brendonovich commented 3 weeks ago

> Could you please expand on the "… animate each video's transform" part?

We'll need to be able to zoom and pan the screen capture and possibly the webcam, which would require animating scale and translate. What you described sounds correct; we can add extra flexibility later on if necessary.
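As an illustration of what animating scale/translate could boil down to on the render side, each output frame would sample a keyframed transform at its timestamp; the struct names and the linear easing below are placeholders, not from the Cap codebase:

```rust
// Hypothetical keyframed transform for zoom/pan.
#[derive(Clone, Copy, Debug)]
struct Transform {
    scale: f32,
    translate: (f32, f32),
}

#[derive(Clone, Copy, Debug)]
struct Keyframe {
    time: f32, // seconds into the recording
    transform: Transform,
}

fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// Sample the transform at `time` by interpolating linearly between the
/// surrounding keyframes (assumed sorted by time), clamping at the ends.
fn sample(keyframes: &[Keyframe], time: f32) -> Transform {
    let identity = Transform { scale: 1.0, translate: (0.0, 0.0) };
    let (Some(first), Some(last)) = (keyframes.first(), keyframes.last()) else {
        return identity;
    };
    if time <= first.time {
        return first.transform;
    }
    if time >= last.time {
        return last.transform;
    }
    let pair = keyframes
        .windows(2)
        .find(|w| time >= w[0].time && time <= w[1].time)
        .expect("time lies inside the keyframe range");
    let t = (time - pair[0].time) / (pair[1].time - pair[0].time);
    Transform {
        scale: lerp(pair[0].transform.scale, pair[1].transform.scale, t),
        translate: (
            lerp(pair[0].transform.translate.0, pair[1].transform.translate.0, t),
            lerp(pair[0].transform.translate.1, pair[1].transform.translate.1, t),
        ),
    }
}
```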

ItsEeleeya commented 3 weeks ago

After looking around a bunch, GStreamer seems like the way to go, especially for making it easier to implement future features using its plugins, which sounds really exciting. The only potential downside I see is the final bundle size, which could get close to 200 MB I think?

Brendonovich commented 3 weeks ago

GStreamer looks interesting, especially GES. The GStreamer binaries could be downloaded after installation to avoid inflating the bundle size.

richiemcilroy commented 3 weeks ago

How about using something like wgpu?

We'd use ffmpeg or GStreamer to decode the video files into frames, then use wgpu to create a rendering context, which would allow for things like transformations, textures, etc.

I've been working on a proof of concept... will hopefully post some processing-time benchmarks later tonight.
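A rough sketch of the decode-then-wgpu idea, assuming frames arrive as tightly packed RGBA8 buffers; exact wgpu type names shift a little between releases, so treat this as illustrative rather than the proof of concept mentioned above:

```rust
// Upload one decoded RGBA frame as a texture that a compositing shader
// can sample; the render pass drawing quads/effects is omitted here.
fn upload_frame(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    rgba: &[u8],
    width: u32,
    height: u32,
) -> wgpu::Texture {
    let size = wgpu::Extent3d { width, height, depth_or_array_layers: 1 };
    let texture = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("decoded frame"),
        size,
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::Rgba8UnormSrgb,
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });
    queue.write_texture(
        wgpu::ImageCopyTexture {
            texture: &texture,
            mip_level: 0,
            origin: wgpu::Origin3d::ZERO,
            aspect: wgpu::TextureAspect::All,
        },
        rgba,
        wgpu::ImageDataLayout {
            offset: 0,
            bytes_per_row: Some(4 * width),
            rows_per_image: Some(height),
        },
        size,
    );
    texture
}
```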

varshith257 commented 3 weeks ago

Yes, that would be better @richiemcilroy

Brendonovich commented 3 weeks ago

If we roll our own GPU rendering then I think wgpu would be a smart choice; we'll just have more to build from the ground up.

sivaprasad2000 commented 3 weeks ago

@Brendonovich, could you please confirm the list of video transforms we are planning to roll out in the first stage?

ItsEeleeya commented 3 weeks ago

I looked into wgpu as well, and it'd be so much lighter if we decide not to use GStreamer (for now). But since this is needed kind of ASAP, I think GStreamer might still be the way to go?

I already did most of the work for GStreamer. wgpu sounds kinda fun so I might try it out as well (I've only edited one shader file in my whole life).

I also think both approaches should make it fairly straightforward to create a live preview on the frontend, so you can see the end result before rendering.

richiemcilroy commented 3 weeks ago

Here's an initial design of the editor we're implementing:

[image: initial design of the editor]

Transforms we'd be looking to implement include:

- Video background (gradient, image, colour)
- Inset/padding
- Webcam position
- Webcam styling (rounded corners, shadow, size)
- Zoom in transition on screen recording (with motion blur)
- Custom cursor movement (from separately captured cursor data)

I think that's most of it, unless you can think of anything else @Brendonovich

So the render function you'd be creating would take in these input settings, as well as the file path to both the screen recording and webcam.

On completion, it should return the path to the rendered video I'd imagine.

By default, we'll pass in generic settings if no custom settings are provided (e.g. webcam in the bottom left of the screen, rounded corners, shadow).
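On the webcam styling side (rounded corners + shadow), GPU compositors usually lean on a rounded-rectangle signed-distance function; a small self-contained sketch of that math, illustrative only and easy to port to a shader later:

```rust
/// Signed distance from point `p` (relative to the rectangle centre) to a
/// rounded rectangle with the given half-extents and corner radius.
/// Negative inside, positive outside; feathering values near zero gives
/// anti-aliased corners, and sampling at an offset gives a soft shadow.
fn rounded_rect_sdf(p: (f32, f32), half_size: (f32, f32), radius: f32) -> f32 {
    let qx = p.0.abs() - half_size.0 + radius;
    let qy = p.1.abs() - half_size.1 + radius;
    let outside = (qx.max(0.0).powi(2) + qy.max(0.0).powi(2)).sqrt();
    let inside = qx.max(qy).min(0.0);
    outside + inside - radius
}
```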

ItsEeleeya commented 3 weeks ago

Are these the new requirements for this issue/bounty now?

Also is that part of the new design or am I dreaming?!

The motion blur and custom cursor remind me of Screen Studio. Cap might just become the only comparable app to that on Windows and Linux (unless there are some that I don't know of)

richiemcilroy commented 3 weeks ago

So the requirements can stay the same for now, as we won't be adding zoom in transitions / custom cursor at the very start - the initial requirements are for a faster PoC.

We just need to build this in a way that's adaptable for those features soon.

varshith257 commented 3 weeks ago

@ItsEeleeya Any progress on this work? I started too, right when I commented, but I started with ffmpeg.

ItsEeleeya commented 3 weeks ago

> @ItsEeleeya Any progress on this work? I started too, right when I commented, but I started with ffmpeg.

Currently I'm trying to get the app running with GStreamer, but their Homebrew installation has been broken; manual installation seems to work, but there are still some issues running it 🥲

This was my last attempt trying it with ffmpeg: [screenshot: ffmpeg test]

ItsEeleeya commented 3 weeks ago

It took a while, but I got the app compiling (only on macOS for now) when GStreamer is installed (I'll write some instructions once I have the draft ready!)

richiemcilroy commented 3 weeks ago

There's a WIP on the `rendering-engine-test` branch with wgpu

ItsEeleeya commented 3 weeks ago

> There's a WIP on the `rendering-engine-test` branch with wgpu

That's awesome, did you just learn wgpu in that short a time?

GStreamer took way too much of my time and I ended up giving up on it. I looked into wgpu too, but ultimately came across etro (Node + node-canvas) and was looking into porting it over for use with Tauri (without a Node runtime). If it seems interesting now, or there's a need for it in the future, I could port it over.