flatpak / xdg-desktop-portal

Desktop integration portal
https://flatpak.github.io/xdg-desktop-portal/
GNU Lesser General Public License v2.1

[Feature Request] Screencast: Allow some way to request windows by name or process #1064

Open zorbathut opened 11 months ago

zorbathut commented 11 months ago

I'm working on a system that requires the ability for OBS to, without user intervention, capture the contents of a new window. At the moment xdg-desktop-portal cannot support this; it allows selecting a window with user intervention, or it allows selecting a window with an opaque recovery token. But the recovery token is generated only with user intervention, so there's essentially no solution to allow screencasting a window without going through the window picker.

This is something that you can do on Windows because OBS identifies windows by name, and it's sometimes really convenient because it dramatically reduces the amount of user intervention needed when you do things.

I'd like an option that can be passed in which is a desired window identifier, either identifying via the name of the window or via the name of the process. Using the game Hades as an example because I have it convenient, this would look something like, as an added parameter to org.freedesktop.portal.ScreenCast's SelectSources:

request_by_name Hades

or

request_by_process Z:\home\zorba\.local\share\Steam\steamapps\common\Hades\x64Vk\Hades.exe

If the request fails, it could either fall back on the picker, or just return failure. I'm not sure which of those is better.
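
A rough sketch of the proposed call shape, in Python for illustration only. The `types` and `multiple` keys and the `WINDOW = 2` bitmask value come from the existing ScreenCast documentation; `request_by_name` and `request_by_process` are the hypothetical options proposed above and are not understood by any released portal. The helper name is also made up.

```python
# Source-type bitmask values from the ScreenCast portal documentation.
MONITOR = 1
WINDOW = 2


def build_select_sources_options(request_by_name=None, request_by_process=None):
    """Build an options dict for SelectSources.

    request_by_name / request_by_process are HYPOTHETICAL keys proposed in
    this issue; they do not exist in the portal today.
    """
    if request_by_name and request_by_process:
        raise ValueError("pass at most one of request_by_name / request_by_process")

    # Existing, documented options: capture a single window.
    options = {"types": WINDOW, "multiple": False}

    # Proposed (hypothetical) additions.
    if request_by_name:
        options["request_by_name"] = request_by_name
    if request_by_process:
        options["request_by_process"] = request_by_process
    return options


print(build_select_sources_options(request_by_name="Hades"))
```

Whether an unmatched request falls back to the picker or fails outright would be decided on the portal side, so the sketch leaves it out.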


Concerns:

There are definite potential security issues in this. I'm not proposing that you should be able to get any window just by guessing the window name or process name; instead, I'd personally expect a checkbox on the picker labeled something like "allow future capturing of windows with this name", which saves that flag permanently somewhere. This means you still need to manually intervene once, but after that, it can just happen transparently for you.

I'm kinda handwaving on "saves that flag permanently somewhere". I think if this were to be done completely right, this would also need a dialog somewhere so you could manage the authorized names/processes and revoke them. This may end up being complicated. A hacky initial option could forego the window checkbox and just allow a hand-authored config file somewhere; this works for my purposes and might work for ironing out interface problems.

There's potential ambiguity if there's more than one window that matches the pattern. In my case, I don't care! Just pick one! Maybe other people care.
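
To make the intended behaviour concrete, here is a minimal sketch of the matching logic in Python. Everything in it is an assumption for illustration: the `Window` type, the `pick_window` helper, and the idea that the saved permissions are a simple set of window names. It is not how any portal backend works today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    title: str
    process: str


def pick_window(windows, allowed_names, requested_name):
    """Return a window to capture without prompting, or None.

    None means "no silent capture": the caller would then show the picker
    or return failure, whichever policy is chosen.
    """
    # The user never ticked "allow future capturing" for this name.
    if requested_name not in allowed_names:
        return None

    matches = [w for w in windows if w.title == requested_name]

    # Ambiguity: several windows may share the name; this sketch just
    # takes the first one.
    return matches[0] if matches else None


windows = [Window("Hades", "Hades.exe"), Window("Terminal", "kitty")]
print(pick_window(windows, {"Hades"}, "Hades"))
```

A hand-authored config file, as suggested above, would only need to supply the `allowed_names` set.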


In my case, I have control over the window, so I could send a D-Bus message that says "identify this window with this given tag", then we could have request_by_tag. I don't think this is a good solution, though, because most people who want this feature are not going to have code-level control over the window.

I'd love to get this up to feature-parity with Windows; right now it errs on the side of security, which is a good direction to err in, but, man, sometimes convenience and automation are really nice!

orowith2os commented 11 months ago

This feels like a very big security issue, as you mentioned, as well as a usability issue. You can already continue the capture of a previously selected window if the user allows it - what more do you need? You can also use things like ObsVkCapture for games, which does work with Flatpak.

Not to mention that requesting by name or process is iffy - the window names are subject to change, and the process can be different depending on the environment you're run in; the process in a flatpak isn't necessarily the same as the process running on the host, as well as file path issues.

You can also probably figure something out with window handles, and the same mechanism that ObsVkCapture uses.

jadahl commented 11 months ago

Would it help if you could pass a window title/app-id as a filter, then still require the user to click "Share"? It'd simplify the user interaction by potentially having a single window to choose.

zorbathut commented 11 months ago

> This feels like a very big security issue, as you mentioned, as well as a usability issue.

Note that I'm not asking for this to be enabled by default, I just want another checkbox to loosen the security a bit further. This is one of those cases where security and usability clash a bit.

> This feels like a very big security issue, as you mentioned, as well as a usability issue. You can already continue the capture of a previously selected window if the user allows it - what more do you need?

The problem is that it's a new window spawned by a new process (with the same name, and with the same process name, but still a new PID.) This makes it impossible to "continue" the capture; it needs to be a new capture of a new window that has many of the same properties as the last window.

> You can also use things like ObsVkCapture for games, which does work with Flatpak.

This might work; I'll check it out.

> Not to mention that requesting by name or process is iffy - the window names are subject to change, and the process can be different depending on the environment you're run in; the process in a flatpak isn't necessarily the same as the process running on the host, as well as file path issues.

In my case, I have control over the window name, and it's not running in a flatpak anyway. I agree this might be something that needs to be tackled for general purposes though.

(Although I will note that "capture based on window name" has been an identifier used in OBS for quite a while.)

zorbathut commented 11 months ago

> Would it help if you could pass a window title/app-id as a filter, then still require the user to click "Share"?

Unfortunately not. Needs to be fully without interaction.

zorbathut commented 11 months ago

In response to the confused emoji:

The thing I'm working on is an automated test framework for a game. I need to be able to spawn new fresh instances of the game running test scripts while automatically recording footage. "Automatically" is the entire point here; needing a human to sit there clicking the "share" button every fifteen seconds is unacceptable.

Right now, I'm solving this by running under X11 with XComposite capturing. I tried switching to Wayland, but as near as I can tell all captures in Wayland must go through xdg-desktop-portal. There's no way to tell xdg-desktop-portal "no, seriously, let me capture this window without user intervention", so this makes Wayland completely unusable (regardless of whether a flatpak is involved, for the record). I'd like to head this problem off sooner rather than later.

If I do need to switch to Wayland, my current solution is going to be a custom build of xdg-desktop-portal that implements "search by window name" and a matching custom build of OBS to pass chosen window names in, because I frankly don't care about the security implications in this context. But it'd be nice to come up with a solution that other people can use as well :)

orowith2os commented 11 months ago

Then your best bet will be OBS-VkCapture. Unless there's a more real-world use case for screen capturing via the ScreenCast API with specific window names/etc, it's not really useful. Test frameworks don't usually need to, nor do they normally go through, desktop APIs like this.

Mikenux commented 11 months ago

There is request #304, where the window name is needed (for display within app). If such a portal existed, then what is requested here would be an extension of it by asking to monitor a specific name or selecting a specific application. Am I right?

orowith2os commented 11 months ago

That sounds about right.

ruineka commented 1 month ago

This issue requires careful consideration as it poses a significant challenge for serious streamers transitioning from Windows to Linux. There seems to be a misunderstanding regarding the complexity of streamer setups, where multiple sources need to be added as overlays to create engaging content. While selecting sources for a camera and a game is straightforward, the process becomes frustrating beyond that.

Modern streamers incorporate 3D avatars, multiple webcams capturing different angles of their keyboard, avatar, and face, along with various web browser plugins as additional sources. Currently, upon opening OBS, users are inundated with countless requests to select window sources, causing confusion about which window to choose. This cumbersome process creates significant barriers for streamers considering Linux as a viable platform.

bbb651 commented 4 weeks ago

> Currently, upon opening OBS, users are inundated with countless requests to select window sources, causing confusion about which window to choose. This cumbersome process creates significant barriers for streamers considering Linux as a viable platform.

Restoration is already a thing, and it's implemented in OBS (you need your portal backend to support ScreenCast v4):

> restore_token (s)
>
> The token to restore a previous session.
>
> If the stored session cannot be restored, this value is ignored and the user will be prompted normally. This may happen when, for example, the session contains a monitor or a window that is not available anymore, or when the stored permissions are withdrawn.
>
> The restore token is invalidated after using it once. To restore the same session again, use the new restore token sent in response to starting this session.
>
> Setting a restore_token is only allowed for screen cast sessions. Persistent remote desktop screen cast sessions can only be handled via the Remote Desktop interface.
>
> This option was added in version 4 of this interface.

Although it seems like it doesn't work across captured application restarts and compositor restarts; I think that's already tracked in #1355.
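
Since the token is single-use, a client has to rotate it on every session. A minimal Python sketch of that bookkeeping, with no actual D-Bus calls: `restore_token` and `persist_mode` are the documented option names, and `2` is the documented "persist until explicitly revoked" value, but the helper names and the plain-text token file are assumptions made up for this example.

```python
from pathlib import Path

WINDOW = 2                 # source-type bitmask value from the ScreenCast docs
PERSIST_UNTIL_REVOKED = 2  # persist_mode value from the ScreenCast docs


def load_token(path):
    """Return the stored restore token, or None if there is none."""
    p = Path(path)
    return p.read_text().strip() or None if p.exists() else None


def select_sources_options(path):
    """Options for SelectSources, including the stored token if any."""
    options = {"types": WINDOW, "persist_mode": PERSIST_UNTIL_REVOKED}
    token = load_token(path)
    if token:
        options["restore_token"] = token
    return options


def save_token_from_start_results(path, results):
    """Store the fresh token from the Start response.

    The previous token is already spent, so if no new one came back the
    stale file is removed and the next session will prompt normally.
    """
    token = results.get("restore_token")
    if token:
        Path(path).write_text(token)
    else:
        Path(path).unlink(missing_ok=True)
```

The important part is calling `save_token_from_start_results` after every Start response, not just the first one; otherwise the second launch presents an already-invalidated token and the picker comes back.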

Mikenux commented 4 weeks ago

@ruineka: Maybe open a discussion (https://github.com/flatpak/xdg-desktop-portal/discussions/new/choose) to document what streamers exactly need and expect?