flatpak / xdg-desktop-portal-gtk

Gtk implementation of xdg-desktop-portal
GNU Lesser General Public License v2.1
129 stars 100 forks source link

Dialog hangs when multiple portal requests are active #334

Open kkartaltepe opened 3 years ago

kkartaltepe commented 3 years ago

It seems that the dialog prompting users for selecting a desktop or a window to share may block other dialog processes such that if the user has multiple prompts at once there can be as little as 1 dialog that will accept user input.

In practice it appears to a user that the portal is frozen and the dialog cannot be closed (as the actual active window may be beneath the frozen window) or moved to reveal the live window.

This can be replicated by starting an application that quickly makes multiple portal screen and window requests at once. I happened to be using obs with a couple screen and window captures configured. When you restart the application all configured captures will access the portal opening (5 in my case) portal requests. Most of the time the top request was unresponsive and could not be moved to reveal the responsive prompt (alt tabbing through them all eventually one responded). Though I compiled some PRs to to use this natively I suspect the flatpak version of obs which includes portal support should be sufficient to replicate this without compiling a lot of code.

Tested on Fedora 33, xdg-desktop-portal-gtk-1.8.0-1.fc33.src.rpm, default gnome wayland session.

jadahl commented 3 years ago

Sorry for the delay, I had missed this report, and got poked on IRC about it.

So there seems to be two issues:

A usability issue, where restoring what was previously shared is cumbersome, and what seems like something messing up input when a "flood" of dialogs show up. The second one I would just say is a bug, which should of course be fixed, maybe it can be reproduced easily.

The usability is a different thing. There are two things that can be improved I think:

1) We can make it possible to "append" new sources (thus open new streams) to an existing screen cast "session"

In itself, this isn't particularly interesting but ...

2) We can make it possible to let the user optionally "remember" what is shared

The second time the application starts, if it wants to "restore" itself including regain access to what was shared, it could pass some token or ID it got from the old session, enabling the screen share dialog to have a "pre-filled" selection of what is applicable (same windows unless they are gone, same monitors, and so on).

kkartaltepe commented 3 years ago

Sorry are you talking about the linked issue when referring to usability problems? If you want to talk about that id love to discuss it on that obs-studio issue or whatever platform flatpak has to discussing such changes (it seems a bit off topic for this bug report, but we can discuss it here too).

For this specific issue it seems like you agree this inability to interact with the portal UI is a bug? Or are you considering something else to be the bug?

jadahl commented 3 years ago

I mean that the way the screen cast portal works now can cause usability problems, as in the case you describe (restore a complex setup) means the user have to manually select all the things they want to share again.

The two things I described would be a way to mitigate this issue, by making the portal more forgiving when it comes to restoring these more complex setups. It would need work on the portal side of things first, then having OBS use the new functionality.

But first it would be good to know whether this would be a good option to begin with.

Right now, it should be possible for OBS to only launch two dialogs when restoring, where the user again selects all the windows that should be shared at once, then a second dialog selecting all the monitors at once, so the future improvement I'm talking about here that would need portal features is to be able to have a way for OBS to ask the portal to "pre-fill" those selections in a single dialog.

You can see how it looks, selecting more sources at once, by changing the False to True, on the line with multiple, here: https://gitlab.gnome.org/-/snippets/19#LC106

jadahl commented 3 years ago

If the portal dialogs freeze, that is likely a portal bug, if not a compositor bug, but I suspect the problem is with the portal setting multiple dialogs the "transient" for the same parent, accidentally messing up the stacking order if its own set of dialogs. I will try to reproduce that issue.

kkartaltepe commented 3 years ago

Sure if you want to discuss the api changes here we can do that too.

it should be possible for OBS to only launch two dialogs when restoring, where the user again selects all the windows that should be shared at once, then a second dialog selecting all the monitors at once

Maybe some background will help me explain my confusion with this suggestion. OBS users are accustomed to having sources which correspond to specific windows or displays (screens). These sources are arranged in their scenes. So the mapping between a OBS source and the window being captured is important, if we swapped one window for the other the users scenes would be wrong.

So the suggestion has the issue of how to map the obs source to the window the user desires (the pipewire stream provided). Right now we open one session (one stream) per source so its obvious, if we open 20 pipewire streams there is no metadata about what windows are what streams so mapping the streams (as I understand the api) cannot be done. Moreover what happens when plugins in their external monitor or removes their external monitor (or closes or opens windows), this seems hard to deal with with the suggested method of trying to open all possible streams at once.

kkartaltepe commented 3 years ago

Perhaps some motivating examples of user behavior, here is a common simple scene setup (pulled from a real user's configuration today)

21:44:48.594: - scene 'Viva Pinata':
21:44:48.594:     - source: 'Viva Pinata Game' (window_capture)
21:44:48.594:     - source: 'Bongo Cat' (window_capture)
...
21:44:48.594: - scene 'OMORI':
21:44:48.594:     - source: 'OMORI Game' (window_capture)
21:44:48.594:     - source: 'Bongo Cat' (window_capture)
...
21:44:48.594: - scene 'Stardew Valley':
21:44:48.594:     - source: 'Stardew Valley Game' (window_capture)
21:44:48.594:     - source: 'Bongo Cat' (window_capture)
...

The user has various scenes created and commonly a user will setup a scene per topic (games are popular) which will only display that window. These captures are also handled automatically by OBS, so when the user starts the game obs will detect that a window matching the desired application is available and start capture (commonly using window class or executable to determine a matching window).

Some users dont want this auto capturing and will instead configure window captures that only capture a single window without any matching (so only an exact window id).

Another common setup is for the user to bind a hotkey in obs to capture the currently active/fullscreen window to their desire.

The current API kind of supports 2, where a user doesnt want any failover and is already resigned to interacting with OBS. We could work around the existing flatpak issue by simply not trying to start a session until the user tried to reconfigure their source. But this is also the least common setup because it requires the user to reconfigure their scenes every time they have a new window to capture.

1 and 3 are the most common. A user configures their scenes once and then as they change between various windows and/or use their hotkeys to target their current window everything works and obs can be kept on another monitor to preview but otherwise requires no interaction during a stream. These are the workflows that would be most important to support and require changes to the api.

For screen capture, supporting hotplug would be a nice to have but just maintaining consistent source to screen mapping is acceptable and not easy to do with the current api.

kkartaltepe commented 3 years ago

And for me a place i see the multi capture being helpful, as is, is for multi window applications. These are very troublesome with the existing window capture style on windows/mac/x11. Users using tools like Adobe/Blender/Gimp/ even Powerpoint that create multiple windows often have to choose a single window (the canvas typically). Multi capture in the existing API is fine for these as all the streams are a single program and it gives positioning information so OBS can compose them into a single source.

But using it for multiple applications seems more difficult. And it still has the problem of windows being added/removed (if a user pops a dockable window out, how will that be handled?)

jadahl commented 3 years ago

Hot keys is out of scope for the screen cast API itself, but could possibly be done using another portal or something similar, so lets leave that part out for now.

First, portals are here to allow the user to be in control of what applications can get hold of, so that should always be kept in mind, but maybe there are some harmless things we can do to be more helpful.

Some random ideas; I don't know if any are feasible, but just writing them down anyway:

Maybe we can pass along the app-id and window title along with the stream, so the application can associate a stream with something it can remember. Window titles will always be hit-and-miss, as they change, but app-ids are more predictable, though non-precise for multi window applications. Maybe we can have some "uuid" like things too, but those would not be reproduceable would the to be shared application be restarted. Maybe something similar can be done with monitor streams, e.g. pass an opaque identifier (i.e. something portal implementation specific e.g. some hash of the vendor/product/serial triplet).

When selecting sources could then provide "hints" of previously known sources, that can then be "pre-filled" by the portal UI. I.e. OBS would send the hints 'Viva Pinata Game', 'Bongo Cat', (or whatever metadata it'd actually be), and if any of those are available, it'd be "pre-selected". Or it could send the hints 'Viva Pinata Game', 'OMRI game', 'Stardew Valley Game', 'Bongo Cat', and it'll pre-select whatever is available. Or hints could be a list of "sets" maybe, I dunno, depends on how complex we want to be.

Automatic detection without user interaction is more problematic; with sandboxing and Wayland, we don't want applications to be able to introspect the environment for privacy concerns, so one would still need to click some button to activate the scene, with the accompanied user interaction, granting the sharing of the privileged content.

kkartaltepe commented 3 years ago

Personally I think we also need hints in the reverse direction, users need to know what they are selecting for. The prompts currently rely entirely on user hopefully not have multiple prompts pop up at once. Being able to pass a source name or something from the application to be shown in the selector would be helpful.

I think the dynamic nature of these common use cases depends on the software getting out of your way not getting in the way. It seems backwards to move towards what supposed leaders in security and privacy configuration (ios, macos, android) already moved away from (im not prompted every time my camera app wants to capture my camera). These systems typically allow you to grant persistent privileges for precisely this reason. This is of course my outsiders opinion im not at all well read on flatpak and your design goals may have some other kind of permission future in mind. Personally the fact that Gnome Screen Recorder can do this without ever prompting seems like enough incentive to provide equal usability, perhaps that this is a flatpak API and not a wayland protocol is at odds with this.

jadahl commented 3 years ago

FWIW, cameras, etc, can be remembered, and are stored as part of the permission store, thus a camera app will not necessarily be nagged about permission the second time, but screen sharing is a bit different; especially when it comes to window sharing, as there is no way to know whether a window from yesterday is the same as a window from today. With the same reasoning, monitor sharing is slightly more possible to remember in the permission store, given the vendor/product/serial triplet, would such be available. Whether that is really desired to share those second time without interaction is another discussion.

The over all goal is that the user should always be in control, and never share something without the intent of doing so.

About the reverse direction, you mean a piece of text to integrate into the dialog about what is expected to be shared, or how it's used or something like that?

kkartaltepe commented 3 years ago

About the reverse direction, you mean a piece of text to integrate into the dialog about what is expected to be shared, or how it's used or something like that?

Right, something like the obs source name for example. Or for browsers they may want to provide the domain requesting access, etc.

jadahl commented 3 years ago

Right, something like the obs source name for example. Or for browsers they may want to provide the domain requesting access, etc.

That's a good point.