flatpak / xdg-desktop-portal

Desktop integration portal
https://flatpak.github.io/xdg-desktop-portal/
GNU Lesser General Public License v2.1

Add a portal to see currently open windows #304

Open: johan-bjareholt opened this issue 5 years ago

johan-bjareholt commented 5 years ago

I am a maintainer of the ActivityWatch project, and we poll the name and title of the currently focused window. This is not an issue under Xorg, but on Wayland it is a problem, as there is no common API across compositors. I have briefly discussed this with both a wlroots and a GNOME developer, and they both seem to agree that exposing such data would best be solved by adding an xdg-desktop-portal API for it.

wlroots and KWin already have APIs for this (gnome-shell does too, but it's disabled by default), but they are all different, so an xdg-desktop-portal API would significantly simplify things.
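Whatever API ends up supplying the data, the polling approach described above comes down to sampling (app, title) pairs at an interval and folding them into time spans. Below is a minimal sketch of that folding step. The `pulsetime` idea is borrowed loosely from ActivityWatch's heartbeats, but `Event`, `merge_sample` and the parameter defaults are illustrative assumptions, not ActivityWatch's actual code. Where the samples come from is exactly the part a portal would standardize.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float     # seconds since epoch when this span began
    duration: float  # seconds covered so far
    app: str
    title: str


def merge_sample(events, ts, app, title, pulsetime=5.0):
    """Fold one polled (app, title) sample taken at time `ts` into `events`.

    If the sample matches the last event and arrives within `pulsetime`
    seconds of that event's end, the event is extended; otherwise a new
    zero-length event is started.
    """
    if events:
        last = events[-1]
        end = last.start + last.duration
        if (last.app, last.title) == (app, title) and ts - end <= pulsetime:
            last.duration = max(last.duration, ts - last.start)
            return events
    events.append(Event(ts, 0.0, app, title))
    return events
```

Feeding it one sample per second, e.g. `merge_sample(events, time.time(), app, title)` inside the polling loop, yields one event per uninterrupted stretch on the same window.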

Suggestions on properties for windows, methods and signals that would be good to have:

Window properties:

Signals:

Methods:

Links to prior discussions with Gnome and wlroots developers:

TingPing commented 5 years ago

It sounds very niche. What other applications would use this portal?

I know a common complaint is that Discord can't see other apps, but since it's proprietary, somebody would have to convince them to use this anyway.

johan-bjareholt commented 5 years ago

Discord is another example, but another use case would be panels that support multiple compositors (maybe not a good use case for Flatpak, but a use case nonetheless). There are also closed-source competitors to ActivityWatch (most notably RescueTime) that would have the same use case as us.

I guess it's kind of niche, but not extremely niche. Sooner or later an API like this will be needed for Wayland, and to me xdg-desktop-portal seems like the best fit.

matthiasclasen commented 5 years ago

This is not the kind of information we would normally want to leak into sandboxes, I think.

johan-bjareholt commented 5 years ago

@matthiasclasen Neither should screenshots by default, as they contain the same information and more, but you need to explicitly grant permission for that, right?

matthiasclasen commented 5 years ago

A screenshot is an explicit operation. You take a screenshot of an individual window, present the result to the user, and ask them: "Share this with 'SpyApp deluxe'?"

What you are asking for is hard to present to the user in a meaningful form.

johan-bjareholt commented 5 years ago

Why wouldn't it be possible to do the same with this? Show a prompt containing "SpyApp is requesting access to information about all your running applications" and give the user the ability to allow or deny the request. By default, this grant should probably only be valid for the current session, but it would also be nice to have a checkbox to remember it for future sessions as well.

This is how it is handled in macOS, and I wish it were the same on Linux. However, on macOS, once you allow it, it stays allowed until you go into the settings and revoke the permission; only allowing it for the current session by default seems saner IMO. But that's just a small detail.

TingPing commented 5 years ago

It can certainly be done. A goal of portals, though, has been not to pester users with yes/no questions like that, but rather to tie access to specific actions that it makes logical sense to cancel, like choosing a file.

amosbatto commented 5 years ago

With all the recent press about smartphone addiction and the academic studies on how too much screen time is impacting social relations and mental health, there is a growing need for people to be able to track their daily activity on mobile devices and limit their screen time. Android and iOS now have this ability and mobile Linux in upcoming devices like the Purism Librem 5, Pine64 PinePhone and Necuno Mobile will need it.

Please consider adding this feature. I know that I will use it on my Librem 5, and I'm sure that others will as well. I don't mind having to give an app permission to access this information the first time I open SpyApp. We are going to need authorization of permissions like this if Linux is ever going to become a mobile OS that can compete with Android and iOS.

hadess commented 5 years ago

> Please consider adding this feature. I know that I will use it in my Librem 5, and I'm sure that others will as well.

It would probably be implemented on the host side on the Librem, so you wouldn't need any portals; a portal wouldn't help here.

There's also no host API available right now. Somebody should really start with that; the portal can come later.

johan-bjareholt commented 5 years ago

> It would probably be implemented on the host side on the Librem so you wouldn't need any portals, so it wouldn't help.

The issue is that they already have an incredible amount of work to do. It will take years until this feature becomes a priority for them, and after that it will take even more time until it's implemented and shipped. A third-party app could develop this in parallel with Purism's work and work on more than one compositor.

> There's also no host API available right now. Somebody should really start with that, the portal can come later.

wlroots' own protocols actually have some of the APIs I'm requesting; I will start prototyping with that tonight and see how it goes. I'm proposing this because there are more compositors out there, and I don't want to have to implement a client for every one of them.

EDIT: Here's the result. As sway apparently didn't implement the protocol (rootston did, but it's not a WM made for everyday use), I decided to use sway's socket API instead: https://github.com/ActivityWatch/aw-watcher-sway
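For readers unfamiliar with the sway route, here is a minimal sketch of querying sway's i3-compatible IPC socket for the focused window. It is not aw-watcher-sway's actual code. It assumes the i3 IPC framing: the magic string `i3-ipc`, followed by a native-endian 32-bit payload length and a 32-bit message type. It also assumes `GET_TREE` is message type 4 and that the socket path is in `$SWAYSOCK`. The field names `focused`, `nodes`, `floating_nodes`, `name`, `app_id` and `window_properties.class` are assumptions about sway's tree JSON.

```python
import json
import os
import socket
import struct

IPC_MAGIC = b"i3-ipc"
GET_TREE = 4                   # i3/sway IPC message type for the layout tree
HEADER = struct.Struct("=II")  # payload length, message type (native byte order)


def encode(msg_type, payload=b""):
    """Frame one IPC request."""
    return IPC_MAGIC + HEADER.pack(len(payload), msg_type) + payload


def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("sway closed the IPC socket")
        buf += chunk
    return buf


def get_tree(path=None):
    """Ask the running compositor for its layout tree (needs a live sway session)."""
    path = path or os.environ["SWAYSOCK"]
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(encode(GET_TREE))
        header = recv_exact(s, len(IPC_MAGIC) + HEADER.size)
        length, _msg_type = HEADER.unpack(header[len(IPC_MAGIC):])
        return json.loads(recv_exact(s, length))


def find_focused(node):
    """Depth-first search for the node marked as focused."""
    if node.get("focused"):
        return node
    for child in node.get("nodes", []) + node.get("floating_nodes", []):
        hit = find_focused(child)
        if hit is not None:
            return hit
    return None


def describe(node):
    # Native Wayland clients carry app_id; XWayland clients carry an X11 class instead.
    app = node.get("app_id") or (node.get("window_properties") or {}).get("class")
    return app, node.get("name")
```

Usage inside a sway session: `describe(find_focused(get_tree()))`. Polling this, or subscribing to window events over the same socket, is how a sway-specific watcher can work. The same logic then has to be rewritten for every other compositor.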

ssokolow commented 5 years ago

Bear in mind that it's not only smartphones where that can become a problem. I need this sort of tooling on my desktop PC, so I won't be switching away from X11 until this need is met.

In fact, I wrote some example code to show others how to gather that sort of information under X11.
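That example code isn't reproduced here. As a stand-in, the sketch below shows one common X11 approach. It shells out to the `xprop` utility to read the EWMH `_NET_ACTIVE_WINDOW` property from the root window, then reads `WM_CLASS` and `_NET_WM_NAME` from that window. The output formats the parser matches (e.g. `WM_CLASS(STRING) = "navigator", "firefox"`) are assumptions based on typical xprop output.

```python
import re
import subprocess


def active_window_id(xprop_root_output):
    """Extract the window id from e.g. '_NET_ACTIVE_WINDOW(WINDOW): window id # 0x3a00007'."""
    m = re.search(r"window id # (0x[0-9a-fA-F]+)", xprop_root_output)
    if m is None or int(m.group(1), 16) == 0:  # 0x0 means no active window
        return None
    return m.group(1)


def parse_props(xprop_output):
    """Map property name -> list of quoted string values, e.g. WM_CLASS -> [instance, class]."""
    props = {}
    for line in xprop_output.splitlines():
        m = re.match(r"(\w+)\([^)]*\) = (.*)", line)
        if m:
            props[m.group(1)] = re.findall(r'"((?:[^"\\]|\\.)*)"', m.group(2))
    return props


def query_active_window():
    """Return (wm_class, title) of the focused window; requires a running X server."""
    root = subprocess.run(["xprop", "-root", "_NET_ACTIVE_WINDOW"],
                          capture_output=True, text=True, check=True).stdout
    wid = active_window_id(root)
    if wid is None:
        return None
    out = subprocess.run(["xprop", "-id", wid, "WM_CLASS", "_NET_WM_NAME"],
                         capture_output=True, text=True, check=True).stdout
    props = parse_props(out)
    wm_class = props.get("WM_CLASS") or []
    title = (props.get("_NET_WM_NAME") or [None])[0]
    return (wm_class[-1] if wm_class else None), title
```

Any X11 client can do this without asking anyone. That is exactly the part Wayland removes, and the part this issue asks to bring back behind a permission.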

johan-bjareholt commented 4 years ago

GNOME Shell has now merged a D-Bus API for this, but for yet another use case: letting the dogtail tool automatically test GUI applications. https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/326/diffs

And phosh (the Librem 5 shell) has implemented the rootston wlr-foreign-toplevel-management protocol https://source.puri.sm/Librem5/phosh/commit/532cfaf085cd440c3f849e92da8c8d65681c2a9c

Would be nice if we could have a solution which worked on both.
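For context, wlr-foreign-toplevel-management is event-driven. The manager announces each toplevel handle. Each handle then sends `title`, `app_id` and `state` events, which take effect together on `done`. An `activated` entry in the state array marks the focused window. The model below illustrates that bookkeeping. Plain method calls stand in for real Wayland bindings, which this sketch does not use. The value 2 for `activated` is taken to match the protocol's state enum and should be checked against the protocol XML.

```python
from dataclasses import dataclass, field

STATE_ACTIVATED = 2  # assumed value of zwlr_foreign_toplevel_handle_v1 state "activated"


@dataclass
class Toplevel:
    title: str = ""
    app_id: str = ""
    activated: bool = False
    pending: dict = field(default_factory=dict)  # changes not yet committed by "done"


class ToplevelTracker:
    """Models a client's per-handle state; each on_* method mirrors one protocol event."""

    def __init__(self):
        self.toplevels = {}

    def on_toplevel(self, handle):
        self.toplevels[handle] = Toplevel()

    def on_title(self, handle, title):
        self.toplevels[handle].pending["title"] = title

    def on_app_id(self, handle, app_id):
        self.toplevels[handle].pending["app_id"] = app_id

    def on_state(self, handle, states):
        self.toplevels[handle].pending["activated"] = STATE_ACTIVATED in states

    def on_done(self, handle):
        # Apply all pending changes at once, so a half-updated window is never observed.
        tl = self.toplevels[handle]
        for key, value in tl.pending.items():
            setattr(tl, key, value)
        tl.pending.clear()

    def on_closed(self, handle):
        self.toplevels.pop(handle, None)

    def active(self):
        for tl in self.toplevels.values():
            if tl.activated:
                return tl.app_id, tl.title
        return None
```

A portal could expose essentially this `active()` view plus a change signal, without handing clients the protocol's control requests (activate, close, minimize).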

matthiasclasen commented 4 years ago

I'm still of the opinion that we don't want sandboxed apps to get into managing foreign toplevels. That is fundamentally a privileged operation.

ssokolow commented 4 years ago

Well, the demand isn't going to go away.

If you don't provide some compromise, you're likely to see an analogue to how, because Google refuses to allow things like YouTube downloaders in their extension store, users are becoming desensitized to being walked through the process of enabling side-loading and installing extensions which haven't had any vetting from a trusted third party.

(Or, similarly, how, because virus scanners often report software cracks such as Windows 7 activators as viruses, Windows users grow used to taking the word of random strangers that their virus scanner is giving them a false positive.)

You don't want people giving up all sandboxing on programs X, Y, and Z because the sandbox refuses to meet their needs... especially if it's something read-only like "track active window title", that can be handled through a permissions workflow they're probably already familiar with via the OAuth2-based integrations on sites like Twitter and GitHub.

I'd probably implement such a thing by writing an un-sandboxed daemon which wraps all the disparate APIs offered by various compositors and exposes a consistent API which can be whitelisted in the sandbox manifest... who knows whether I'd get it right from a security standpoint, but I'll do it nonetheless if remaining on X11 becomes unviable before the capabilities I rely on are officially offered. At least that way, I've made what effort I can to sandbox as heavily as is viable.

(UX-wise, my approach would be the aforementioned OAuth2-like approach. Sandboxed client must request an API key, which triggers a permissions prompt from the privileged host. At any time, the user can pull up a list of permission'd applications and modify or revoke permissions. However, I'd also support a "forge" mode like any good Firefox privacy extension.)
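The OAuth2-like flow described above (request a key, prompt the user, keep a revocable list of grants, optionally "forge" results) could look roughly like the sketch below inside such an unofficial daemon. Everything here is hypothetical: `GrantStore`, the `prompt` callback, the `"allow"`/`"forge"`/`"deny"` answers and the scope string are illustrative names, not any portal API.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Grant:
    app_id: str
    scope: str   # e.g. "active-window-title" (hypothetical scope name)
    forge: bool  # True: the app gets plausible fake data instead of real data


class GrantStore:
    def __init__(self, prompt):
        # prompt(app_id, scope) -> "allow" | "forge" | "deny", shown by the privileged host
        self._prompt = prompt
        self._grants = {}  # token -> Grant

    def request_key(self, app_id, scope):
        answer = self._prompt(app_id, scope)
        if answer == "deny":
            return None
        token = secrets.token_urlsafe(16)
        self._grants[token] = Grant(app_id, scope, forge=(answer == "forge"))
        return token

    def list_grants(self):
        """What a settings UI would show so the user can review grants."""
        return list(self._grants.items())

    def revoke(self, token):
        self._grants.pop(token, None)

    def active_window_title(self, token, real_title, fake_title="Untitled"):
        grant = self._grants.get(token)
        if grant is None or grant.scope != "active-window-title":
            raise PermissionError("no valid grant for this token")
        return fake_title if grant.forge else real_title
```

Keeping the token on the daemon side means revocation takes effect on the very next request, without the sandboxed app's cooperation.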

TingPing commented 4 years ago

Let's not compare xdg-desktop-portal restricting itself to "safe" information with Google's store policies...

The Flatpak sandbox will always be strict; that is the point, so every leak of information into the sandbox needs discussion.

We already have a permission system, so I'm not sure what you are discussing.

The question is: does this information ever belong in a sandbox? If it does, how do we ask the user without an awful "DO YOU WANT TO ALLOW VAGUE PERMISSION: [YES|NO]"?

ssokolow commented 4 years ago

My concern is that, if the sandbox is ill-fitted for what users want their applications to do, it will have an outsized effect on how much sandboxing is actually done.

...and I do agree that vague permissions are something to be avoided. I'm just not sure whether it has to be something that would qualify as "vague" in this instance.

Something like "Read the title of your currently active window and get notified of changes" seems like a pretty clear thing that would be both intuitively obvious if a time-tracking application asked for it and suitably ominous if anything else did. (Though it'd be most useful if that also covered stuff that a human can derive from the title but a machine would have trouble with, such as the window class.)

TingPing commented 4 years ago

OK, so personally I'm still stuck on my question. We have one application interested in using this. Anything else to add to this list?

ssokolow commented 4 years ago

I write tooling of my own (ad-hoc at the moment) which would use it in ways comparable to ActivityWatch.

That's why I said that, if it's not available by the time X11 becomes unfeasible to continue using, I'll hack together some kind of "extra-portals daemon" of my own design which would inherently not be as thoroughly checked over for exploitable mistakes as an official solution.

To be honest, I see this as comparable to the existence of the screenshot portal. Both are very niche things on any desktop where the compositor providers also maintain their own screenshotting solution which could use an internal API or be a compositor plugin.

TingPing commented 4 years ago

The difference with screenshots is those are very user centric actions that just happen one time and can be canceled upon user review of the contents.

This is a more technical detail that will grant unlimited access in the background to potentially sensitive data that the user cannot verify at all nor cancel upon review. For example what if they open the web browser and the window title becomes "Your Name private-email@foo.org login" or something and the user leaked data they never intended to.

Maybe we could limit that information to the app-id of each window (which is not always easy to know), never the window title.

ssokolow commented 4 years ago

> The difference with screenshots is those are very user centric actions that just happen one time and can be canceled upon user review of the contents.

> This is a more technical detail that will grant unlimited access in the background to potentially sensitive data that the user cannot verify at all nor cancel upon review. For example what if they open the web browser and the window title becomes "Your Name private-email@foo.org login" or something and the user leaked data they never intended to.

In that respect, I was responding to the niche-ness (ie. the short list of known potential consumers with Discord crossed off).

> Maybe we could limit that information to the app-id of each window (which is not easy to know always) never the window title.

That would render it effectively useless for my use case.

I (and, I'd assume, ActivityWatch) need to be able to tell the difference between, for example, YouTube and Google Docs (in Firefox), a master's thesis and a fanfic-writing project (in tools like FocusWriter), multiple different programming projects (in gVim), or multiple different console applications (via a bit of .bashrc or .zshrc scripting to set the terminal's title to the currently running command name).

In the case of applications without a plugin system like Firefox's, the only way to accomplish that without doing something even worse, like live memory inspection, is to watch the window title, where the currently focused document's name is exposed.
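To make the point concrete: with only an app-id, everything in Firefox falls into a single "firefox" bucket, while the title is what separates YouTube from Google Docs. Below is a minimal sketch of title-based categorization of the kind a time tracker might apply. The rule format and categories are made up for illustration.

```python
import re

# (category, app-id substring or None for "any app", title regex): illustrative rules only
RULES = [
    ("Video", "firefox", r"YouTube"),
    ("Writing: thesis", None, r"thesis"),
    ("Writing: fanfic", "focuswriter", r"fanfic"),
]


def categorize(app, title, rules=RULES):
    """Return the first matching category for a (app, title) sample."""
    for category, app_match, pattern in rules:
        if app_match is not None and app_match not in app.lower():
            continue
        if re.search(pattern, title, re.IGNORECASE):
            return category
    return "Uncategorized"
```

Remove the title argument and every rule above collapses into "which app was it", which is the loss being described.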

Also, even if that weren't the case, the whole point is to display quantified read-outs to the user, so accessing something "not easy to know always" like the app-id would require some kind of awkward dance such as "Please focus and then type the display name for each application you wish to track".

What about something akin to the warning overlay Firefox displays whenever an application is monitoring the camera or microphone? For example, a tray icon... possibly paired with a notification popup that displays whenever an application starts monitoring and has a "revoke permissions" button. If it annoys me, as a more technical user, I could use KDE's support for forcing tray icons to stay in the menu of inactive icons, and I stay logged in for weeks or months at a time, so I can excuse having to dismiss a notification every time my time-tracker auto-runs on login.

I understand your concerns but, at the end of the day, I worry that this is just a case of real life being messy, just like the "theory vs. reality" situations in academia which inspired that famous Einstein paraphrase, "Everything should be made as simple as possible, but not simpler."

TingPing commented 4 years ago

Honestly I think I convinced myself to be against it. Window titles are sensitive and there simply isn't a way to let a user interject themselves in that process.

Showing a persistent notification or tray icon isn't supported everywhere, and I think users would just be trained to ignore such a thing. Showing when a camera is active works because that is always privacy-invading, whereas this is only sometimes so, and users wouldn't realize it.

TingPing commented 4 years ago

It is OK that not everything will be Flatpak'd. Plenty of system level components cannot be and I think activity monitoring applications may fall under that.

ssokolow commented 4 years ago

I'd still prefer to sandbox everything I can, so I'll probably go ahead with my idea to implement some kind of "unofficial portals" daemon. It'd be a nice place to collect anything that's not official merely because of concern about OAuth2-style "persistent grant" permissions.

luongthanhlam commented 4 years ago

I'm developing a Vietnamese input method for IBus. It uses WM_CLASS to identify which window is active (has focus), so you can assign it your favorite input mode (e.g. pre-edit, surrounding text, US input, etc.). My users like this feature very much, but unfortunately it does not work on Wayland. Please bring WM_CLASS from X11, or something like it, to Wayland.

johan-bjareholt commented 4 years ago

> * ActivityWatch
> * ~Discord~ (Let's be honest, it is proprietary and they don't care about Flatpak; I have no reason to believe they would ever support this API)

If we add "width" and "height" properties, I believe it would be possible to support dogtail on Wayland for more shells than just GNOME Shell (GNOME Shell currently has a D-Bus API for this).

TingPing commented 4 years ago

@ssokolow There are many ways to sandbox things. For example you can ship a systemd service with sandbox options.

ssokolow commented 4 years ago

I got the impression systemd sandboxing wasn't as well-suited to desktop applications.

Am I mistaken or are you proposing that the monitoring and GUI be separate processes sandboxed using different technologies?

TingPing commented 4 years ago

You could split your daemon into its own process that would be a sandboxed DBus service. Your application could then be a Flatpak with permission to talk to it.

ssokolow commented 4 years ago

That was essentially what I was envisioning for my idea of writing some kind of "extra/unofficial portals" daemon.

DanielJoyce commented 4 years ago

Fed up with Wacom-style tablet support, I am working on a user-space driver like sc-controller, but for digitizer tablets.

Basically, it treats the tablet as a mass of event sources and makes setup simple, even from the CLI. For brand-new tablets, bugs and deficiencies in the wacom/digimend drivers mean GNOME's Wacom settings fail to load my tablet template, so there is no easy way to set mappings, even though all the raw event sources appear to work just fine when examined with the libinput tools.

The problem is that wacom/digimend sometimes misclassify various bits of the HID descriptors, and you get a weird melange of xinput devices that don't work, even though the low-level stuff is there. It would be super easy to walk a user through "Press button 1 on your digitizer; what key press should this send?"

One thing sc-controller provides, and that I'd like to have in my user-space tablet tool (and can do under X11), is changing button mappings for my digitizer tablet on a per-application basis (GNOME's Wacom settings don't provide this). To do that, I need information about the active window. If sc-controller is going to support controller profiles on Wayland, it needs this info too.

Some apps don't allow multiple shortcuts for a given action, so it would be awesome to have my tablet system go "Oh, Krita, so I need tablet key 1 to emit the FOO key press to zoom in. Oh, now we're using MyPaint, I need to emit the BAR key press for zoom."

There needs to be some way to get this information, for input methods, usability, and accessibility tools.
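The per-application behaviour described above reduces to a small piece of logic once the focused window's identity is available. The same logic applies to the tablet tool, sc-controller, AntiMicroX-style gamepad profiles, or keyboard remappers. Below is a minimal sketch. `Profile`, `ProfileSwitcher` and the key names are hypothetical placeholders, not Krita's or MyPaint's actual shortcuts. The focus-change callback is precisely what Wayland currently gives such tools no portable way to receive.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    buttons: dict  # tablet button number -> key combo to emit (placeholder values)


DEFAULT = Profile("default", {1: "ctrl+z"})
PROFILES = {  # keyed by lower-cased app-id / WM_CLASS; example values only
    "krita": Profile("krita", {1: "ctrl+equal"}),
    "mypaint": Profile("mypaint", {1: "period"}),
}


class ProfileSwitcher:
    def __init__(self, profiles, default):
        self.profiles = profiles
        self.default = default
        self.current = default

    def on_focus_changed(self, app_id):
        """Call whenever the focused window changes; returns the new profile, or None if unchanged."""
        new = self.profiles.get((app_id or "").lower(), self.default)
        if new is self.current:
            return None
        self.current = new
        return new
```

When `on_focus_changed` returns a profile, the tool would reprogram its uinput mappings from `profile.buttons`. When it returns `None`, nothing needs to be done.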


TingPing commented 4 years ago

@DanielJoyce Messing with input of other applications seems extremely out of scope for Flatpak/Wayland/etc.

DanielJoyce commented 4 years ago

This is more about getting the window name, creating a uinput device can be managed outside wayland. Per-application behavior for tablets requires knowing the application/window name.

TingPing commented 4 years ago

If your software has permissions to control devices like that it would likely be outside of flatpak already.

jefferyto commented 3 years ago

I use KeePass with its auto-type feature: When I press a system-wide hotkey, KeePass takes the title of the focused window, finds the corresponding username/password and enters this information into the window.

Or at least it used to work. After upgrading to Ubuntu 21.04 and switching to Wayland, neither part (getting the window title, entering key input) works. (I use this primarily to log into websites, and there are plugins that integrate KeePass with my browser, but I prefer to keep the two separate for security reasons.)

There is a workaround for entering key input using /dev/uinput, but the developers of KeePass are waiting for a common API for getting the window title instead of implementing individual solutions for different compositors.

KeePassXC is also experiencing the same limitations. Auto-type is a very popular feature; the Bitwarden feature request for auto-type is their most-voted feature request for desktop (second most-voted overall).

I hope it is clear this is a legitimate use-case for an application to access window titles.
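The lookup half of the auto-type flow described above is simple once the title is available. Below is a minimal sketch, loosely modelled on glob-style window associations. The data layout is an assumption, not KeePass's or KeePassXC's actual implementation. It also shows why an app-id alone would not do: every website shares the browser's app-id.

```python
from fnmatch import fnmatchcase


def find_entries(window_title, entries):
    """Return names of entries whose window-association patterns match the focused window's title.

    `entries` maps an entry name to a list of glob-style patterns such as "*GitHub*".
    Matching is case-insensitive. Note that fnmatch treats "[...]" in a pattern as a
    character class.
    """
    title = window_title.lower()
    return [name for name, patterns in entries.items()
            if any(fnmatchcase(title, p.lower()) for p in patterns)]
```

With `{"github": ["*GitHub*"]}`, a focused window titled "Sign in to GitHub - Mozilla Firefox" selects the `github` entry. The password manager would then type that entry's credentials into the window.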

ssokolow commented 3 years ago

I've also since realized that it's kind of hypocritical to refuse this, given that I only just learned, by accident, that an Open/Save portal's grant of access can be made persistent via the Documents portal. I'm still trying to formulate a Google query that will tell me where that information gets stored and how I can overrule it (with a crontab or incrontab hack if necessary), so that programs can't perpetually retain access to paths I opened or saved once, without so much as a warning or checkbox.

EDIT: ...and KeePassXC is also one of the tools I use which is holding me on X11 for want of a common API for this.

devmattrick commented 2 years ago

To add another application, I'd like to have my keybinds automatically change depending on the active window using key-mapper.

rbreaves commented 2 years ago

@matthiasclasen In my opinion, we should provide an accessibility-level abstraction, similar to how Apple's macOS handles higher-privileged applications, so that additional automation can be carried out. This should be considered a CRITICAL function and feature that needs to be supported (by someone, whether that is this project, Wayland, or both).

Also, if we want people with handicaps to be able to interact and work with Linux applications at any time in the future while using Wayland, then this type of functionality will likely need to be implemented for them as well. We can't be sandboxing everything to the extent that automation, OCR and key remapping solutions such as key-mapper, xkeysnail, keyd and others are rendered useless the moment someone switches from X11 to Wayland. We can do better at supporting situations where this level of access is required for legitimate use.

macOS handles it by making the user go into an accessibility or privacy section of the settings, unlock a padlock, and check the box next to the application, which indicates that you are granting it the level of access it is requesting. It is a very intentional interaction; there is no way to grant access accidentally.

@ssokolow An unofficial portals daemon for this would work as well!

pktiuk commented 2 years ago

> OK, so personally I'm still stuck on my question. We have one application interested in using this. Anything else to add to this list?
>
> * ActivityWatch
>
> * ~Discord~ (Let's be honest, it is proprietary and they don't care about Flatpak; I have no reason to believe they would ever support this API)

You can add AntiMicroX to this list. Getting information about the current window is required for its auto-profile feature (switching the gamepad profile depending on the currently selected window).

thomasaull commented 2 years ago

I recently wrote myself a small script to automatically change the button mappings on a Wacom pen when I'm in Blender, and back again. I'm currently using Xorg, where this is rather easy to do, and not being able to do it under Wayland is a bummer :-(

zocker-160 commented 2 years ago

I am the developer of keyboard-center and I would very much need this.

Users need to be able to create application specific profiles and button mappings, which I can only implement when I know what application is currently active.

jefferyto commented 2 years ago

Looping @DReichl (for KeePass), @droidmonkey and @phoerious (for KeePassXC) into the discussion.

superuser-does commented 2 years ago

In reference to password management: I use a utility called Ditto clipboard manager on Windows. Ditto has a feature that lets it avoid writing the passwords you copy to disk, depending on the program you copied from. They will be available in the clipboard at that moment (and therefore in RAM), but won't be saved in the on-disk clipboard history.

As an example, I have set it to not write to disk anything I copy from my password manager.

I'm not certain any Linux clipboard managers support this feature, but it's a category of program that could make use of it.
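Here the clipboard manager needs the source window's app identity at copy time. Below is a minimal sketch of the Ditto-style policy described above. `ClipboardHistory` and its fields are hypothetical. How a Wayland clipboard manager would learn `source_app_id` is exactly the missing piece. The example app-id in the usage note is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ClipboardHistory:
    excluded_apps: set                            # lower-cased app-ids whose copies are never persisted
    current: str = ""                             # in-memory clipboard contents
    history: list = field(default_factory=list)  # stands in for the on-disk history

    def on_copy(self, text, source_app_id):
        self.current = text  # always available for pasting right now
        if source_app_id and source_app_id.lower() in self.excluded_apps:
            return  # sensitive source: keep it in RAM only, skip persistence
        self.history.append(text)
```

For example, `ClipboardHistory(excluded_apps={"org.keepassxc.keepassxc"})` keeps passwords copied from the password manager pasteable but out of the saved history. As written, copies from an unknown source are persisted; a stricter design could refuse to persist those instead.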

Silve2611 commented 2 years ago

I also need this feature. There are tons of scenarios where it is necessary, for example any application trying to read the screen. There is a reason this is possible on Windows and on macOS. macOS solves it quite well by asking for permission.

If applications that need the window title and app name will never be allowed on Linux with Wayland, it will not be a real OS. Since when did Linux become the system of limited possibilities? Asking for permission is totally fine, but not allowing it at all seems like a very strong philosophy to me. It should be up to users to decide how much an application is allowed to see.

rbreaves commented 2 years ago

> Asking for permission is totally fine but not allowing it at all seems like a very strong philosophy for me. It should be up to the users to decide how much an application is allowed to see.

I think it’s more about laziness than philosophy lol.. whether they admit that or not... Just very narrow mindedness on goals w/o much real consideration of how people actually use their computers.. that’d just slow them down & their contributions 😂.

phoerious commented 2 years ago

Thanks for tagging me. I read broadly through the comments. It is indeed absolutely beyond me why this kind of accessibility API is still but an afterthought if anything.

It was mentioned earlier that macOS has something like this already and I would indeed suggest going in the same direction. The first time you use Auto-Type in KeePassXC on macOS, a system popup appears prompting the user to allow access to accessibility and screen recording features (the permissions are badly named, but it works).

Such a prompt on Wayland, when approved by the user, should at least provide access to listing windows on the current desktop and their stacking order and allow for sending keystrokes to other applications. I can imagine a host of other applications that would need elevated accessibility capabilities, from screen recording apps over hotkey automation, UI testing frameworks, custom window tiling managers, to screen readers. Some would even need to read the full window contents, not just the titles. AFAICS, there are already a few "portals" for the most essential things like screen recording and Orca also seems to be getting its data via DBus somehow, but it looks all rather underdeveloped to me. The result is that people implement this kind of feature via udev rules or suid daemons, which nullifies any of the encapsulation provided by Wayland, Flatpak, and the like entirely. In fact, requiring applications to run their own sidecar daemons as root or request unchecked root-level access to raw input devices is far worse than not having any encapsulation at all.

pktiuk commented 2 years ago

This is the most requested feature in this repo (I think the number of 👍 reactions is a good measure).
I think developers should decide what to do with it in a bit more official manner.

For now we know that this feature is important for many types of apps, but there are concerns regarding the way of implementing and ensuring safety.

Anyeos commented 1 year ago

What about a list of hashed IDs? Something that identifies the related window. If you only get a list of IDs, no information is leaked, and the purpose of apps that automatically set a profile (like AntiMicroX) can still be implemented consistently.

An algorithm that always returns the same ID for the same App/Window combination. If more information is needed, the app can "query" that ID to get it.
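One way to get IDs that are stable for the same App/Window combination without leaking the name is a keyed hash (HMAC) with a separate secret per requesting app. A plain unkeyed hash of an app-id would hide little, since the set of installed app-ids is small enough to hash and compare. Per-consumer keys also stop two apps from correlating their IDs. Below is a minimal sketch. `OpaqueIds` and the `window_key` parameter are illustrative assumptions, and the "query that ID" step is left out.

```python
import hashlib
import hmac
import secrets


class OpaqueIds:
    """Maps (app_id, window_key) to IDs that are stable per consumer but reveal no names."""

    def __init__(self):
        self._keys = {}  # consumer app-id -> secret key, held only by the privileged host

    def _key_for(self, consumer):
        if consumer not in self._keys:
            self._keys[consumer] = secrets.token_bytes(32)
        return self._keys[consumer]

    def window_id(self, consumer, app_id, window_key=""):
        msg = f"{app_id}\0{window_key}".encode()
        return hmac.new(self._key_for(consumer), msg, hashlib.sha256).hexdigest()[:16]
```

An app like AntiMicroX could then ask the user to "focus the window you want to bind", store the opaque ID it receives, and switch profiles whenever that ID becomes active, without ever learning the app's name.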

ssokolow commented 1 year ago

I suppose a "Please type the name of each window you want to track and install a Firefox plugin to intentionally leak the active tab's name (blame the designers of Wayland)" dialog is better than something like "Please run your applications under XWayland (blame the designers of Wayland)" or "Please enable accessibility in order to leak your active window information to everything that wants to claim to be a screen reader" for a time-tracking tool.

Honestly, this feels like a "Requests filesystem=home/filesystem=host for lack of a sufficiently sophisticated File Chooser Portal API" situation.

rbreaves commented 1 year ago

> or "Please enable accessibility in order to leak your active window information to everything that wants to claim to be a screen reader" for a time-tracking tool.

A) I think we shouldn't treat our users as though they are idiots. B) Security does matter - but if you make it too onerous then people will reject it and find less secure ways to get around it. C) Accessibility ought to be there for users of all kinds and we shouldn't be excluding them due to some idealistic ideology of a few programmers in the name of security.

If you want to know, though, Apple's approach has an interesting problem with its accessibility API. If I set an AppleScript to trigger on certain actions via BetterTouchTool, the OS will sometimes think the trigger came from the app that has focus instead of from BetterTouchTool, so the user then has to add even more apps to the security exclusions. That is pretty idiotic in itself: in trying to increase security, Apple isn't even correctly tracking which app originates a trigger. I'm not sure what the solution ought to be in that case, but my point is that if you are going to implement heightened security, do it in a way that still respects the user.

matthiasclasen commented 1 year ago

> Also, if we want people with handicaps to be able to interact and work with Linux applications at any time in the future while using Wayland, then this type of functionality will likely need to be implemented for them as well. We can't be sandboxing everything to the extent that automation, OCR and key remapping solutions such as key-mapper, xkeysnail, keyd and others are rendered useless the moment someone switches from X11 to Wayland. We can do better at supporting situations where this level of access is required for legitimate use.

My opinion is that accessibility tooling is system functionality - it should live on the host side, not on the flatpak side.