Global keyboard shortcut portal

cassidyjames commented 3 years ago

We have received several requests from elementary AppCenter app developers for a way to set (or prompt a user to set) a global keyboard shortcut to launch their app or a specific feature of their app (e.g. an Application action).

Pre-sandbox, these developers would set the GSettings for the desktop itself to add their app to the global keyboard shortcuts. Obviously this is non-ideal, and it only worked reasonably well because we human-reviewed every app’s source upon submission and update.

Today in Flatpaks, we recommend developers direct users to System Settings > Keyboard > Shortcuts and tell users how to add a custom shortcut manually.

Ideally, however, apps could use a Portal to request a system-wide shortcut (along with a description/rationale of the feature), and then we could provide a UI to display the request, let the user pick a shortcut, avoid conflicts, etc.

vchernin commented 3 years ago

This also sounds like something particularly useful on Wayland, since apps can't arbitrarily access keyboard input whenever they want (no keyboard event polling). So traditional app-provided global shortcuts won't work on Wayland. On Wayland you'd need the compositor to "capture" global shortcut usage on behalf of the app, but today that isn't very easy or accessible to manage.

Having a standard way of asking the host system/compositor to provide a global shortcut for the app sounds like a very clean solution.

matthiasclasen commented 3 years ago

An api proposal (or even better: an implementation) would be appreciated.

k3d3 commented 3 years ago

This would definitely be necessary for things that have core functionality in global hotkeys, for example Mumble or Discord with push-to-talk.

Is the xdg-desktop-portal project open to API proposals for this, and/or somewhere to discuss and flesh out what an API would look like?

rohmishra commented 2 years ago

I can write a rough draft of portal API proposal for this feature if anyone is interested, I do have a few (optional) features that I wanted to include:

1. Common/generic shortcuts (e.g. Push-to-Talk): a list of common shortcuts that are shared between apps. The first app to engage the shortcut gets to use it. E.g. if you are using a communication app, the app requests access to PTT and whichever key you have defined as PTT is used transparently. Others can request a temporary shortcut for a single use case; see 3.
2. Ability for the app to get the keyboard shortcut you decided to use (useful if the app wants to display the shortcuts in its settings menu).
3. Temporary shortcuts: these are cleared when you exit the app. Useful for features users don't use much or that are known to be rarely used, or when the primary key (if a common key) is in use by a different app.

Let me know if anyone would be interested in these optional features. If so, I can include them in the draft as well.

cassidyjames commented 2 years ago

@rohmishra in my mind when proposing this idea, I was really thinking of limiting the scope to launching apps and FreeDesktop Actions, which as far as I know would cover most if not all of the uses by elementary AppCenter apps.

I feel like PTT is different in that it assumes the app is running and able to intercept/handle a specific key internally; are there other examples of these generic shortcuts? But off the top of my head I am not sure this is a good fit without inventing yet another standard of types of keyboard shortcuts. I don't feel strongly, I just fail to see the other potential uses and am not entirely convinced it's a good fit.

I think 2 makes sense and would be really convenient for in-app education!

I'm not sure I understand the use of 3.

rohmishra commented 2 years ago

For one, push-to-talk and apps that display an overlay/menu when you press and hold a key are the only two use cases I can think of right now. I did have a different one in mind a few days ago but I failed to write it down 😅 so it's lost to history I guess.

That said, I do have a different use case in mind that I glossed over previously: for PTT the app needs to know whether the keys are currently pressed or not. Do we want to support that use case? I feel like with the ubiquity of video conferencing these days, there might be a lot of requests to support this regardless. In that case, do we want that to be a separate API?

As for the third one, the shortcuts are automatically cleared when you sign out (or quit the app, not really sure which one would be better). PTT actions are probably a good example of its use case. But a UI that dynamically changes a lot might wanna use this too. I personally don't care either way but just thought this might be something someone might be looking for. It was a valid feature to implement on X after all.

Except for the PTT/overlay case, I do think most use cases can be solved by the current shortcut system of just executing a command with a flag/dbus-send to let the app know of an action. It would certainly be easier since, apart from a portal to trigger the add-shortcut UI automatically and pre-fill the command, we already have everything else in place.

I'm on mobile right now, but GSConnect can trigger the action in its own UI, so unless it uses a weird hack, GNOME already has most of the stuff in place.

Without 1 & 3, I have a simple structure in mind:

We store a token, invisible to the user, for each app that has requested a global shortcut. The app can use that token to request the current key binding, or to clear it. We may also (optionally) consider passing it back to the app in argv/dbus so that if the app allows users to have some custom action on keypress (e.g. run a series of user-defined actions or enter custom text) it can simply identify the action by the unique token/UUID.
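To make the "pass the token back in argv" idea a bit more concrete, here is a rough app-side sketch in Python. Nothing here is an existing portal API: the `--shortcut-token` flag name, the `build_dispatcher` helper and the token strings are all made up for illustration, and a real design might deliver the token over D-Bus instead.

```python
import argparse
from typing import Callable, Dict, List


def build_dispatcher(actions: Dict[str, Callable[[], str]]) -> Callable[[List[str]], str]:
    """Return a function that runs the user-defined action matching the token in argv."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the proposal only says "argv/dbus".
    parser.add_argument("--shortcut-token", default=None)

    def dispatch(argv: List[str]) -> str:
        args = parser.parse_args(argv)
        if args.shortcut_token is None:
            # No token: the app was started normally, not via a shortcut.
            return "normal launch"
        action = actions.get(args.shortcut_token)
        if action is None:
            # Token the app does not know about (e.g. a cleared shortcut).
            return "unknown token, ignoring"
        return action()

    return dispatch
```

The app would keep its own mapping from tokens to custom actions, so the portal never needs to know what each shortcut actually does.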

jadahl commented 2 years ago

PTT seems to me like one of the more common requests when it comes to global key bindings, and I imagine it cannot rely on executables or .desktop action entries since one would need both a start and a stop signal.

Would also be nice to know what actions exactly tend to be bound to global keys in elementary apps, to better understand what they are actually trying to solve.

As for storage in the permission store, I think it'd be good to take inspiration from the screen cast session storage method here; it solved a very similar issue; e.g. letting the portal backend provide the actual stored content, while using a x-d-p provided token to make it possible to restore.

1player commented 2 years ago

I have a use case for options 2 and 3. I am working on an application to calibrate mouse sensitivity between games. Without entering into technical details, you'd start the app, enter calibration mode, tab over to the video game and press a hotkey to start recording the mouse input and another hotkey to stop.

So my application would need a way to configure a few global shortcuts that are active as long as it is running, and I'd like to make those hotkeys configurable by the user.

Since the app requires raw access to /dev/input/event*, I can already simulate some kind of global shortcut system, but I would prefer to use the actual portal, if there's one available.

rohmishra commented 2 years ago

@jadahl PTT would require sending key-up and key-down flags as distinct signals which isn't something other apps wanna deal with and something we want to avoid sharing unless absolutely necessary.

And yes I was thinking of similar structure as screencast API. I was about to sleep while writing the previous comment and it shows.

@1player the plan is that apps won't be able to force a keyboard shortcut; rather, the user gets to choose them. I'm thinking of allowing apps to suggest keybindings to be helpful, but those still won't be binding on the user. That's why I want to include a way to read the binding, so that apps can dynamically update their documentation and help pages to reflect the right shortcuts.

Also, we might actually want to explore hiding /dev/input behind a portal, because otherwise it defeats the purpose: apps can read the keyboard anyway by adding a single line to their manifest. Permissions granted by the manifest are automatically granted when running the app, and users shouldn't be expected to review them for every app.

jadahl commented 2 years ago

@jadahl PTT would require sending key-up and key-down flags as distinct signals which isn't something other apps wanna deal with and something we want to avoid sharing unless absolutely necessary.

True, but done right, they (the ones not caring about start/stop) could just ignore all but the start signal for example.

rohmishra commented 2 years ago

It would result in inconsistent behavior. Some apps react to key-up, others to key-down. It is also a more complex API that would require more time to implement. You lose the ability for shortcuts to work even when the app isn't running yet (something other platforms don't really do, and which would result in a better experience if we can land it! We don't want users to have to check whether the app is running yet or not.)

Having PTT tied to session only bindings may be a good idea too. The app can request key-binding the first time you enter a call for the day. Paired with suggested bindings it should be a one extra click/enter-key once a day/restart affair for most people. That said, people are not going to like it. But I don't think all bindings receiving key-down and key-up is a good idea.

jadahl commented 2 years ago

An alternative is for applications to ask for what type of event to receive, i.e. start/stop/cancel vs triggered. Anything that doesn't actually want the former would just receive the latter.
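A tiny Python sketch of what "ask for the type of event" could mean in practice. The `EventMode` enum and `events_for` function are invented names, and the sketch leaves out the cancel event for brevity.

```python
from enum import Enum
from typing import List


class EventMode(Enum):
    TRIGGERED = "triggered"    # one event per activation
    START_STOP = "start_stop"  # press and release reported separately


def events_for(mode: EventMode, key_events: List[str]) -> List[str]:
    """Translate raw 'down'/'up' key events into what the app asked for."""
    out: List[str] = []
    for ev in key_events:
        if ev == "down":
            out.append("triggered" if mode is EventMode.TRIGGERED else "start")
        elif ev == "up" and mode is EventMode.START_STOP:
            # Only apps that opted into start/stop ever see the release.
            out.append("stop")
    return out
```

An app that only cares about activation never has to deal with key releases, while a PTT app gets both edges.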

What makes PTT special in that it shouldn't be able to be saved more persistently?

k3d3 commented 2 years ago

I like your ideas @rohmishra , however I'd like to make one comment on this:

The first app to engage the shortcut gets to use it

I believe it should be possible for apps to simultaneously register the same keyboard shortcut. For example, I use both Mumble and Discord - I never use the voice capabilities of each at the same time, however I don't want to have to register different PTT shortcuts for each.

If I were a bit more insane and had friends scattered across Mumble, Discord, Ventrilo, Teamspeak, and Teams servers (granted I have no idea if Teams even does PTT), it would become very annoying very quickly to have to remember each one's individual PTT key depending on which app I'm using. Having 5 different push-to-talk keys could get unwieldy quickly.

Also, one comment on the idea of "temporary global shortcuts" - I like the idea of having them be unregistered after an app exits, however does this mean you will be prompted for the keyboard shortcut every time the app is opened? That could get annoying very quickly and might discourage people from using that functionality.

What if instead, you required that the prompt said something like "Application X would like to register key Y as a global shortcut", and then you could answer it with

Allow once
Allow forever
Deny

or something like that? I think that would solve the use cases for temporary shortcuts.
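Roughly, the three answers could map onto two kinds of grants, as in this Python sketch. The `Answer` enum and `Grants` class are hypothetical names; this only illustrates the bookkeeping, not a real portal backend.

```python
from enum import Enum
from typing import Dict, Optional


class Answer(Enum):
    ALLOW_ONCE = "once"
    ALLOW_FOREVER = "forever"
    DENY = "deny"


class Grants:
    def __init__(self) -> None:
        self.persistent: Dict[str, str] = {}  # app_id -> key, survives app exit
        self.session: Dict[str, str] = {}     # app_id -> key, dropped on app exit

    def apply(self, app_id: str, key: str, answer: Answer) -> Optional[str]:
        """Record the user's answer; return the bound key, or None if denied."""
        if answer is Answer.DENY:
            return None
        target = self.persistent if answer is Answer.ALLOW_FOREVER else self.session
        target[app_id] = key
        return key

    def app_exited(self, app_id: str) -> None:
        # "Allow once" grants are forgotten here; "Allow forever" grants stay.
        self.session.pop(app_id, None)

    def needs_prompt(self, app_id: str) -> bool:
        return app_id not in self.persistent and app_id not in self.session
```

With this split the app only ever says "register a key", and the user decides whether the result is temporary or permanent.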

(As a bonus, it would be interesting to allow the user to change "key Y" from the prompt to something different, but not really necessary for an initial design.)

k3d3 commented 2 years ago

The only thing I'm not sure about is how annoying a prompt like that would be if an app wants to register several shortcuts. As a bad example, a screenshot tool that registers one key for fullscreen, one key for current window, and one key for selecting a rectangle. If a user only wants those shortcuts registered temporarily, they'd have to answer three prompts every time they open the program.

One solution to this would be to allow an app to register multiple shortcuts with a single prompt, but that gets rid of the option to allow some shortcuts but not others (at least, not without making the prompt quite a bit more complicated). I'm not a big fan of this, but I'm not wholly against it either - an app registering multiple shortcuts might be enough of an edge case that it's not worth going this route.

Any thoughts?

rohmishra commented 2 years ago

@k3d3 that's the idea. All apps request access to "common-PTT-key" or something like that, and the first app that actually uses it gets to use it in a call while others can request a temporary alternative. So for example if you are on a Discord call and your PTT combo is ctrl+space, Discord PTT will be triggered by that, but if you open Teams for a call WHILE on a call in Discord, it will ask you for a temporary alternative.

Also, that is the idea behind temporary shortcuts. The user will always be in control. The app can just ask for the shortcut to be temporarily assigned, or the user can force it on the assignment screen.
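As a rough illustration of "first app to engage gets the common key", here is a minimal Python sketch. `CommonShortcut` is a made-up name, and how an app would actually signal "I am in a call now" is left open.

```python
from typing import Optional


class CommonShortcut:
    """One shared shortcut (e.g. a common PTT key) with a single active holder."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.holder: Optional[str] = None

    def engage(self, app_id: str) -> bool:
        """Return True if app_id now holds the key, False if another app does."""
        if self.holder is None or self.holder == app_id:
            self.holder = app_id
            return True
        # The caller would now have to ask the user for a temporary alternative.
        return False

    def release(self, app_id: str) -> None:
        # Only the current holder can free the key.
        if self.holder == app_id:
            self.holder = None
```

Once the first app releases the key (for example when its call ends), the next app to engage picks up the same binding without any prompt.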

Temporary shortcuts are intended for apps that you know you don't use often or as discussed above, for apps that are meant to be used once/rarely so you don't want to pollute your shortcuts with them. They are optional.

Allow once doesn't really make sense so it's better to just skip that to minimise complexity.

And yes, the user gets to decide the key shortcut as mentioned above. Apps can purely just SUGGEST shortcuts to simplify the flow, not enforce them.

k3d3 commented 2 years ago

And yes, the user gets to decide the key shortcut as mentioned above. Apps can purely just SUGGEST shortcuts to simplify the flow, not enforce them.

Agreed - so long as the UX isn't too complex, that works. User choice is better.

Allow once doesn't really make sense so it's better to just skip that to minimise complexity.

What about it doesn't make sense? I feel like it would keep the API much simpler for an app to just say "register a key, preferably key X" and that's it, then let the user decide if that shortcut should be allowed forever (permanently) or once (temporarily) - the naming could certainly change if the terminology is what's bothering you. I think it's better for the user to decide if an app is commonly or uncommonly used, rather than the app developer having to guess.

So for example if you are on a Discord call and your PTT combo is ctrl+space, Discord PTT will be triggered by that, but if you open Teams for a call WHILE on a call in Discord, it will ask you for a temporary alternative.

I have mixed opinions about this. While I do understand what you're getting at, this ends up being a bit of the same problem where now I have to remember multiple PTT keys. That said, I kinda like the idea of having a warning prompt when a keyboard shortcut is already in use.

What if, instead of only asking for a temporary alternative, it asked for a temporary alternative but also gave the option to use the same key?

swick commented 2 years ago

My two cents: the request for global shortcuts should be as descriptive as possible making different solutions for the backend possible.

One request should be a list of all shortcuts the application wants to set, each shortcut consists of:

internal identifier
human readable description
type of event (start/stop/cancel vs triggered vs dbus activation)
binding suggestions (keyboard, touch gestures, … whatever we will need eventually)

That allows a backend to implement a UI where it's possible to:

bind or not bind each shortcut for one application
choose what action to bind to the shortcut
prevent the user from binding a shortcut to an action which is already used by the shell
show the other apps which are using the same action

For persistence we can reuse the screen cast session storage method like @jadahl suggested. If an action is bound to multiple shortcuts they all get triggered. This should be sufficient for all the use cases listed in this thread and give the user enough control.
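A minimal Python sketch of the request shape and the backend-side check described above. All names (`ShortcutRequest`, `review_binding`, the event type strings) are placeholders rather than a proposed wire format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ShortcutRequest:
    identifier: str   # internal identifier
    description: str  # human readable description
    event_type: str   # e.g. "start_stop", "triggered" or "dbus_activation"
    suggestions: List[str] = field(default_factory=list)  # binding suggestions


def review_binding(binding: str, shell_bindings: Set[str],
                   app_bindings: Dict[str, List[str]]) -> Dict[str, object]:
    """Decide whether the user may pick `binding`, and list apps already using it."""
    if binding in shell_bindings:
        # Already used by the shell: the UI should refuse this binding.
        return {"allowed": False, "shared_with": []}
    shared = sorted(app for app, keys in app_bindings.items() if binding in keys)
    # Sharing between apps is allowed; every bound action would get triggered.
    return {"allowed": True, "shared_with": shared}
```

The backend UI could show `shared_with` next to the binding so the user knows which other apps will react to the same key.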

e: a note about event types: dbus activation is the only type which can be triggered while the application is not running (but also when running).

k3d3 commented 2 years ago

@swick completely agreed, I think. I'm just curious what you mean by "type of event" - is this something that would be shown to and selected by the user, or just handled by an app via the desktop-portal API?

I'm not sure if that's all that important to show to a user (so long as it shows which key, just not necessarily if it's start/stop or just triggered), though I may be misunderstanding what you mean there. Also if an app expects a start/stop PTT-style binding and a user chooses triggered, that might break things.

Also since this is xdg-desktop-portal, wouldn't everything behind the scenes be handled by dbus?

Other than that, yeah, I like that a lot!

swick commented 2 years ago

The type of event is an implementation detail that cannot be overridden by the user. It can influence what action can be bound to it. For example, a PTT cannot be bound to a touch gesture.

Also since this is xdg-desktop-portal, wouldn't everything behind the scenes be handled by dbus?

Yes. I think you're confused because of the dbus activation type? The dbus activation mechanism allows shortcuts to be activated even if the application is not running (https://specifications.freedesktop.org/desktop-entry-spec/1.1/ar01s07.html).

k3d3 commented 2 years ago

Okay, thanks for the clarification. I completely understand and fully agree now. :)

swick commented 2 years ago

One more thought: we might not even need a session storage mechanism if we send the events to a well-known dbus service just like with dbus activation. The portal would then basically only be responsible for configuring the shortcuts.

cassidyjames commented 2 years ago

I think I'll stay out of the PTT discussion as that's just not the problem I was looking to solve (and not one I've put much thought into), but I trust you all to come to a solution. :)

Would also be nice to know what actions exactly tend to be bound to global keys in elementary apps, to better understand what they are actually trying to solve.

Sure! The two types of shortcuts that were in use pre-Flatpak that have been requested when an app is not in the foreground are just simply launching the app, and launching a specific mode of the app. The latter could be handled e.g. by passing a CLI flag, but for user presentation reasons, I think it makes sense to lean on the existing FreeDesktop additional application actions as these are in use today across FreeDesktop apps and desktops, and give us built-in niceties like translatable human-readable names for features, icons, and the actual command to be executed. It would mean not duplicating these actions across multiple places and encourage their continued use for desktop interoperability.

Specific examples

These two apps exemplify the two categories of requests we're getting from developers:

Clips

[Image: Clips screenshot]

Clips is a rich graphical clipboard manager; it launches with a view of your recently copied items to help you recall and paste them. Since this is the whole function of the app, opening it from your applications launcher, dock, etc. always launches into this view, so they are just requesting a more streamlined way to configure a desktop-wide shortcut that would launch their app.

Currently, they use in-app messaging and a settings:// link on elementary OS to point the user in the right direction of configuring desktop keyboard shortcuts, but it requires more manual work from the user and is not cross-desktop friendly.

Planner

Planner is a to-do planner with a very full-featured UI, but it has a "quick add" feature intended to be launched from anywhere in your OS when an idea strikes you.

It is currently implemented by manually writing a custom command to the custom keyboard shortcut GSettings which does not work reliably across desktops and in a Flatpak (besides requiring a big Flatpak sandbox hole). It could be implemented as a .desktop action to be accessible from the app launcher in addition to this system-wide keyboard shortcut.

jadahl commented 2 years ago

Thanks a lot @cassidyjames, that's really useful information!

rohmishra commented 2 years ago

I think PTT/hold-for-action is something that might need further exploration. It is certainly a feature that we would want to have even if it is just for two use cases, one of which is rather niche.

For now we should focus on just automating the existing global shortcut methods that we have: allowing apps to request a shortcut and set the command for execution. PTT is something that can probably be added later.

For now, call apps can work around this limitation by allowing a command to both enable and disable (toggle) the mic. Not the most elegant solution, but it is workable.

Here is something we might want / what I have in mind:

registerShortcut(cb, suggestedKey) - returns token if key is set or 'null'. cb: command to execute (or dbus-send action).
getShortcutKey(token) - returns the assigned key combination to display in-app
unregisterShortcut(token) - returns boolean true or false to describe if action was successful.
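The three calls could behave roughly like this Python mock. It is only a sketch: `ShortcutPortal` and the `ask_user` callback (standing in for the system UI where the user picks the key) are invented for illustration, and the method names are just snake_case versions of the ones listed above.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple


class ShortcutPortal:
    def __init__(self, ask_user: Callable[[str, Optional[str]], Optional[str]]) -> None:
        # ask_user(command, suggested_key) returns the key the user chose, or None.
        self._ask_user = ask_user
        self._by_token: Dict[str, Tuple[str, str]] = {}  # token -> (command, key)

    def register_shortcut(self, command: str, suggested_key: Optional[str] = None) -> Optional[str]:
        """Return a token if the user set a key, or None if they declined."""
        key = self._ask_user(command, suggested_key)
        if key is None:
            return None
        token = str(uuid.uuid4())
        self._by_token[token] = (command, key)
        return token

    def get_shortcut_key(self, token: str) -> Optional[str]:
        """Return the assigned key combination, e.g. to display in-app."""
        entry = self._by_token.get(token)
        return entry[1] if entry else None

    def unregister_shortcut(self, token: str) -> bool:
        """Return True if the shortcut existed and was removed."""
        return self._by_token.pop(token, None) is not None
```

Note that the suggested key is only passed along to the user prompt; the key that gets stored is whatever the user actually picked.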

We also might want to check and maybe limit what command or dbus the app is calling and limit it to itself or warn users if it is trying to set the shortcut to something else.

slagiewka commented 2 years ago

I see that this discussion is revolving around keyboard shortcuts. I've come across it while trying to see what can be used for projects that want to send key presses. An example is https://github.com/keepassxreboot/keepassxc/issues/2281. Would this be connected, or would such functionality require another API?

swick commented 2 years ago

The issue you linked to should be solved with something similar to Android's autofill feature. For emulating input, libei is what you want to look at.

aleixpol commented 2 years ago

We at Plasma have also been looking into this problem, and we mostly reached very similar conclusions to the conversation here before we found this thread.

We need:

desktop-entry-spec keyboard-activatable actions
Push-To-Talk

As you've all mentioned repeatedly, Push-To-Talk can't easily be addressed within the former point, as we'd need to handle the release.

Something we discussed too was the possibility of tackling Push-To-Talk using a dbus service like we do for MPRIS where applications get to implement an interface stating their information and expectations and another process (the compositor in this case, presumably) would handle the logic.

If you think that addressing either is something you are interested in, we can provide a proposal to discuss.

jadahl commented 2 years ago

Leaving Push-To-Talk out of the keyboard shortcuts is perhaps the best idea; it'll make the keyboard-release issue less urgent, if needed at all, and with an interface that is more aware of Push-To-Talk, the system providing it can be more clever and e.g. force-mute the microphone when the button/key isn't pushed.

If separate, it should probably still be a portal, and not its own separate interface, so that it can more easily integrate with the permission store, have a libportal API etc.

rohmishra commented 2 years ago

I second what @jadahl said for now.

Limiting keyboard shortcuts to just invoking an action would solve most problems outside of PTT.

PTT also seems to be the only use case that may need a dedicated implementation, and it would probably fit better with a call management portal/API/D-Bus interface similar to what we have for MPRIS, which would also provide a much more rounded and "integrated" experience for users. The API would also be useful on phone/tablet environments such as Phosh. CallKit on Apple platforms and android.telecom on Android are similar to what I'm referring to, but that's a discussion not meant for this thread.

This leaves non call related global press and hold actions such as game-launcher or stats overlays.

thecookie94 commented 2 years ago

One (albeit rather rare) use case that I would want to add: we shouldn't forget about legacy application support and people who have extra programmable keyboards/shortcut pads that support Unicode input (hence many more keys that one can use for global shortcuts without interfering with normal input). My thought on legacy support is that there may be a way to forward keycodes to out-of-focus applications (with user configuration via a compositor-level shortcut config). Something along the lines of: "When I press X, send keycode Y to application/window Z". Global arbitrary shortcut support is the only reason why I'm still on Xorg.

rohmishra commented 2 years ago

I'm afraid I'm not familiar with this use case @thecookie94

Just so that we understand the use case you are proposing is:

User presses a button on keyboard (likely a macro key)
The system receives a trigger and then issues a keystroke (or combination of multiple)
The intended application receives these combination strokes

I would assume this should already be possible by leveraging libei (something like autohotkey)

Bind a keystroke to the macro key app
On receiving this action from macro, the macro key app issues programmed keystroke using libei
The intended app receives the desired keystroke

Moreover, libei is already compatible with flatpaks if I'm not wrong.

This API though would certainly help with automating the first part and helping users set up binding within the app.

Please let me know if I'm misunderstanding the issue at hand here though.

thecookie94 commented 2 years ago

Ah yup, leveraging libei sounds perfect for that usecase&that's exactly how I meant it to be understood. It really seems like the main issue is that people simply don't know about it/don't know how to set it up (hence integration in the compositor level shortcuts application would be beneficial). Binding the desired keycode to the key (on a hardware level) is already done in a keyboard configuration program like vial (for QMK/vial based keyboards), and the keycodes the application listens for are set up in the application itself. The only thing that's rather intransparent/user unfriendly is setting up the intermediate step (setting up the keyboard input to desired application&emulated keycode forwarding/translation using libei); hence my proposal to integrate that functionality into the global shortcut portal. Please let me know if you have any more thoughts on that.

thecookie94 commented 2 years ago

The compositor specific settings should be in something like the shortcuts panel in the plasma settings, benefiting legacy (thus portal unaware) application&the user by being able to see what shortcuts are set up&forwarded to which application. But that's sth that I would see as beneficial for the global shortcut portal as a whole, not just this specific legacy (using libei) application.

rohmishra commented 2 years ago

@thecookie94 since this would require integration by all compositors, I fear it may not be feasible to integrate into the shell. Smaller and lighter WMs and GNOME (gdm) may find it difficult and unreasonably complex for the niche that it serves, given that it is possible to do this inside a single app that can be used in any environment.

There are two possible routes an application can go in this case:

Since this API would require shortcuts to be configurable by the user, if you wanted to replicate a global shortcut you can skip the intermediate step and directly bind the key to the action.
For shortcuts inside of an app, an app like AHK for Linux would probably be desirable in the long term for both automation and macro key bindings.

thecookie94 commented 2 years ago

@rohmishra Integration into the shortcut panel is more of a "would be nice" thing, rather than a must have; a more general shortcut configuration&compositor agnostic "shortcut manager" UI would definitely be beneficial in that case. As for 1: I'm afraid that might not be possible with applications that just react to keyboard input&aren't portal-aware, hence requiring the "shortcut manager" in the first place (if I understood that correctly, that is). As for 2: interactions that can't be bound in the application-specific hotkey manager/are prebound are out of the scope of my proposal, but it would be something that would enter the realm of possibility with the portal (by giving "linux ahk" permission to interact with the specific application/window)

thecookie94 commented 2 years ago

Or is the first step you meant the "bind keycode in hardware" one? Cause yeah, that step is optional (but that's a step that people who own such a programmable input device will take)&has nothing to do with libei/legacy application global hotkeys in and of themselves.

Edit: I guess the whole point behind my proposal is: How do we not break legacy applications that expect to be able to behave like a keylogger without giving em permission to be a keylogger (duh), but rather by attaching a virtual input source to the specific application/window/client, which we then can control using the shortcut portal. That pretty much sums it up.

ssokolow commented 2 years ago

A few concerns for this sort of thing:

Any solution that meets my needs must have as little latency as feasibly possible. (I say this as someone who tends to load down his systems such that fork/exec-based keybinding gets noticeably sluggish)
On X11, I use XGrabKey because I refuse to implement a separate backend for every desktop.
Given that I tend to use things that register half a dozen or more global hotkeys (eg. I'm still on the GTK version of Audacious Media Player because, last I checked, global hotkeys were listed as not-yet-reimplemented for the Qt UI and I use Win+... for all interaction), the current state of global keybinding in Wayland has been one of the things keeping me on X11 since I don't consider it acceptable to have to manually bind everything.
It's always felt like a massive wart in the ecosystem, even in the heart of the X11 era, that you have to write a KDE-specific backend and use KDE to get a nice cross-sectional "all global hotkeys on the system" control panel capable of identifying potential collisions and the like.

For example, here's how just my media playback is configured under X11:

Using Audacious's internal keybinding support:

Win+← - Previous Song
Win+→ - Next Song
Win+J - Show "Jump to Song" dialog
Win+KP_Add - Volume Up
Win+KP_Subtract - Volume Down
Win+Pause - Play/Pause
Win+Print Screen - Show/Hide Audacious Window

Keys using xbindkeys and D-Bus which I'm planning to write a resident XGrabKey-to-DBus mapper to reduce latency on:

Win+↑ - Jump to first song and start playback if stopped/paused
Win+↓ - Jump to last song and start playback if stopped/paused
Win+o - Set "stop after current song" and start playback if stopped/paused (Once)
Win+s - Skip 55 seconds forward to skip podcast sponsor notice

...plus equivalent esoteric keysyms exposed by my ATi Remote Wonder II which roughly double the number of keybindings from this one program, half of which were defined by Audacious's global hotkey plugin by default because they're things like XF86AudioPlay. (The kernel driver for the ATi Remote Wonder II exposes it as a keyboard and mouse rather than going through LIRC.)

I only ever use the Audacious window for playlist management and rely on the OSD+Keybindings combo so much that my typical approach for "What was this song again?" is to double-tap pause to bring up the on-state-change OSD.

Latency is unacceptable for these hotkeys because:

Every extra millisecond that Win+S takes is another millisecond I have to be irritated by a mid-roll podcast ad.
When I press Win+← or Win+→, I'm likely going to be pressing it multiple times until I find a song I feel like, and I already have to deal with Audacious not having a plugin to skip leading silence in an audio file.
It just makes my whole system feel slower if a hotkey that's supposed to invoke a dialog doesn't result in it appearing immediately, and that's a psychological drain.

Other global hotkeys for background applications that I'm not willing to give up or add unnecessary latency to include:

Win+P - KeePassXC auto-type (Fill Password)
Win+Z - Toggle Zeal visibility and focus search field on show
Some other functionality that depends on other not-yet-in-Wayland things, like watching changes to the active window title to log what I'm doing for time-management purposes. (Given that I have problems with executive dysfunction, does this count as an assistive technology?)

ssokolow commented 2 years ago

Oh, another concern. If D-Bus Activation or a similar technology is chosen, please rely as little as possible on an application being specifically installed and integrated into the desktop.

I don't want a repeat of the Unity DBus Launcher API, where an AppImage or self-contained Rust application or anything else that can be uninstalled simply by deleting a file/folder can't integrate with KDE Plasma's ability to display a progress indicator in a taskbar entry, because the API associates the taskbar button with the application window via a .desktop file looked up in the system search path.

(In fact, I suspect that's why I see so few applications using the features offered by that particular API. It's a design that is disproportionately inconvenient for people who run their applications out of a git clone without installing them into the larger system... like the developers who write said applications.)

Activation is fine as an option, but shouldn't constrain what's possible for applications that don't need it.

aleixpol commented 2 years ago

I've been thinking about the topic over the last few days and I would like to suggest offering an API such as this:

<node name="/" xmlns:doc="http://www.freedesktop.org/dbus/1.0/doc.dtd">
  <interface name="org.freedesktop.portal.GlobalShortcuts">
    <method name="registered">
      <arg type="as" name="results" direction="out"/>
    </method>

    <method name="trigger_description">
      <arg type="s" name="action_name" direction="in"/>
      <arg type="s" name="trigger_description" direction="out"/>
    </method>

    <method name="register">
      <arg type="s" name="action_name" direction="in"/>
    </method>

    <method name="unregister">
      <arg type="s" name="action_name" direction="in"/>
    </method>

    <signal name="registered">
      <arg type="s" name="action_name" direction="out"/>
      <arg type="s" name="trigger_description" direction="out"/>
    </signal>

    <signal name="activated">
      <arg type="s" name="action_name" direction="out"/>
    </signal>

    <signal name="deactivated">
      <arg type="s" name="action_name" direction="out"/>
    </signal>
  </interface>
</node>

This would allow us to:

Have a standard API for global shortcuts
Have an API that contemplates press and release, so PTT can be implemented just as is. It's still possible to do something ad-hoc for it.
Offer an API that doesn't necessarily need to prompt outside of the UX of the app it's integrating with, but still can.
Allow the XDP implementation to decide what to do with shortcut clashes, decide if a shortcut is acceptable or not.
Use this for other input devices besides keyboard
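
As a minimal sketch of the press-and-release point above, push-to-talk can be modelled as a small state holder driven by the proposed activated and deactivated signals. The class and handler names here are hypothetical, and the handlers are called directly rather than wired to a real D-Bus connection:

```python
class PushToTalk:
    """Tracks whether the microphone should be open, driven by press/release events.

    Hypothetical client-side model: a real client would receive these calls
    from the portal's activated/deactivated signals.
    """

    def __init__(self, action_name: str) -> None:
        self.action_name = action_name
        self.transmitting = False

    def on_activated(self, action_name: str) -> None:
        # Shortcut pressed: start transmitting if the signal is for our action.
        if action_name == self.action_name:
            self.transmitting = True

    def on_deactivated(self, action_name: str) -> None:
        # Shortcut released: stop transmitting.
        if action_name == self.action_name:
            self.transmitting = False


ptt = PushToTalk("push-to-talk")
ptt.on_activated("push-to-talk")
pressed_state = ptt.transmitting
ptt.on_deactivated("push-to-talk")
released_state = ptt.transmitting
```

Signals for other action names are ignored, so one connection could feed several such handlers.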

Beyond this implementation, we could consider standardising Application actions. This is something we have been doing in Plasma (under X-KDE-Shortcuts) already and it could make sense to generalise. I don't think we have a standard language for keyboard shortcuts so I guess that would be necessary first, especially if we want to go beyond keyboard (e.g. like remote controllers/evdev).

jadahl commented 2 years ago

Could you explain how you envision these API methods to work? The (de)activated ones are self-explanatory, but what do the registered and trigger_description methods do?

Is the intention that the application would call one method for "describing" an action, then register it, then call "registered" on it? Not sure I understand the flow here.

My thought is that, instead of multiple methods that together construct the application's final intention, we could have a single Register method that takes an a(a{sv}) and returns an org.freedesktop.portal.Request. The input data would be an array of requests, where each request is a vardict that contains all relevant metadata, such as what kind of binding this is (signal or action), the application action name, a human-readable description, and other hints. This request method would call org.freedesktop.impl.portal.GlobalShortcuts with the request handle (see org.freedesktop.portal.Request and org.freedesktop.impl.portal.Request) and would thus carry all the information needed by the portal backend, allowing it to decide whether to show any UI or not.

When a shortcut was granted, a response signal on the request would be emitted, and following that, the activated/deactivated signals would be emitted, assuming the shortcut was not about .desktop actions.
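
To make the batched request shape concrete, here is a rough sketch that treats each request as a plain dictionary and checks a batch before it would be sent. The key names ("kind", "action", "description") are assumptions for illustration only; the thread does not fix a vocabulary for the a{sv} entries:

```python
from typing import Any

# Hypothetical keys and values; nothing here is part of an agreed API.
REQUIRED_KEYS = {"kind", "action", "description"}
ALLOWED_KINDS = {"signal", "action"}


def validate_requests(requests: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in a batch of shortcut requests."""
    problems = []
    seen = set()
    for index, request in enumerate(requests):
        missing = REQUIRED_KEYS - request.keys()
        if missing:
            problems.append(f"request {index}: missing {sorted(missing)}")
            continue
        if request["kind"] not in ALLOWED_KINDS:
            problems.append(f"request {index}: unknown kind {request['kind']!r}")
        if request["action"] in seen:
            problems.append(f"request {index}: duplicate action {request['action']!r}")
        seen.add(request["action"])
    return problems


batch = [
    {"kind": "signal", "action": "push-to-talk", "description": "Hold to talk"},
    {"kind": "action", "action": "show-window", "description": "Show the main window"},
]
batch_problems = validate_requests(batch)
```

Because the whole batch arrives in one call, a backend would see every shortcut together before deciding whether to show any UI.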

aleixpol commented 2 years ago

I guess I should have been more verbose on my explanation.

The description would be the shortcut itself, in whatever form the implementation chooses. It could be Ctrl+A, Control+A or up+up+down+down+left+right+left+right. It's what the backend implementation recorded as the trigger. This way, applications can render it for users to see what is happening without having to track it themselves.

This is what the client UI looks like for Discord right now. In KDE we have a similar component as well, from KGlobalAccel.

Screenshot_20220210_133640

It could make sense to add some text to explain what the shortcut is for in the DE UIs as well.

jadahl commented 2 years ago

If the description is the actual bound trigger, should it be an arbitrary descriptive string, or something formalized? If the application should be able to provide hints for what keyboard combination it would prefer, I assume we need to formalize it, at least for the hint, if not the description.

GeorgesStavracas commented 2 years ago

Do you think it makes sense to use the "session" pattern that other portals use here? Each application would open a global hotkey session and register shortcuts, and the session would be closed when the app is closed (or whenever the app wants to).

In this case, the D-Bus interface would need a CreateSession() method, which would expose a session object on D-Bus. This session object then is what would contain most of what @aleixpol mentioned above.
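
The session idea can be illustrated with an in-memory stand-in. This is only a sketch: FakePortal, its method names, and the session path layout are all hypothetical, and no real D-Bus service is involved:

```python
import itertools


class FakePortal:
    """In-memory stand-in for a session-based shortcuts portal (not a real service)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._sessions: dict[str, set[str]] = {}

    def create_session(self) -> str:
        # The object path layout is illustrative only.
        handle = f"/example/portal/session/{next(self._ids)}"
        self._sessions[handle] = set()
        return handle

    def register(self, session_handle: str, shortcut_id: str) -> None:
        self._sessions[session_handle].add(shortcut_id)

    def shortcuts(self, session_handle: str) -> list[str]:
        return sorted(self._sessions[session_handle])

    def close_session(self, session_handle: str) -> None:
        # Closing a session drops every shortcut registered through it.
        del self._sessions[session_handle]


portal = FakePortal()
toolkit_session = portal.create_session()
plugin_session = portal.create_session()
portal.register(toolkit_session, "play-pause")
portal.register(plugin_session, "skip-sponsor")
portal.close_session(plugin_session)
remaining = portal.shortcuts(toolkit_session)
```

Closing one session leaves the other untouched, which is the isolation property the pattern is meant to provide.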

jadahl commented 2 years ago

I had this in a draft comment just as I saw Georges' comment :P Another thing to consider is whether to use org.freedesktop.portal.Session for signal emission. See for example how org.freedesktop.portal.Location uses it.

aleixpol commented 2 years ago

If I understood your idea correctly, it would look like this:

<node name="/" xmlns:doc="http://www.freedesktop.org/dbus/1.0/doc.dtd">
  <!--
      org.freedesktop.impl.portal.GlobalShortcuts:
      @short_description: GlobalShortcut portal backend interface

      This portal lets applications register global shortcuts so they can
      act regardless of the system state upon an input event.
  -->
  <interface name="org.freedesktop.impl.portal.GlobalShortcuts">
    <method name="CreateSession">
      <arg type="a{sv}" name="options" direction="in"/>
      <arg type="o" name="session_handle" direction="out"/>
    </method>

    <method name="Shortcuts">
      <arg type="o" name="session_handle" direction="in"/>
      <arg type="as" name="results" direction="out"/>
    </method>

    <method name="Register">
      <arg type="o" name="session_handle" direction="in"/>
      <arg type="s" name="shortcut_handle" direction="in"/>
      <arg type="s" name="name" direction="in"/>
    </method>

    <method name="Unregister">
      <arg type="o" name="session_handle" direction="in"/>
      <arg type="s" name="shortcut_handle" direction="in"/>
    </method>

    <method name="Description">
      <arg type="o" name="session_handle" direction="in"/>
      <arg type="s" name="shortcut_handle" direction="in"/>
      <arg type="s" name="name" direction="in"/>
      <arg type="b" name="found" direction="out"/>
      <arg type="s" name="description" direction="out"/>
    </method>

    <signal name="Registered">
      <arg type="o" name="session_handle" direction="out"/>
      <arg type="s" name="shortcut_handle" direction="out"/>
      <arg type="s" name="name" direction="out"/>
      <arg type="s" name="description" direction="out"/>
      <arg type="b" name="successful" direction="out"/>
    </signal>

    <signal name="Activated">
      <arg type="o" name="session_handle" direction="out"/>
      <arg type="s" name="shortcut_handle" direction="out"/>
      <arg type="s" name="name" direction="out"/>
    </signal>

    <signal name="Deactivated">
      <arg type="o" name="session_handle" direction="out"/>
      <arg type="s" name="shortcut_handle" direction="out"/>
      <arg type="s" name="name" direction="out"/>
    </signal>

    <property name="version" type="u" access="read"/>
  </interface>
</node>

The big advantage here would be that we can have different frameworks within an app create a session and keep them somewhat separate. Not sure if it's very useful but if it's a pattern already used elsewhere we may as well use it here.

Closing the session can happen through org.freedesktop.portal.Session indeed.

I'll put together a PoC in this direction and eventually a PR.

swick commented 2 years ago

This still registers shortcuts one by one so the backend has no complete view it can actually show to users. The metadata is a single string describing a keyboard shortcut, which means no human-readable description, no other input methods, no d-bus activation, ….

aleixpol commented 2 years ago

This still registers shortcuts one by one so the backend has no complete view it can actually show to users.

The method Shortcuts is meant to do that.

The metadata is a single string describing a keyboard shortcut, which means no human-readable description,

I was thinking the description string should be provided as user-readable. There's no use for the app to understand what the trigger is.

no other input methods

Why not? The description/trigger can be:

"Ctrl+D"
"Pedal 1"
"Head bangs into the table"
"Banana enters the fridge"

It's up to the portal implementation to define what events it supports and how to deliver them.

no d-bus activation, ….

Yes. Correct me if I'm wrong, but I think this goes beyond the scope of what we are doing here. We are dealing with telling an app that is already running about an event. DBus activation is interesting because it delivers events to other apps.

As Cassidy suggested above, this could be done using Desktop File Actions. In fact, in KDE we are already doing that (under X-KDE-Shortcuts) and we'd be completely open to having it standardised, but the work necessary for that other feature does not really overlap with the work here. Unless I am missing something important.

swick commented 2 years ago

So apparently I completely didn't get the flow. The thing that actually registers the shortcuts is called Shortcuts and the thing to create a shortcut is called Register. Am I the only one who finds that very backwards?

The description I meant expands a bit on the name (e.g. name: "Record", description: "Starts the screen recording"). What you mean by description is about telling the app that the shortcut was bound to a specific trigger. Maybe just call it that: bound_trigger. How does it look if it is bound to multiple triggers?

The single string as a hint for the trigger seems limiting. We have to make sure that "Head bangs into the table" is not parsed as a key combination.

It's also not expandable if we want to support "start/stop/cancel" in addition to what I guess is just "triggered" now.

What if an application updated the shortcuts it supports? With this design they will always have to remember which shortcuts were available at some point in time to Unregister them in the new version. If we instead say that all shortcuts which are not mentioned when you call Shortcuts get removed that would solve the problem. The Unregister should get removed then.
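
The declarative alternative described here, where anything the app no longer mentions is dropped, amounts to a set reconciliation. A minimal sketch, assuming the backend keeps a mapping from shortcut id to bound trigger (the function and variable names are hypothetical):

```python
def reconcile(
    stored: dict[str, str], declared: set[str]
) -> tuple[dict[str, str], set[str], set[str]]:
    """Compare stored bindings against the ids the app declares now.

    stored maps shortcut id -> bound trigger; declared is the set of ids
    the app lists in its current version.
    """
    # Bindings for ids that are still declared survive unchanged.
    kept = {sid: trigger for sid, trigger in stored.items() if sid in declared}
    # Ids the app no longer mentions are dropped without an explicit Unregister.
    removed = set(stored) - declared
    # Newly declared ids have no trigger yet and still need binding.
    unbound = declared - set(stored)
    return kept, removed, unbound


stored_bindings = {"play": "Win+Pause", "old-action": "Win+O"}
kept, removed, unbound = reconcile(stored_bindings, {"play", "record"})
```

With this approach the application never has to remember which shortcuts an older version registered.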

(Ignoring the d-bus activation thing for now)

aleixpol commented 2 years ago

So apparently I completely didn't get the flow. The thing that actually registers the shortcuts is called Shortcuts and the thing to create a shortcut is called Register. Am I the only one who finds that very backwards?

Shortcuts lists what shortcuts are available. Register adds a shortcut. I've produced a PR with the proposal which has the xml files populated with documentation. Hopefully it appears saner.

The description I meant expands a bit on the name (e.g. name: "Record", description: "Starts the screen recording").

Do we plan on having system UI to list the shortcuts and allow users to configure them? If so, yes, we should add a user-facing description of what each one does. The shortcut_name as it is specified now is an id and should remain stable. We don't want to de-configure a system when it changes language.

What you mean by description is about telling the app that the shortcut was bound to a specific trigger. Maybe just call it that: bound_trigger. How does it look if it is bound to multiple triggers?

As suggested here, a global shortcut has a single trigger that has been entered by the user, so just one. We could provide further API to extend the Global Shortcut if we want it to do more things. I'm open to it.

The single string as a hint for the trigger seems limiting. We have to make sure that "Head bangs into the table" is not parsed as a key combination.

Ideally nobody should be parsing. The system knows what it does and what it offers. The string is there for the app to present to the user, as shown above.

It's also not expandable if we want to support "start/stop/cancel" in addition to what I guess is just "triggered" now.

Agreed. We could add a boolean enabled state for registered shortcuts. :heavy_check_mark:

What if an application updated the shortcuts it supports? With this design they will always have to remember which shortcuts were available at some point in time to Unregister them in the new version. If we instead say that all shortcuts which are not mentioned when you call Shortcuts get removed that would solve the problem. The Unregister should get removed then.

I think I'm starting to see where the miscommunication went. As it is presented right now in the PR, the app needs to request each Global Shortcut one by one, and it's during the registering that the user will be entering which shortcut it is in the system's terms, be it through UI presented by the portal interface or because the system listens to what the user does and then registers the action accordingly.

As it is, an application would need to unregister older now unused shortcuts, which I agree is unlikely to happen and error-prone.

It could make sense to have the app tell XDP all the shortcuts it's going to be using at CreateSession then only operate under this set of actions, discarding those that were not mentioned when the session was created. :heavy_check_mark:

ssokolow commented 2 years ago

the shortcut_name as it is specified now is an id and should remain stable

That was actually one thing that confused me. Perhaps it should be renamed shortcut_id to emphasize that it's intended as a stable identifying token that should be treated as opaque outside of debugging tools and error messages?

"Name" tends to imply something intended to be human visible... which in turn implies localization support in a responsible codebase.

I think I'm starting to see where the miscommunication went. As it is presented right now in the PR, the app needs to request each Global Shortcut one by one, and it's during the registering that the user will be entering which shortcut it is in the system's terms, be it through UI presented by the portal interface or because the system listens to what the user does and then registers the action accordingly.

As it is, an application would need to unregister older now unused shortcuts, which I agree is unlikely to happen and error-prone.

It could make sense to have the app tell XDP all the shortcuts it's going to be using at CreateSession then only operate under this set of actions, discarding those that were not mentioned when the session was created.

What seems most ideal to me is to allow an application to register a big batch of shortcuts, with suggested default values, so the desktop can display a single "Does this look right? Do you want to override any of the suggestions?" dialog, but also allow the application to submit further requests later. For example, if you enable a plugin that has its own set of hotkeys. (Otherwise, you risk forcing "Please restart to apply changes" on users.)
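
A rough sketch of that flow, assuming the desktop tracks which triggers are already taken: each batch of suggestions yields a proposal in which conflicting entries are left for the user to choose. The propose function and the trigger strings are hypothetical, and the same function serves both the initial batch and a later plugin batch:

```python
from typing import Optional


def propose(taken: dict[str, str], suggestions: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each shortcut id to its suggested trigger, or None if it collides.

    taken maps trigger -> owning shortcut id for bindings that already exist.
    suggestions maps shortcut id -> the trigger the app would like by default.
    """
    proposal: dict[str, Optional[str]] = {}
    claimed = set(taken)
    for shortcut_id, trigger in suggestions.items():
        if trigger in claimed:
            proposal[shortcut_id] = None  # collision: the user has to pick
        else:
            proposal[shortcut_id] = trigger
            claimed.add(trigger)
    return proposal


taken = {"Win+P": "keepassxc.autotype"}
first = propose(taken, {"next-song": "Win+Right", "fill-password": "Win+P"})
# Commit the accepted suggestions before handling any later request.
taken.update({trigger: sid for sid, trigger in first.items() if trigger})
later = propose(taken, {"plugin.skip-ad": "Win+Right", "plugin.once": "Win+O"})
```

The later call stands in for a plugin being enabled at runtime, so no restart is implied by the model.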

flatpak / xdg-desktop-portal