YaLTeR / niri

A scrollable-tiling Wayland compositor.

Gesture bindings #372

Open YaLTeR opened 6 months ago

YaLTeR commented 6 months ago

Add some way to customize gesture bindings. I don't have a concrete design yet, but:

  1. Users should be able to customize how many fingers to use for the swipe gestures (supersede https://github.com/YaLTeR/niri/pull/315, cc @MagneFire)
  2. I should be able to add new gestures without breaking users' configs
  3. Users should be able to bind gestures only where it makes sense, e.g. swipe actions to swipe gestures and not to pinch gestures

I'm thinking something like a binds section, but for gestures. To satisfy 1. and 2., maybe encode the number of fingers explicitly into the "key"?

gestures {
    touchpad-swipe-3-horizontal horizontal-view-movement
    touchpad-swipe-3-vertical workspace-switch
    Mod+Mouse3-horizontal horizontal-view-movement
    Mod+touch-swipe-3-horizontal horizontal-view-movement
}

This way, I can add new defaults when this section is missing from the config, and when it is present, the user will just need to add new gestures manually.

I don't entirely like this though; it looks kinda awkward.

Also, I can see a problem in the future where there may be a 2D gesture, and so you will need to be able to bind either touchpad-swipe-N to a 2D gesture, or separate touchpad-swipe-N-horizontal/vertical to 1D gestures. But also maybe that's not a problem and can just be verified during parsing.
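
For example (illustrative syntax, not implemented), parsing could reject a config that binds both the 2D form and the per-axis forms of the same gesture:

gestures {
    // Either bind the whole gesture to a single 2D action...
    touchpad-swipe-3 some-2d-action
    // ...or bind its axes separately to 1D actions. Having both the
    // line above and the two below would be a parse error.
    touchpad-swipe-3-horizontal horizontal-view-movement
    touchpad-swipe-3-vertical workspace-switch
}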

Also, this "gestures" section that I have in mind seems to be mainly about continuous gestures (swipe and pinch) and not about discrete gestures like double-resize-click (those seem a better fit for the regular binds section).

Also, should it be allowed to bind "vertical" to "horizontal" gestures and vice versa? Maybe not.

YaLTeR commented 4 months ago

How about this:

gestures {
    touchpad-swipe-3 horizontal="view-movement" vertical="workspace-switch"
    Mod+Mouse3 horizontal="view-movement" vertical="workspace-switch"
    Mod+Mouse2 "resize-window"
}

You can only set either the argument or the properties, not both. All three accept only the action variants that make sense in that position (i.e. no vertical-only actions in the horizontal property).
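
A minimal sketch of how that mutual exclusion could be modeled at parse time; the type and variant names are illustrative, not niri's actual config model:

// Illustrative types only; not niri's actual config model.
enum Action2D {
    ResizeWindow,
    InteractiveMove,
}

enum Action1D {
    ViewMovement,
    WorkspaceSwitch,
}

// A gesture bind is *either* a whole 2D action given as the argument,
// *or* per-axis 1D actions given as properties -- never both. A node
// carrying both would be rejected during parsing.
enum GestureBind {
    // e.g. `Mod+Mouse2 "resize-window"`
    Whole(Action2D),
    // e.g. `touchpad-swipe-3 horizontal="view-movement" vertical="workspace-switch"`
    Split {
        horizontal: Option<Action1D>,
        vertical: Option<Action1D>,
    },
}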

I'm still not sure about encoding the finger count into the "key" name.

in the future where there may be a 2D gesture

Examples: currently existing interactive window resize; future interactive window move. For the resize it even makes sense to be able to bind just the horizontal part (since we're a scrolling WM).

markomarkovic commented 4 months ago

I was looking for a way to set Mod+Mouse1 for horizontal view movement; this would solve that issue nicely.

The encoding of the finger count into the key name makes sense to me.

IvanTurgenev commented 2 months ago

Yeah, even mice nowadays have gesture buttons. (image)

nakibrayan3 commented 2 months ago

How about something like this:

gestures {
  touchpad-swipe-3-horizontal-left focus-column-left
  touchpad-swipe-3-horizontal-right focus-column-right
}

So that the touchpad gestures are identical to the keyboard shortcuts for focusing columns. This fixes https://github.com/YaLTeR/niri/discussions/466

YaLTeR commented 2 months ago

I envision the gesture bindings section only for continuous binds, and the regular binds section for discrete binds.

valpackett commented 1 month ago

Do you have any thoughts on forwarding continuous gestures to layer-shell-based desktop components? (Think being able to pinch in the app menu from anywhere, like on macOS.) I previously prototyped this for Wayfire in wf-globalgestures, which exposes a custom protocol, but I think I'd be fine with a config-based solution to avoid having Yet Another Protocol. Something like the example below, which would result in that gesture always being forwarded to a client with a matching 'namespace' on the current output:

gestures {
    touchpad-pinch-4 forward-to-client namespace=r#"^owo\.uwu\.ControlCenter$"#
}

YaLTeR commented 1 month ago

Hm, interesting idea. Need to look into it when implementing.

sodiboo commented 1 week ago

Obviously, you should be able to bind a swipe with however many fingers, or bind a different action when swiping with a modifier.

But I'd like to be able to swipe with just one finger to scroll the workspace view, and I obviously don't want to intercept a one-finger swipe from clients. So it should only trigger if the swipe would not hit a client. In particular, this means that the event would not be sent to any surface, because the surfaces it did hit (a list which may have a length of zero) all have an input region that doesn't include the location of the event.

If that sounds completely foreign to a reader unfamiliar with Wayland, consider that swaybg is "just" a layer-shell surface at the lowest layer. Swiping outside the layout means swiping on swaybg, but it has an empty input region, so it never receives input events, and that input event "falls through" the stack of surfaces until it hits the bottom layer. This bottom layer is owned by niri and can always receive all inputs. Let's call it the "dead region": any region that isn't included in any surface's input region.
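
A minimal sketch of that classification, with hypothetical types (this is not niri's or Smithay's actual API):

// Hypothetical types for illustration only.
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

trait Surface {
    /// Whether this surface's input region contains the given point
    /// (already translated into surface-local coordinates).
    fn input_region_contains(&self, point: Point) -> bool;
}

/// A point lies in the "dead region" when none of the surfaces stacked
/// under it include it in their input regions. The slice may be empty,
/// in which case the point is trivially dead.
fn in_dead_region(point: Point, surfaces_under_point: &[&dyn Surface]) -> bool {
    surfaces_under_point
        .iter()
        .all(|s| !s.input_region_contains(point))
}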

I also want to be able to swipe from the left/right screen edges and have that scroll the workspace rather than being sent to the client in my struts. This specific bind can be constructed by never sending touch inputs to toplevels in the struts (which should probably not be the default behaviour, but one I want to be able to configure regardless, because simply picking up my tablet sends touch inputs to strut clients; it doesn't have bezels thick enough to avoid accidental edge touches). But I think in general it is useful to consider "swipe from [left/top/right/bottom] edge" a separate gesture, so I can for instance bind swipe-from-top-edge { workspace-switch-up; }, which is distinct from just vertical-swipe-on-dead-region { workspace-switch; }: the first should not trigger when I swipe in the gap between windows, but the second should.


Because of the above considerations, I'd say that the gestures that exist for a touchscreen are generally a different shape from the ones that exist for a touchpad, and both are just very different from mouse gestures. I think it makes a lot of sense to separate them into sub-sections, which is more visually distinct than longer, harder-to-parse names that repeat which input method each gesture belongs to.

Mouse gestures should, for instance, be able to trigger discrete binds (Mod+Shift+LMB { close-window; } and simultaneously Mod+Left { interactive-move; }). You might argue that this belongs in the binds section, but I counter-argue that the binds section should instead be called gestures { keyboard {. Just within the gestures section, that totally makes sense: it's what happens when you do certain inputs, distinguished by input device form factor (i.e. what kinds of inputs are even available). This would also allow "gesture binding modes" to be merged with "key binding modes".

But perhaps we shouldn't distinguish these. It also makes no sense to call the overall section gestures { if it contains a keyboard { subsection; perhaps it should then be called actions, or simply binds. If the new shape of the unified compositor binds interface is decided before binding modes are fully implemented, we could call this binds without any complicated migration code: make binds "name-of-mode" { the name of the section with binding modes, and binds { an alias for binds "default" { keyboard {. We would not need to finalize what continuous gesture bindings should look like in order to agree on using this schema for binds in the future, just that they should be separated by device. A sketch of that schema follows below.
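
Purely illustrative (none of this is implemented), that schema could look like:

binds "default" {
    keyboard {
        Mod+Left { focus-column-left; }
    }
    mouse {
        Mod+Shift+LMB { close-window; }
    }
    // A plain `binds { ... }` section would be an alias for
    // `binds "default" { keyboard { ... } }`.
}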


There should also be a way to have different gesture binds (particularly touch binds) when a window is fullscreen, because then I can't scroll in my struts; there are no struts. I could, for instance, have a swipe from the bottom edge un-fullscreen the window. To be real sexy, that should be a smooth animation that transitions back to a maximized one-window column. And when there is no fullscreened window, I should be able to use the same gesture to initiate a workspace switch (but it ought to have a different threshold before it actually triggers a full switch on release).

Perhaps, if we support arbitrary named gesture bind modes, this can be implemented using those. Each mode can have additional transitions in its own section, next to the different input devices:


gestures "default" {
    transitions {
        on-fullscreen "fullscreen"
    }

    touch {
        swipe-from-top-edge { workspace-switch-up; }
        swipe-from-bottom-edge { workspace-switch-down; }
        horizontal-swipe-on-dead-region { horizontal-view-movement; }
    }
}

gestures "fullscreen" {
    transitions {
        unfullscreen "default"
    }

    touch {
        swipe-from-bottom-edge { unfullscreen; }
    }
}

The transitions block feels somewhat out of place here, but I'm not sure how to do this better without binding modes, and I'm not sure how better to implement the binding-mode switch. It could be in a separate events block of the config and trigger scripts (heck, at that point just make it entirely event-stream-reliant), but this is annoying if I have several "base modes" with a fullscreen variant each, because then my script needs to switch on the currently selected mode.


A lot of gestures like workspace-switch, unfullscreen, close-window, and interactive-move represent a discrete action, but require continuous input. After a certain movement threshold, or if there is enough inertia when released, releasing will cause the desired action to trigger; but until release, you can always cancel them. Perhaps it's worth allowing a separate discrete action to occur when the gesture actually executes? This obviously doesn't apply to all gestures: horizontal-view-movement, for instance, represents an entirely continuous action from a continuous input. There is no "threshold" at which it triggers (except maybe a dead zone before a given touch motion is even classified as a gesture to begin with). So a "completion action" would need to be implemented on a gesture-by-gesture basis.
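
A minimal sketch of that release-time rule, with illustrative names and parameters (not niri's actual logic):

// Illustrative only. A threshold gesture commits on release if the
// accumulated movement passed the threshold, or if there was enough
// inertia (velocity) at the moment of release; otherwise it cancels.
fn commits_on_release(
    progress: f64,           // accumulated movement, in gesture units
    threshold: f64,          // movement needed for a "full" trigger
    velocity: f64,           // movement speed at release time
    min_fling_velocity: f64, // inertia needed to commit below threshold
) -> bool {
    progress.abs() >= threshold || velocity.abs() >= min_fling_velocity
}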

The threshold completion action cannot sufficiently implement switching to a "fullscreen gestures mode", because I could still want to fullscreen from a niri msg client or from the keyboard, so I can't rely on switch-binding-mode "fullscreen" as the completion action for my unfullscreen gesture.


Oh yeah, by the way

I can see a problem in the future where there may be a 2D gesture

That 2D gesture exists now. It's called "interactive move".


There are many different kinds of gesture actions, mainly distinguished by axis (vertical workspace switch vs. horizontal view movement; even interactive move, which occurs on both axes). Pinch actions like close-window don't quite fit into that model, but you could think of a pinch as having a separate "axis": a sort of one-dimensional resize factor, though in general it also has a 2D movement factor. So I guess pinch is a three-dimensional gesture? It's definitely not two-dimensional.

You might also ask "wait, how is close-window a three-dimensional action?". It's actually complicated, because in my mind it should overlap with interactive-move. Open a picture-in-picture video player on your phone and try to pinch and drag it with two fingers; that is what I'm talking about. The behaviour on Android is difficult to describe in words, and I'm about 90% sure that iOS PiP works the same way I'm thinking of. Whatever this action is, however you wish to describe it, it is clearly not interchangeable with horizontal-view-movement.

For the fairly limited number of gestures and gesture actions, it might be worth going over each pair one by one to see if they make sense together, and implementing a way for that action to map onto that gesture. A specific action may actually be implemented in multiple different ways, from completely different gestures: see pinch { close-window; } vs. swipe-up { close-window; }. These are the same semantic action, and both have a particular threshold at which they will "execute" upon release, but they clearly require completely different implementations.

After planning out which gestures can trigger which actions, and how exactly each action will look when triggered by a given gesture, it may be easier to spot patterns and see which properties are actually useful to look at.


For touchscreen binds, I think they could further be split by subsections for how many fingers you want. This is not just because it allows you to not repeat it, but also because the finger count affects which gestures are available:

touch {
    fingers 1 {
        horizontal-swipe-on-dead-region { horizontal-view-movement; }
        // This makes no sense; you can't pinch with one finger.
        /- pinch { ...; }
    }
    fingers 2 {
        // Two-finger pinch has a special property:
        // it makes sense to have a "vertical pinch" to scale a vertical axis;
        // likewise, a "horizontal pinch" scales a horizontal axis.
        // This is most evident in a Cartesian coordinate system, where you may want
        // to scale the axes non-proportionally because the graphed function has
        // really large Y-values for really small X-values, or vice versa.
        // Notice that the "scaling factor" of the pinch gesture is not proportional
        // to the distance between the fingers; it's proportional to just one of the
        // components of that distance.
        // The axis is decided by the slope of the line that connects the initial finger placements.
        // An axial pinch cannot change axis mid-gesture.
        horizontal-pinch { resize-column-width; }
        vertical-pinch { resize-window-height; }

        // There is also a non-axial pinch. It's not allowed to be used in
        // conjunction with horizontal and vertical pinch.
        /- pinch { close-window; }
    }
    fingers 3 {
        // Fingers are no longer points on a line.
        // They now form a triangle, which intuitively represents a circumcircle
        // with a midpoint and a radius.
        // Though a circumcircle is the wrong tool here, because collinear fingers
        // cause an infinite radius, which is undesirable.
        // Clearly some other heuristic must be used to get a more stable midpoint,
        // and we should just use the average distance from this midpoint as the
        // "scaling factor" of the pinch.
        pinch { close-window; }
    }
    fingers 5 {
        // At higher finger counts, there is no way to precisely assign a
        // circumcircle, so that breaks down entirely.
        // Though, as I explained, using a circumcircle already leads to surprising
        // results at 3 fingers. At higher counts it's not even possible to use one,
        // so a heuristic midpoint is not just more correct, it's *necessary* to work at all.
        pinch { close-window; }
    }
}
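
A minimal sketch of that heuristic-midpoint idea, assuming the centroid plus the mean distance to it is a good enough stabilization (illustrative, not implemented):

// Illustrative sketch: the pinch "midpoint" is the centroid of the touch
// points, and the "scaling factor" tracks the mean distance from it.
// Works for any finger count, unlike a circumcircle. Assumes at least
// one finger is down.
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

fn pinch_midpoint_and_radius(fingers: &[Point]) -> (Point, f64) {
    let n = fingers.len() as f64;
    let cx = fingers.iter().map(|p| p.x).sum::<f64>() / n;
    let cy = fingers.iter().map(|p| p.y).sum::<f64>() / n;
    let mean_dist = fingers
        .iter()
        .map(|p| ((p.x - cx).powi(2) + (p.y - cy).powi(2)).sqrt())
        .sum::<f64>()
        / n;
    (Point { x: cx, y: cy }, mean_dist)
}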

A lot of actions are meaningless if the gesture didn't occur on a specific object. Because the niri layout is dense, it is hard to trigger a pinch on the dead region unless you have no windows open, but it is nonetheless possible. In some cases where we require a column or window, I think it would make sense to just default to the active window/column for that output, but in others it would be better to invalidate the gesture as a trigger. For some, it doesn't matter if the gesture was triggered on a window or not. We should consider making actions like swipe-on-window distinct from e.g. swipe-in-struts.


A lot of actions like "swipe in the bottom strut", "swipe in window gaps", or "swipe on the dead region when no window is open" can also be implemented by a client on the background layer. Consider not implementing the distinction for "gesture X on region Y", and offloading it to a client that makes use of niri_ipc. How do we expose continuous binds over IPC such that this makes sense? How do we ensure the compositor doesn't suddenly think a compositor-handled gesture should take priority over an in-progress specific gesture from such an "advanced gesture client"? Is the idea maybe meaningless, because there are practically no continuous actions that are easy to implement if the compositor doesn't own the inputs that caused them? Consider not offloading this to clients, and making sure the compositor can deal with all the gestures people actually care about.


To whoever reads this comment while implementing code to validate the config we decided upon: I don't envy you in this moment.

YaLTeR commented 1 week ago

So it should only trigger if the swipe would not hit a client.

Do all background layer-shell clients have an empty input region? I'm not sure it's an easy case to tell apart from having an input region but ignoring all events.

Also consider that with CSD, clients have some input region outside their geometry to give a bit of area to the resize handles.

I think in general it is useful to consider "swipe from [left/top/right/bottom] edge" a separate gesture

Yeah, edge swipes are usually a separate gesture, though with no built-in libinput support as far as I can tell.

You might argue that this belongs in the binds section, but I counter-argue that the binds section should instead be called gestures { keyboard {.

Well... yeah, but not really, because "binds" is a more common name for this, plus we already have it, already with some discrete mouse and touchpad binds in there (scrolling).

There should also be a way to have different gesture binds (particularly touch binds) when a window is fullscreen, because then I can't scroll in my struts.

Then maybe strut gestures are not the right thing entirely, and instead what you want is edge swipes in all cases?

And when there is no fullscreened window, I should be able to use the same gesture to initiate a workspace switch

Idk, need to think about it. For example, we could have top edge swipe to unfullscreen and bottom edge swipe to go into the overview, where you will be able to scroll workspaces up and down.

gesture bind modes

Let's keep things simple please, at least until there's some very compelling reason to complicate them.

A lot of gestures like workspace-switch, unfullscreen, close-window, and interactive-move represent a discrete action, but require continuous input.

Ideally all gestures have continuous movement indication, even if the outcome is discrete (think how the current interactive move rubberbands the window out of the layout, even though the result is a discrete "window ended up on the mouse cursor vs. left in the layout"). Discrete actions with no obvious movement don't really belong with gestures, honestly. Like, a touch swipe to switch ttys feels kinda wrong; those can live as regular binds.

That 2D gesture exists now. It's called "interactive move".

Yeah, also interactive resize when it's in two directions at once.

For touchscreen binds, I think they could further be split by subsections for how many fingers you want.

Also need to see what API libinput offers here (if any) and what can be done easily. I'm afraid we don't have an entire team here to implement arbitrary-finger arbitrary gestures, etc.

In some cases where we require a column or window, I think it would make sense to just default to the active window/column for that output

I'd either ignore the gesture, or pick the closest target in this case. See how interactive move picks the closest target drop location.

Consider not offloading this to clients, and making sure the compositor can deal with all the gestures people actually care about.

Yes, this unnecessary complexity and IPC lag is exactly why you don't want gestures and stuff to live in separate clients.