cornerman / i3-easyfocus

Focus and select windows in i3
GNU General Public License v3.0
172 stars, 12 forks

Easy button pressing on screen? #24

Open crypdick opened 5 years ago

crypdick commented 5 years ago

Thanks for the great script!

Question: How hard would it be to extend i3-easyfocus so that it can also show all the clickable GUI elements on the current screen?

I use vimium to eliminate mouse use in my browser. It displays little letters over each button, link, and clickable element so that you can quickly focus or click on it. It's a life-saver for my carpal tunnel. I'd love to have the same functionality in i3!

cornerman commented 5 years ago

Nice idea, that would be an awesome feature.

But afaik it is not that easy. If I understand correctly, then for you GUI elements would be buttons inside the windows on the screen. This is a hard problem, because windows can be created with Qt or GTK or something else, and there is no common protocol to get the elements from the window. I do not even know whether you could iterate over the elements of, e.g., a GTK window alone.

If there is a solution to this, then we could probably build something for it, but that would not even be i3 specific; it should then really work for any window manager.

crypdick commented 5 years ago

Great, makes sense. Thanks for your insight.

snippins commented 5 years ago

@crypdick @cornerman I know this is closed, but I found a solution for this. There is an automation project called dogtail that allows programmatically clicking buttons and text boxes.

For dogtail to work correctly, AT-SPI must be enabled. It is usually on by default for GTK apps; for a Qt 5 app you can enable it by running the app with "QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1 qt_app_name".
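
A minimal sketch of that Qt launch from Python, in case the app is started by a script rather than from a shell. The helper names `accessibility_env` and `launch_with_accessibility` are made up for illustration; only the `QT_LINUX_ACCESSIBILITY_ALWAYS_ON` variable comes from the comment above.

```python
import os
import subprocess


def accessibility_env(base_env=None):
    """Return a copy of the environment with Qt accessibility forced on."""
    env = dict(os.environ if base_env is None else base_env)
    env["QT_LINUX_ACCESSIBILITY_ALWAYS_ON"] = "1"
    return env


def launch_with_accessibility(command):
    """Start `command` (a list such as ["qt_app_name"]) with the variable set."""
    return subprocess.Popen(command, env=accessibility_env())
```

Calling `launch_with_accessibility(["qt_app_name"])` should be equivalent to the shell one-liner, with `qt_app_name` standing in for the real binary.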

There is a GUI tool called sniff that comes with dogtail and lets you inspect all clickable components of a window/app. More importantly, we can also programmatically get the positions of clickable components such as buttons and text boxes (dogtail.tree.Node.position).
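
A rough sketch of what reading those positions could look like with dogtail. It is untested against a live desktop here, and the function names `click_point` and `list_buttons` are hypothetical. The role name is also an assumption: AT-SPI has reported buttons as "push button", but it may differ between versions, so check the value in sniff first.

```python
def click_point(position, size):
    """Centre of a control, given its (x, y) position and (width, height) size."""
    x, y = position
    width, height = size
    return x + width // 2, y + height // 2


def list_buttons(app_name, role_name="push button"):
    """Yield (name, centre point) for each matching control of a running app."""
    # Imported lazily because dogtail needs a running AT-SPI session.
    from dogtail import predicate, tree

    app = tree.root.application(app_name)
    wanted = predicate.GenericPredicate(roleName=role_name)
    for node in app.findChildren(wanted):
        yield node.name, click_point(node.position, node.size)
```

The centre point rather than the top-left corner is used because that is where a label or a synthetic click would most sensibly go.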

So there is indeed a way to do this, at least for GTK and Qt apps.

cornerman commented 5 years ago

@snippins That sounds great! I was not aware of it.

I am going to reopen this issue, because I think that would be a great feature and we have code to show labels and act on key presses. I do not know yet whether it should live in this repository or be its own project. Please let me know what you think.

If we want this to live in this repository, there are a few things to consider. Currently this project is only targeted for the window manager i3, but we have an issue to work for other window managers as well (#3). We might as well think about renaming this project to not contain the word i3 then. We probably want this feature to live in a separate executable (gtk-easyfocus or qt-easyfocus or just gui-easyfocus). At least that is what I am thinking right now.

Help is definitely very welcome, because I do not know whether I will have time to work on this.

snippins commented 5 years ago

We don't need to worry about the specific GUI toolkit with dogtail, so something like gui-easyfocus is probably good enough. Dogtail is written in Python, so it should not be too hard, I hope. Give me a week and I will try to cook up a Python script that spits out the coordinates of buttons/text boxes/tabs of a window.

crypdick commented 5 years ago

@cornerman I use i3 full-time, so I don't mind if it's agnostic ^_^

@snippins any progress?

crypdick commented 4 years ago

I've been trying to get dogtail examples working for a while now. Can't get around GTK errors.

snippins commented 4 years ago

@crypdick I use plenty of Qt apps, and enabling accessibility features for them unfortunately leads to memory leaks, so I didn't continue. The situation might have changed by now, though.

snippins commented 4 years ago

@cornerman @crypdick

It seems my Qt apps no longer leak memory when AT-SPI is enabled, so I decided to continue this.

Here is a script that prints out window controls and their positions for all applications. It is pretty simple. I don't know how to detect the active app yet.

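#!/usr/bin/env python3
"""Print every leaf accessible (control) and its screen position via AT-SPI."""

import pyatspi

desktop = pyatspi.Registry.getDesktop(0)


def traverse(root):
    # Children can be reported as None; skip them instead of aborting the app.
    if root is None:
        return
    # Leaf nodes (no children) are treated as the actual controls.
    if len(root) == 0:
        try:
            x, y = root.queryComponent().getPosition(pyatspi.DESKTOP_COORDS)
            # Skip elements reporting negative coordinates (typically not on screen).
            if x >= 0 and y >= 0:
                print(f"---{root.name!s} {root.role!s} {x} {y}")
        except Exception:
            # Not every accessible implements the Component interface.
            pass
    for child in root:
        traverse(child)


for application in desktop:
    print(application.name)
    try:
        traverse(application)
    except Exception:
        # Skip applications whose accessibility tree cannot be read.
        pass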
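
One possible way to narrow this down to the active app, sketched under the assumption that the toolkit sets the AT-SPI "active" state on its focused top-level window (not every application does this reliably). `find_active_window` and `main` are hypothetical names; `pyatspi.STATE_ACTIVE` and `getState().contains()` are the pyatspi calls it relies on.

```python
def find_active_window(desktop, active_state):
    """Return the first top-level window whose state set contains active_state."""
    for application in desktop:
        if application is None:
            continue
        try:
            for window in application:
                if window is not None and window.getState().contains(active_state):
                    return window
        except Exception:
            # An unresponsive application should not stop the search.
            continue
    return None


def main():
    # Imported lazily because pyatspi needs a running AT-SPI session.
    import pyatspi

    desktop = pyatspi.Registry.getDesktop(0)
    window = find_active_window(desktop, pyatspi.STATE_ACTIVE)
    print(window.name if window else "no active window found")
```

The window returned here could then be passed to `traverse` instead of walking every application on the desktop.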

TheJoeSchr commented 1 year ago

> @snippins That sounds great! I was not aware of it.
>
> I am going to reopen this issue, because I think that would be a great feature and we have code to show labels and act on key presses. I do not know yet whether it should live in this repository or be its own project. Please let me know what you think.
>
> If we want this to live in this repository, there are a few things to consider. Currently this project is only targeted for the window manager i3, but we have an issue to work for other window managers as well (#3). We might as well think about renaming this project to not contain the word i3 then. We probably want this feature to live in a separate executable (gtk-easyfocus or qt-easyfocus or just gui-easyfocus). At least that is what I am thinking right now.
>
> Help is definitely very welcome, because I do not know whether I will have time to work on this.

Did anything happen with porting this to gui-easyfocus or some such?

phil294 commented 1 year ago

I did this a few months ago in a separate project called Vimium Everywhere. It works on any kind of X11 Linux system for any kind of application: Gtk, Qt, Electron/Chrome, Firefox, LibreOffice, Java. Wayland support is probably coming soon. It's an AutoHotkey-for-Linux script, and the logic for finding and clicking controls lives in the commands WinGet, ControlGetPos and ControlClick. Implementing this on top of AT-SPI is a bit complicated, because common applications have many bugs in their AT-SPI support, and AT-SPI doesn't even have a concept of X11 windows, so you need to match by window title, for example. It's also necessary for the end user to set flags for many applications individually, which are listed in the vimium-everywhere Readme. That's not really avoidable and/or automatable.
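
To illustrate the title matching mentioned above, here is a small Python sketch. It is not the AHK_X11 implementation; `frames_matching_title` is a hypothetical helper, and it assumes the X11 window title has already been obtained elsewhere (for example from the window manager) and that the AT-SPI frame exposes the same string as its `name`.

```python
def frames_matching_title(desktop, title):
    """Collect top-level accessibles whose name equals the given window title."""
    matches = []
    for application in desktop:
        if application is None:
            continue
        for frame in application:
            if frame is not None and frame.name == title:
                matches.append(frame)
    return matches
```

A list is returned instead of a single frame because two windows can share a title, in which case the caller has to disambiguate some other way.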

In the hopes it may be helpful to you guys, the accessibility code is here: https://github.com/phil294/AHK_X11/blob/master/src/run/display/at-spi.cr

Another reference implementation would be Orca, the a11y helper / screen reader for Linux. Its source is also in Python, but it is huge.

Anyway, to just get a feeling for control names, positioning, framework support etc. I'd suggest giving Accerciser a try, or alternatively AHK_X11's Window Spy feature.

There's also of course no issue with using i3-easyfocus and vimium-everywhere in parallel, but keep in mind the latter is still potentially buggy :)