Hammerspoon / hammerspoon

Staggeringly powerful macOS desktop automation with Lua
http://www.hammerspoon.org
MIT License
12.02k stars 584 forks source link

window/filter.lua throws "attempt to index a nil value" when trying to start a global watcher #2465

Closed sprak3000 closed 3 years ago

sprak3000 commented 4 years ago

I am writing a spoon that iterates over all the windows on your screens and re-positions them. This is the code loop that spawns the error:

    app:activate()
    local windows = hs.window.filter.new(appTitle):getWindows()

    for k, v in pairs(windows) do
        obj.logger.d('Positioning window ' .. v:id() .. ' of app ' .. appTitle)
        v:moveToScreen(screen)
        v:setFrame(frame, 0)
    end

Randomly, the attempt dies out leaving this error from the Hammerspoon console:

2020-09-17 21:39:39: ********
2020-09-17 21:39:39: 21:39:39 ERROR:   LuaSkin: hs.menubar:setClickCallback() callback error: ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1561: attempt to index a nil value (field 'watcher')
stack traceback:
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1561: in upvalue 'startGlobalWatcher'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1722: in upvalue 'start'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1838: in method 'getWindows'
    ...s/luis/.hammerspoon/Spoons/ArrangeDesktop.spoon/init.lua:38: in function 'ArrangeDesktop._positionApp'
    ...s/luis/.hammerspoon/Spoons/ArrangeDesktop.spoon/init.lua:58: in function 'ArrangeDesktop.arrange'
    ...s/luis/.hammerspoon/Spoons/ArrangeDesktop.spoon/init.lua:83: in function <...s/luis/.hammerspoon/Spoons/ArrangeDesktop.spoon/init.lua:83>
2020-09-17 21:39:39: ********

The line throwing this error is global.watcher:start() with global.watcher being set a few lines above by global.watcher = appwatcher.new(appEvent). I tried going into the appwatcher.new code to see what happens there and why it might be returning nil at times. However, I'm not as familiar with the code in watcher.m to make a reasonable guess.

It seems like wrapping the global.watcher:start() in a guard clause and only running it if global.watcher is not nil would solve my issue, but I'm not sure what, if any, side effects that would have. Again, this seems to happen randomly. It could happen after any number of windows being positioned or not at all. If there is any other information I could provide, happy to help diagnose this.

al3xandru commented 4 years ago

I'm getting this error with 0.9.79 and my config is not loading. I had to downgrade to 0.9.78.

cmsj commented 4 years ago

This is very weird, hs.application.watcher.new() should never return nil. Happening randomly after some amount of time makes me immediately think of garbage collection. @sprak3000 could you load up your config, then run collectgarbage() a few times in the Hammerspoon Console and see if that causes this to occur sooner?

sprak3000 commented 4 years ago

@cmsj

Ran it a few tests, but it still seems random though possibly not as random. Did the garbage collection, and it seems to stop between the same two apps now -- call them five and six. Not sure if that helps.

szymonkaliski commented 4 years ago

@sprak3000 updating to 0.9.80 fixed that for me, if you're relying on "window closed"-like events from window.filter there's a PR for that: https://github.com/Hammerspoon/hammerspoon/pull/2485

sprak3000 commented 4 years ago

@szymonkaliski

I should have mentioned that in my previous comment. I upgraded to 0.9.8 prior to running the garbage collection tests. Still get the same error. I'm not relying on any "window closed" events or any events to my knowledge from window.filter. The code in my original report is all I am doing. I use window.filter to get the windows on a screen and then put them in a different spot -- different screen, window size, etc.

asmagill commented 4 years ago

If I'm understanding hs.window.filter (and there is no reason to believe that I am!), then the only way global.watcher could be nil at line 1561 is some invocation of line 1559 (startAppWatcher(app,app:name())) is, however many layers deep, calling stopGlobalWatcher()... which appears unlikely, but I can't say is entirely impossible -- it's hard to track all of the permutations of what internal functions might be invoked without something to test against...

And if I'm right, it means there is some subtle difference in what the version of hs.uielement in 0.9.78 is reporting on and what it's reporting on in 0.9.79+... some window types that one ignores and the other doesn't or something like that since hs.window.filter didn't change (except for how the pid is determined).

@sprak3000 does there seem to be any specific applications that causes the error more often than others? Or any application that never seems to cause the error? Is ArrangeDesktop.spoon online somewhere so I can try it out on my machine and see if I can't trace the specifics more closely?

sprak3000 commented 4 years ago

@asmagill

I wish I could say there was a distinct pattern, but it doesn't seem to play favorites. Slack, various JetBrains IDEs, iTerm2, Postman, Firefox... Doesn't seem to discriminate; it just suddenly throws that error. Can't swear to any that never throws the error... Maybe Activity Monitor and the Apple Calendar apps.

I don't currently have it anywhere on-line. Been working on it locally, but I'll get it somewhere for you to grab and test. Stay tuned...

sprak3000 commented 4 years ago

@asmagill

I've pushed up a copy of my init.lua and the spoon init.lua for you to check out. If you have any questions, let me know.

Briefly, the idea is in one's init.lua to define your desktop arrangement configurations (the spoon has a function to help build that for you), pass them into the spoon, and then have the spoon be able to arrange things based on those configurations. Both my Macbook Only and Promoboxx configurations experience the issue, so it does not appear to be a question of one versus many monitors.

asmagill commented 4 years ago

Ok, I can confirm that #2485 does not fix this, so we still have something unexpected going on. I'll keep digging and let you know what I find.

asmagill commented 4 years ago

Hmmm... this is going to be a challenge...

@szymonkaliski @cmsj @latenitefilms If I'm right, this may affect how some things we have and will code going forward need to be handled or possibly even re-written... Sorry, this is long, but I want to put my entire line of thought and observations out there so you can tell me if I'm crazy or not.

In Section 2.5.3 of the Lua docs, there is a line I want to point out:

... The execution of each finalizer may occur at any point during the execution of the regular code.

I checked, it's also written that way in the Lua 5.3 docs, but I never observed that actually happening... it always seemed as if the finalizers were invoked either before or after some block of code was executed via lua_pcall(...), not in the middle of it.

I think I've observed it in 5.4... Using @sprak3000's code, and a modified version of filter.lua (the mods were to print a line upon entering and exiting both stopGlobalWatcher and startGlobalWatcher -- with stopGlobalWatcher a debug traceback was also printed upon entering) I get the following if I trigger multiple rearrangements fairly close to each other:

> a = dofile("_scratch/arrangeDesktop.lua")
-- Loading Spoon: ArrangeDesktop

-- *** This is the first trigger ***

-- Loading extension: window
-- Loading extensions: window.filter
In startGlobalWatcher
Leaving startGlobalWatcher  userdata
In stopGlobalWatcher
stack traceback:
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1573: in upvalue 'stopGlobalWatcher'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:2074: in method 'pause'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1851: in method 'getWindows'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:38: in function 'ArrangeDesktop._positionApp'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:58: in function 'ArrangeDesktop.arrange'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:83: in function <...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:83>
Leaving stopGlobalWatcher   nil

-- *** This is the second trigger ***

... cut for brevity, it's the same as above

-- *** This is the third trigger ***

... cut for brevity, it's the same as above

-- *** This is the fourth trigger ***

In startGlobalWatcher
In stopGlobalWatcher
stack traceback:
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1573: in upvalue 'stopGlobalWatcher'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:2082: in function <...n.app/Contents/Resources/extensions/hs/window/filter.lua:2078>
    [C]: in metamethod '__gc'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1015: in field 'new'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1024: in field 'created'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1393: in upvalue 'appWindowEvent'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1202: in method 'getFocused'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1226: in method 'getAppWindows'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1219: in field 'new'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1430: in upvalue 'startAppWatcher'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1558: in upvalue 'startGlobalWatcher'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1732: in upvalue 'start'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1848: in method 'getWindows'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:38: in function 'ArrangeDesktop._positionApp'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:58: in function 'ArrangeDesktop.arrange'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:83: in function <...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:83>
Leaving stopGlobalWatcher   nil
WTF! how can global.watcher be nil here? -- oh yeah, forgot I stuck that in there, right before `startGlobalWatcher` calls `global.watcher:start()`
22:17:30 ERROR:   LuaSkin: hs.menubar:setClickCallback() callback error: ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1563: attempt to index a nil value (field 'watcher')
stack traceback:
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1563: in upvalue 'startGlobalWatcher'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1732: in upvalue 'start'
    ...n.app/Contents/Resources/extensions/hs/window/filter.lua:1848: in method 'getWindows'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:38: in function 'ArrangeDesktop._positionApp'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:58: in function 'ArrangeDesktop.arrange'
    ...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:83: in function <...config/hammerspoon/_Spoons/ArrangeDesktop.spoon/init.lua:83>

Note the line in the fourth trigger's callback [C]: in metamethod '__gc' in a traceback that should only contain functions from ArrangeDesktop and filter.lua. That was the second (wait a bit for the first) thing that caught my eye.

In an earlier version of added lines to filter.lua I had it print out each app that was visited in startGlobalWatcher's loop along with the current value of global.watcher and I observed that it occasionally (right before the crash that started this thread) changed to nil part way through the list. That was the first thing that struck me as odd. (I didn't include this output because 1. I didn't save a copy, 2. it was printing out a long list that basically confirmed what I knew had to be happening... that global.watcher was getting cleared or reset before startGlobalWatcher had finished, and 3. this is already very long)

Somehow, the globalStopWatcher, which is the only function which sets global back to an empty table, was getting invoked before startGlobalWatcher had completed iterating over the apps.

Now, look closely at _positionApp in ArrangeDesktop:

function obj:_positionApp(app, appTitle, screen, frame)
    obj.logger.d('Positioning ' .. appTitle)

    app:activate()
    local windows = hs.window.filter.new(appTitle):getWindows()

    for k, v in pairs(windows) do
        obj.logger.d('Positioning window ' .. v:id() .. ' of app ' .. appTitle)
        v:moveToScreen(screen)
        v:setFrame(frame, 0)
    end
end

Each time it is invoked, it creates a new window filter. And it's invoked once for each application in the arrangements list every time the the re-arranger is triggered. Possibly not the most efficient way, from things I have heard suggested about the module, but tbh it's probably how I would have done it myself -- I don't know hs.window.filter very well yet.

So what I think is happening is that garbage collection for a previous watchers occurs at some point while a second invocation of the re-arranger is running, which causes WF:delete() to run... which calls stopGlobalWatcher() which resets global.watcher to an empty table, even though we in the middle of using a totally brand new watcher.

The two fixes I see are as follows:

Add a close metamethod to filters and remove the gc one. Downside is no one can save a filter in a global variable (or even a local variable in a file that's to be used inside local functions within the file) now and they can only be used within a closed scope (e.g. within a single function) and the end user has to know that the proper way to define them is local windows <close> = hs.window.filter.new(...). I don't like this idea.

Add some variable to global in startGlobalWatcher when it starts and clear it when it ends. If this value is set, stopGlobalWatcher becomes a no-op so the __gc of other watchers can't clear it. This only works if I've diagnosed the problem correctly (and I am in no way positive that I have -- hs.window.filter is complex and I started really digging into less than 72 hours ago...)


Thoughts? Am I crazy? Have I missed something obvious or even not-so-obvious?

asmagill commented 4 years ago

@sprak3000 If you don't mind, could you try replacing the hs.window.filter file in your Hammerspoon application with the one at https://github.com/asmagill/hammerspoon/blob/startGlobalWatcherGuard/extensions/window/filter.lua and let me know if it helps with your problem? I don't want to make it a pull request until at least one of the others confirms my reasoning, but I am curious if it will help your situation.

Copy the file contents at the URL above and overwrite the contents of filter.lua in your Hammerspoon application with it... it's probably at /Applications/Hammerspoon.app/Contents/Resources/extensions/hs/window/filter.lua if you've installed Hammerspoon in the traditional location.

(the specific line differences, if you'd rather just type in the changes, are at 1552, 1559, and 2071-2072 in the version displayed at the URL above)

Once you've done so, restart or reload Hammerspoon (either should be sufficient) and then try out your code and let me know how it fairs.

cmsj commented 4 years ago

@asmagill wow, excellent analysis!

I've only just woken up, but thinking about both of those potential solutions makes me wonder if they won't both throw up a lot more problems 😕

It sort of sounds like we shouldn't be doing global watchers, but window.filter is the only one that's dynamic enough to trigger this situation reliably?

cmsj commented 4 years ago

@asmagill a thought I had during breakfast this morning - I'd assumed this was fallout from my hs.{application,window,uielement} rewrite, but if it's all down to Lua GC then I wonder if this is a behaviour change in Lua 5.4?

latenitefilms commented 4 years ago

@cmsj - I wonder if changing to legacy incremental garbage collection could change the behaviour of things?

This might also offer some clues: Incompatibilities with the Previous Version

latenitefilms commented 4 years ago

Does this mean anything to you?

When finalizing an object, Lua does not ignore __gc metamethods that are not functions. Any value will be called, if present. (Non-callable values will generate a warning, like any other error when calling a finalizer.)

sprak3000 commented 4 years ago

@asmagill

I copied over your filter.lua file and ran the desktop arrangement a number of times now. Not seeing that error crop up, but I am getting this repeatedly now:

2020-09-23 08:42:11: 08:42:11 ERROR: LuaSkin: Unable to fetch element attribute NSAccessibilityFrontmostAttribute for: Firefox Developer EditionCP Socket Process

Does not seem to impact the functionality of my spoon. All the windows appear to be iterated over and moved to their appropriate screen / location including any Firefox windows.

sprak3000 commented 4 years ago

@asmagill

Follow-up... I forgot about adding hs.window.filter.ignoreAlways['Firefox Developer EditionCP Socket Process'] = true to my init.lua. That clears up the error I mentioned above, so the fix appears to be functioning well for at least my use case.

cmsj commented 4 years ago

@asmagill if for some reason your potential fix here doesn't work, I was just re-reading the Lua docs around GC and we do technically have the ability to pause/resume GC as a weapon of last resort.

megalithic commented 4 years ago

@asmagill your filter.lua module, also, fixed the issues for me, too. so far, so good. thanks!

asmagill commented 4 years ago

@sprak3000 Those "errors" are actually just log messages about an accessibility item that background applications traditionally don't have -- they can be safely ignored and are actually removed by #2505 which should be in the next release.

asmagill commented 4 years ago

@cmsj, it's not the fact of global watchers per se, it's more when individual instances that refer to the global watcher can absolutely stop or destroy the global watcher without having any way to check if it other instances might be active or using it that seems to be the issue, if Im understanding what's going on.

I should probably review hs.location soon as well, since I know it uses a global instance that then passes callbacks into other "instance" objects. I think my approach is probably safe, but it's been a while, so I'm not positive.

And it will mean rethinking hs._asm.btle which I was hoping to revive when I attempt to address #2323, hopefully next month.

I should also expand my comments re using __close... we can save an object with a __close metamethod in globals or as up-value locals within a file, but doing so means bypassing invocation of the __close metamethod for that instance and relying on __gc, which as we've seen we have less control over. __close will only be invoked in a closed block and if the local variable holding the object is defined with the <close> keyword... even then it will also invoke __gc eventually, so if we do decide to leverage __close in the future, we need to make sure __gc can tell if __close has already been invoked or not.

And I would be reluctant to change the garbage collection settings just because some module's code prefers it one way or another... we never know if the user has made their own changes to garbage collection timings (or now approach)... I've been known to try when I know I'm using a huge block of temporary data or when I want to flush things out quickly or delay collection indefinitely to stress test something... Our modules really should work independent of the garbage collection settings.

And as I've seen many times on the lua mailing list (and not always very nicely, I might add), "the documentation is the final word on what you should rely on within your own code" and "just because a previous version didn't enforce what the documentation said, doesn't mean a future version won't -- program as described in the docs, not as described by your observations"... and as I mentioned above, the particular line in the documentation that I think applies here is unchanged between 5.3 and 5.4.

asmagill commented 3 years ago

This has already been addressed and merged. Please reopen if you think I've missed something.