Closed asmagill closed 4 years ago
Ok, not quite "don't work at all"...
if the code which creates and uses the coroutine is in one block
, and doesn't use anything which involves LuaSkin
(i.e. it's pure lua, not even a single function we've added that uses checkArgs:
), then it will work:
This works (when pasted into the console as one "line" of input):
a = ""
cothread2 = coroutine.create(function()
for n = 1, 5 do
a = a .. "honk "..n.."\n"
coroutine.yield()
end
end)
while coroutine.status(cothread2) ~= "dead" do coroutine.resume(cothread2) end
print(a)
This crashes (when pasted into the console as one "line" of input) -- yeah, it's a stupid example, but hs.canCheckForUpdates
was the first thing I found that only uses LuaSkin for a single [skin checkArgs:LS_TBREAK];
, about the simplest usage I could think of):
a = ""
cothread2 = coroutine.create(function()
for n = 1, 5 do
a = a .. tostring(hs.canCheckForUpdates()) .. "\n"
coroutine.yield()
end
end)
while coroutine.status(cothread2) ~= "dead" do coroutine.resume(cothread2) end
print(a)
edit: made it more clear that blocks should be pasted in as one entry if testing in the console
Obviously few/none of our users currently utilize coroutines, or we'd have heard something by now, but it is a feature lua promotes, so its only a matter of time...
Possible short term band-aids that I can think of:
remove coroutine library from our implementation of lua (either by modifying lua source, or adding coroutine = nil
to hs._coresetup
. Cowardly cop-out, and removes the possibility of the one case that does work -- using them within a single file/function/block with pure lua.
partially implement above fix -- modify functions to use [LuaSkin sharedWithState:L]
and throw a lua error if L
doesn't == skin.L
that says something like LuaSkin incompatible with coroutines
. Pro: should prevent crashes; Con: inconsistent module support within coroutines -- older modules (pre LuaSkin) may partially work, if they don't use LuaSkin to check arguments or manipulate data; newer or updated stuff won't work at all, since we've pretty much standardized on some common patterns that use LuaSkin a lot.
continue to hope no one notices. Cons: obvious.
In any case, if we do end up adding #2304, hs_yield
can't be invoked within a coroutine because we haven't implemented lua_lock
and lua_unlock
(yet); but I think I can easily make this detect such a situation by checking lua_isyieldable()
so this issue isn't a show-stopper (though it does remove the need for one of the tests I was considering)
More thoughts if I think of them...
I've tried to use coroutines in the past, but it just made my head hurt so I gave up. These crashes could explain some of the issues I was running into at the time.
Ok, after some more pondering...
lua_lock
and lua_unlock
actually shouldn't be necessary, if we stay on a single application thread...
We'll still need to make LuaSkin
multi lua_State*
compatible for coroutine support (which is also a necessary step for supporting multiple application threads), but since only a single "lua thread" will ever be active at a given instant in a singe application threaded model, lua_lock/unlock
don't help us.
@cmsj, some ideas I'd like your comments on about how to make LuaSkin support different lua_State
values (these both assume adding sharedWithState:(lua_State *L)
and changing functions in our modules -- but not in delegates or other callbacks -- to use it instead of the shared
contructor):
add (readonly?) property to store initial lua_State
into during creation; when shared
called, set self._L
to this value; when sharedWithState:L
called, set self._L
to the L
passed in. Pros: simple, Cons: may have to call sharedWithState:L
multiple times within a function if it's possible the current value of skin._L
might have changed -- currently (I think) this only affects the proposed hs_yield
, but maybe also others, if we ever write C
functions which utilize/participate in coroutines; won't be sufficient if we ever add true multi application thread support.
create (and store to avoid re-creating?) new LuaSkin
instance for each new lua_State
seen. Pros: muti application thread safe (if/when we decide to go there), as long as no single lua_State
is allowed to run on more than one thread at a time; Cons: more changes to LuaSkin
because registered type helpers, etc. have to be moved to class properties; how do we detect when lua_thread
is garbage collected (so we can remove its LuaSkin instance, if we saved it)? How do we handle a reload if multiple instances still exist (if we can attach a __gc to a lua_thread type, this might be a non-issue, but I don't think we can)
Other possibilities I'm missing?
Re multiple application threads -- If we do end up adding hs.yield
, and maybe even consider attaching it to a debug hook so it runs after every n
lua calls, the extra effort to support multiple application threads becomes less attractive to me. IMHO the times when yielding isn't sufficient might be better served by moving something into Objective-C itself which is a lot faster and most things become easy (well, easier) to multithread and trigger a callback upon completion.
Just my (current) 2 cents... ask me again in a month and I'll probably think something totally different 😜
coroutine.yield
and coroutine.resume
so that they change LuaSkin's internal property for lua_State; Pros: simplest of all requiring no (or very minimal) changes to existing modules; Cons: patching lua source further, which means may have to revisit with 5.4, not sure yet how nested coroutines can be, may need our own stack like structure to remember what to set the property back to. Potentially screwing up something that none of us are that familiar with, so will we really notice subtle changes/errors? Would complicate supporting multiple application threads, if that is something we decide to do later.If I’m reading this right, it seems like we’ve broken coroutines by building LuaSkin (and really everything else) to assume that there is only ever one lua_State object. That being the case, is it actually a good idea to try and remove such a core assumption?
In short, yes... though it's still probably a good idea to keep the main lua thread's state, maybe as a class property, so callbacks/delegates/other places where no lua state is passed in don't have to jump through too many hoops. As long as we make sure that the primary state always runs on the main application thread (even if we decide to allow application threading at a later date), they'll run one at a time and shouldn't pose a problem.
But otherwise, yes, we really should support multiple states everywhere else.
I'm pretty wary of making this kind of change for a Lua feature that's apparently so unpopular that nobody has noticed it's been broken for over 5 years.
For the pure C entrypoints (ie callbacks/delegates), how would we do anything other than fetch the main state object? As-in, how would the C entrypoint even know which state object it should try and use? If it could find the right state object, how would it know whether or not the co-routine that created it, still exists?
Even assuming we could figure that sort of stuff out, this would also mean some fairly significant changes to LuaSkin - not only do we only have one lua_State
object at the moment, we intentionally only have one LuaSkin object no matter how many times [LuaSkin shared]
is called. Would each new co-routine state object need to inherit all of the registered modules/helpers?
I take the point that it's bad we've broken co-routines, but I'm honestly not sure I want to fix them again, if the cost is significantly higher complexity, and a big risk of breaking in all sorts of weird new ways.
FWIW - I guess my question would be WHY do we need coroutines
, and is there a better way to achieve the same outcome with another solution? Most of the things I'd use coroutines
for can be implemented in other ways, such as running things on another thread in Objective-C-land and using a callback.
For now, I'd just make coroutines = nil
and document it somewhere, so it's clear that coroutines
aren't supported in Hammerspoon. If someone else NEEDS them in the future, we can always re-assess.
Hello,
Coroutines are a fundamental control flow construct that ships with the language. Removing them is comparable to removing the function
keyword. I only recently learned that Hammerspoon exists, and I am willing to try to fix the mess that currently prevents it from working with coroutines and with moonjit.
Callbacks should always run on the main lua_State (the lua docs call it a thread, which I've sometimes called a "lua thread" to be pedantic, but it's really primarily a separate isolated lua stack) --the initial lua_State created when the lua engine was initialized -- because they are a new code-block triggered by the external event, not the resuming of something a coder suspended.
That's why I recommend creating the new method sharedWithState:
and applying it only to functions and methods we add, but not within the callback/delegate/event code that handles externally triggered application events. If we go with the first option I outline above, the internal changes to LuaSkin are manageable and there is still only one instance required.
As to why its taken so long to find out... well, most of our users aren't programmers per se, and those that are didn't start with Lua... coroutines are a fundamental part of the language, and the primary answer to the fact that it's by default single threaded. I've meant to learn about them for a long time, but since (1) potentially long running lua, no matter whether its on 1 "lua thread" or 1000 causes Hammerspoon to block, but with #2304 that might no longer be an issue, and (2) I personally can code relatively quickly in Objective-C and can use a background thread it if I want.
If a true lua developer wanted to do something, they'd expect coroutines to exist and wouldn't bother without them.
As I have come to understand the problem, the fix outlined as the first option above won't be difficult, just tedious.
I'm opposed to burying our head in the sand by removing that the library -- we won't be using Lua, then, but a subset of it. Subsets are fine in memory constrained environments, not on a modern desktop.
As a first step, I think adding the following after the NSThread check in the sharedWithDelegate:
method of LuaSkin will at least stop the application crashes, though 90% of our module functions/methods still won't be usable within a coroutine:
int isMainThread = lua_pushthread(self.L) ;
lua_pop(self.L, 1) ; // remove thread we just pushed onto the stack
if (!isMainThread) {
luaL_error(self.L, "LuaSkin shared object incompatible with coroutines") ;
return nil ;
}
I'll test this in a few hours to be certain.
Ok, no, that's stupid... we'd just be checking the saved state, which we already know is the main "thread"... let me think some more...
If the limitation is only that you cannot call Hammerspoon API functions from coroutines, you may be able to "fix" the problem from the Lua side by implementing asymmetric coroutines on top of symmetric coroutines on top of asymmetric coroutines, placing a trampoline inside of your new fake coroutine.resume
, and wrapping Hammerspoon API functions to make sure they are called from the main lua_thread even when the user tries to use them from inside a coroutine.
A sketch of what this would look like, on the library side, which I've called co_wrap.lua over here:
local select = select
local table_insert = table.insert
local table_remove = table.remove
-- utility to save multiple returns or varargs for later use
local NIL = {}
local function drop_one(x, ...)
return ...
end
local save_exprs
save_exprs = function(return_first_x, ...)
if return_first_x == 0 then
local ret = {}
for i=1,select("#", ...) do
local item = select(i, ...)
if item == nil then
item = NIL
end
ret[i] = item
end
return ret
end
return select(1, ...), save_exprs(return_first_x-1, drop_one(...))
end
local unsave_exprs
unsave_exprs = function(t, idx, len)
if not idx then
idx = 0
len = #t
end
if idx == len then
return
end
idx = idx + 1
local ret = t[idx]
if ret == NIL then
ret = nil
end
return ret, unsave_exprs(t, idx, len)
end
local co_stack = {}
local co_running = {}
local n_cos = 0
local real_yield = coroutine.yield
local real_resume = coroutine.resume
local real_status = coroutine.status
local real_create = coroutine.create
function coroutine.resume(co, ...)
-- can't have the same coroutine running twice in our fake stack of coroutines
if co_running[co] then
return false, "cannot resume non-suspended coroutine"
end
-- if we are called from the main thread, resume and trampoline
if n_cos == 0 then
n_cos = n_cos + 1
co_stack[n_cos] = co
co_running[co] = true
local ok, op
local stuff = save_exprs(0, co, ...)
while true do
local prev_co = stuff[1]
ok, op, stuff = save_exprs(2, real_resume(unsave_exprs(stuff)))
if ok and op == "resume" then
-- trampoline and pass stuff to nested co
n_cos = n_cos + 1
co_stack[n_cos] = stuff[1]
co_running[stuff[1]] = true
elseif ok and (op == "yield" or op == "die") then
-- coroutine yielded or died
-- trampoline and pass stuff to parent co
co_running[prev_co] = nil
co_stack[n_cos] = nil
n_cos = n_cos - 1
if n_cos == 0 then
-- parent co is topmost lua thread, so return
return true, unsave_exprs(stuff)
else
table_insert(stuff, 1, true)
table_insert(stuff, 1, co_stack[n_cos])
end
elseif ok and op == "call" then
-- coroutine wants to call some function from the main thread
-- get the results and pass them back to same co
local func = table_remove(stuff, 1)
stuff = save_exprs(0, func(unsave_exprs(stuff)))
table_insert(stuff, 1, co_stack[n_cos])
elseif not ok then
-- coroutine errored
-- trampoline and pass stuff to parent co
co_running[prev_co] = nil
co_stack[n_cos] = nil
n_cos = n_cos - 1
if n_cos == 0 then
-- parent co is topmost lua thread, so return
return false, op, unsave_exprs(stuff)
else
table_insert(stuff, 1, op)
table_insert(stuff, 1, false)
table_insert(stuff, 1, co_stack[n_cos])
end
else
error("Fake coroutine lib: Not sure what happened")
end
end
else
-- tell the loop on the main thread that I'd like to resume pls
return real_yield("resume", co, ...)
end
end
function coroutine.yield(...)
-- tell the loop on the main thread that I'd like to yield pls
return real_yield("yield", ...)
end
function coroutine.status(co)
if co_running[co] then
return "running"
else
return real_status(co)
end
end
function coroutine.create(f)
return real_create(function(...)
-- tell the loop on the main thread that I'd like to die pls
return "die", f(...)
end)
end
function coroutine.call_from_main_thread(f)
return function(...)
if n_cos == 0 then
return f(...)
else
-- tell the loop on the main thread that I'd like to call pls
return real_yield("call", f, ...)
end
end
end
--TODO: also reimplement coroutine.wrap
--coroutine.running should work fine as is.
On the user side:
require"co_wrap"
get_main_thread = coroutine.call_from_main_thread(coroutine.running)
get_my_thread = coroutine.running
a = coroutine.create(function()
local cos = {}
local function print_some_numbers(n)
for i=1,n do print("inner", i, coroutine.yield(i*i)) end
print(get_main_thread())
print(get_my_thread())
end
for i=1,3 do
cos[i] = coroutine.create(print_some_numbers)
end
local n = 1
while true do
for i=1, #cos do
if coroutine.status(cos[i]) ~= "dead" then
print("outer", coroutine.resume(cos[i],n))
n = n + 1
end
end
print(coroutine.yield())
end
end)
for i=1,5 do print(coroutine.resume(a)) end
This prints
outer true 1
outer true 1
outer true 1
true
inner 1 4
thread: 0x7f901b800608 true
thread: 0x7f901b408c08 false
outer true
inner 1 5
outer true 4
inner 1 6
outer true 4
true
inner 2 7
thread: 0x7f901b800608 true
thread: 0x7f901b408f88 false
outer true
inner 2 8
outer true 9
true
inner 3 9
thread: 0x7f901b800608 true
thread: 0x7f901b408928 false
outer true
true
true
If you remove lines 8 and 9, then you can get output that doesn't change based on the addresses of coroutines, and you can try running it with and without co_wrap. Over here it produces identical output with and without co_wrap on lua 5.3.5. This isn't a guarantee that I haven't written any bugs, of course.
Edit: While this may let users run code that uses coroutines, it won't be as fast as you might like. If you'd like to go fast, I think the right play is to fix the problem for real and to use some version of luajit.
Maybe... If I'm understanding this correctly (can't test it right now, not at my computer, so this is all though experiments for me for the next couple of hours or so), it still puts the onus on the user to know what can and can't be run within a coroutine, and if they get it wrong, the Hammerspoon application crashes.
Long term, we need to implement sharedWithState:
in LuaSkin
.
Short term, I need a way to detect, without having access the the lua_State passed into a C function, whether or not the state stored in the LuaSkin properties matches it or not...
Let me make that clearer, cuz I only understand it because I'm thinking about it right now and it still sounds like mud to me...
For the following, the "active" lua_State is the one passed into the C function by the lua engine.
Lets say I have a C function accessible from lua:
static int my_func(lua_State *L) {
LuaSkin *skin = [LuaSkin shared] ; // get the shared LuaSkin instance
[skin checkArgs:LS_TNUMBER, LS_TBREAK] ; // check the args on the stack
... do our stuff
return 1;
}
[skin checkArgs:...]
does some type checking on the stack passed in... internally it uses a lot of lua_type(...)
and lua_to...(...)
function calls, but all of them are done with skin.L
as the state, not the state that was passed into my_func
. And this is what causes the application crash when skin.L != L
.
We could add if (skin.L != L) { return luaL_error(L, "not coroutine safe") ; }
after the first line, but we'd have to do it for every function and method we've added, and that's a crap-ton. Plus it doesn't make them work in coroutines, when simply changing skin.L
for the duration of this C function would, so I'm willing to do the crap-ton once to switch [LuaSkin shared]
to [LuaSkin sharedWithState:L]
but not twice: once to make it safer (i.e. non-crashy), then again to make it work with coroutines, once I've made the necessary changes to LuaSkin itself.
It would be so much easier to test whether the "active" lua_State == skin.L
within [LuaSkin shared]
, but as you can see, that's not passed L
, and so far, rummaging through the source for lua, I'm not seeing a way to determine which luaState is the "active" one (i.e. the one that initiated the most recent `lua(p)call`).
To complicate matters, when handling a callback from the macOS, LuaSkin *skin = [LuaSkin shared] ;
gets invoked when the lua engine is idle and NO lua_(p)call
has been performed (so there is no "active" state)... now granted, in this case, a coroutine didn't cause the callback, so using skin.L
on the main lua thread is fine, but it does complicate determining which lua_State is "active", assuming I can find a way at all...
Hammerspoon could wrap all the functions that must be wrapped before exposing them to the user. I'm not sure if you have a big list of those somewhere though.
I don't think there's an easy way to find the active state, but there is an easy way to find the main state. So it's too bad you're not asking the opposite question of "I have an active state, and can you tell me if it's the main one?"
It seems like to solve the problem for real, checkArgs
should accept a lua_State
.
There are actually a lot of other helpers that LuaSkin provides, so I think the better solution is to change the first line to LuaSkin *skin = [LuaSkin sharedWithState:L]
in our functions, but leave it LuaSkin *skin = [LuaSkin shared]
in our callback handlers...
Internally, if withState
is called, skin.L
is set to that, and if not (or NULL is passed, which might be clearer in intent), it will set it to the main one which we can cache in a class property.
That requires the least changes (outside of LuaSkin itself) but still leaves open the possibility for a crash if one our developers uses the wrong version of shared
in the wrong place, which is why I was hoping for a way to detect it...
Oh well, I think then moving all of them to sharedWithState:
is the way to go, but allow it to be NULL; that way, using [LuaSkin shared]
won't compile and the dev will have to make an explicit choice.
@cmsj, not sure if you've been following these last comments, but unless you have objections, I think this is what I'm going to start doing on a new branch tonight/tomorrow so we can put it out for testing hopefully in a couple of days.
@asmagill I am following along. I have no objections to trying to fix this, but I have some fairly big concerns about the viability of this.
I hope I'm wrong, but my suspicion is that we're going to very quickly find that all sorts of things are going to break - e.g. we don't just have an assumption that there is only one lua_State
object, we also assume there's only one LuaSkin
object which is 100% coupled to the single Lua state. We assume that each library can have its own global refTable
and that assumption is now on very shaky ground. Those are just the ones that come immediately to mind.
tl;dr I don't currently see how any of our C API can be safe when it's being interacted with by multiple Lua states.
I think we’re OK as long as we stick to one application thread, since every function (with the exception of the proposed hs.yield
and I think I know how to address that) runs to completion before another L
could sneak in there.
A few of the functions which offer to run in the background but don’t return a userdata will have to be examined to make sure they ask for a new skin object rather than inherit their parents before the dispatch_async
, but I’ve been meaning to re-evaluate these anyways because they are also prone to crashing if the callback doesn’t trigger till after a reload, so...
At any rate, the only way to know is to try it; we can keep it in a branch so only devs run it for a while if you like.
@asmagill I think I'm mostly concerned about all the state in C that will now have multiple Lua environments mutating it.
I agree that the only way to be sure is to try, so maybe if we have a branch with the modifications to LuaSkin and then we need to hammer the HS API from the coroutines and the main thread and see what happens.
Admittedly this is a sort of band-aid over a framework that should probably (eventually) be refactored now that we've realized that one of the core assumptions that went into making it was mistaken, but I think it should work, once we've addressed all dispatch_*
blocks that assume the persistence of skin
and don't bother setting it again...
I'll keep you guys posted and let you know when a fork is ready for testing.
Closing as this has been addressed in master; reopen if you think I've missed some point that still needs to be addressed.
At all.
With the following from a coroutine tutorial (https://riptutorial.com/lua/example/11751/create-and-use-a-coroutine):
as soon as I enter
coroutine.resume(thread2)
, I get:I suspected this might happen if the coroutine actually invoked anything that used LuaSkin, but had forgotten that we no longer use
lua_pcall
anywhere by itself -- it's always invoked indirectly through LuaSkin'sprotectedCallAndTraceback:nresults:
.As I'm coming to understand it:
Reason: creating a coroutine invokes
lua_newthread
, which creates a newlua_State
variable; when the coroutine is running (i.e.resume
is invoked), it's the new state's stack that has things pushed/pulled from it, not the one that LuaSkin was initialized with...Fix: Nontrivial. Not only would we need to change every
[LuaSkin shared]
line in our C-functions to something like[LuaSkin sharedWithState:L]
, we'd need to implementlua_lock
andlua_unlock
so that callbacks could still use[LuaSkin shared]
(and thus use the initial state created when first initialized/reloaded) to protect the shared global space between the differinglua_State
's.Which is half-way to making the whole thing support multi-threading (at the Application level, not the psuedo co-routine lua level) anyways (see point 3 at https://stackoverflow.com/a/54046050).
So while the fix might be a "good idea"™, implementing it will be a giant PITA.
Unless I'm missing something obvious (somebody smarter please tell me that I am! Please, please, please!)