Closed eloparco closed 12 months ago
@wenyongh @xujuntwt95329 @yamt after the initial discussion in https://github.com/bytecodealliance/wasm-micro-runtime/pull/1869#discussion_r1066524792, any suggestion on the possible implementation of such a mechanism?
Initially I was thinking of using sigsetjmp/siglongjmp
(to resume from here), but then I would need to keep track of sigjmp_buf
s for each thread and also I don't know how portable it is for other non-linux platforms.
Otherwise, I'm not sure on what we can do in the signal handlers to stop the threads gracefully (ideally we'd like move to the next instruction after sleep/atomic_wait
so that the exception flag can be read to stop the execution).
I guess another approach would be to make all operations non-blocking in the host, but I'm not sure how much effort would that require (e.g. for WASI calls). Also, it might be problematic for user-defined native methods, so I think signals might be the way to go.
// edit We just had a discussion with @eloparco about the approach, he'll update the ticket with the notes.
Two approaches:
atomic.wait
)poll_oneoff
)setjmp
and longjmp
, similar to what is done with HW bound checks.
[-] portability problem because of setjmp/longjmp
(e.g. not available on Windows)
[+] no additional work required if new blocking functions are introduced
[+] works well with user-defined functions
progress:
it wouldn't work with user-defined functions
in either approaches, i guess user-defined functions need to be adapted anyway. eg. resource cleanup on termination.
maybe pthread cancellation is another alternative to consider for posix-like platforms?
in either approaches, i guess user-defined functions need to be adapted anyway. eg. resource cleanup on termination.
Why is that? If we use signals, the thread running the user-defined function would be stopped.
maybe pthread cancellation is another alternative to consider for posix-like platforms?
But then we wouldn't return to the runtime after stopping a (possibly blocking) instruction. It may even be fine for spawn threads, but what if it's the main thread?
I wonder what's the expected behavior of proc_exit? From what I see in the implementations, it just kills the process and doesn't allow for any resource cleanup (unlike C exit
, which does a bit more before exiting, e.g. calling atexit
callbacks). If that's the case, I don't think we have to worry about resource cleanup?
in either approaches, i guess user-defined functions need to be adapted anyway. eg. resource cleanup on termination.
Why is that? If we use signals, the thread running the user-defined function would be stopped.
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
maybe pthread cancellation is another alternative to consider for posix-like platforms?
But then we wouldn't return to the runtime after stopping a (possibly blocking) instruction. It may even be fine for spawn threads, but what if it's the main thread?
sure. probably you're right it's even trickier than signals to use for this purpose.
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
how would you address that? not sure if there's something we can do once the user-defined function is interrupted
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
how would you address that? not sure if there's something we can do once the user-defined function is interrupted
for example, we can make the user-defined functions deal with interruption by themselves.
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
how would you address that? not sure if there's something we can do once the user-defined function is interrupted
for example, we can make the user-defined functions deal with interruption by themselves.
I wonder if we actually have to provide that capability; whereas C allows registering callback for exit()
(using atexit
), I don't think there's equivalent for traps in either posix or C. If really needed, we can provide atexit
-like functionality for both traps and proc_exit
, for native user-defined functions, but I don't know how critical it is to have it right now.
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
how would you address that? not sure if there's something we can do once the user-defined function is interrupted
for example, we can make the user-defined functions deal with interruption by themselves.
I wonder if we actually have to provide that capability; whereas C allows registering callback for
exit()
(usingatexit
), I don't think there's equivalent for traps in either posix or C. If really needed, we can provideatexit
-like functionality for both traps andproc_exit
, for native user-defined functions, but I don't know how critical it is to have it right now.
for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.
on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
how would you address that? not sure if there's something we can do once the user-defined function is interrupted
for example, we can make the user-defined functions deal with interruption by themselves.
I wonder if we actually have to provide that capability; whereas C allows registering callback for
exit()
(usingatexit
), I don't think there's equivalent for traps in either posix or C. If really needed, we can provideatexit
-like functionality for both traps andproc_exit
, for native user-defined functions, but I don't know how critical it is to have it right now.for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads.
on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.
I didn't mean to call exit()
and close the host. What I meant is that maybe we don't have to worry about user-defined functions, and we can use signals and interruptions. Embedders, if needed, can handle resource cleanup within the process after the runtime finalizes.
for example, if a user-defined function uses malloc(), it somehow needs to catch the signal to free() it.
how would you address that? not sure if there's something we can do once the user-defined function is interrupted
for example, we can make the user-defined functions deal with interruption by themselves.
I wonder if we actually have to provide that capability; whereas C allows registering callback for
exit()
(usingatexit
), I don't think there's equivalent for traps in either posix or C. If really needed, we can provideatexit
-like functionality for both traps andproc_exit
, for native user-defined functions, but I don't know how critical it is to have it right now.for embedders like iwasm command, maybe it's enough to call host exit() and let the host os terminate threads. on the other hand, wamr, as a library, should provide a way to clean up associated resources. termination of wasm instances doesn't necessarily involve termination of the host process.
I didn't mean to call
exit()
and close the host. What I meant is that maybe we don't have to worry about user-defined functions, and we can use signals and interruptions. Embedders, if needed, can handle resource cleanup within the process after the runtime finalizes.
it's certainly possible to code host functions that way.
however, it isn't how we code host functions right now. (including the ones within wamr tree like wasi) so, most of host functions need to be adapted anyway. also, my honest impression is that the approach is too difficult for average programmers around me.
in other words, i prefer "if a host function doesn't implement async termination properly, it can't be async terminated (eg. possibly block forever)" over "if a host function doesn't implement async termination properly, it might cause problems like resource leak on async termination."
In that case I guess this could be a compilation/runtime flag. We worked in parallel on making the wait
instruction interruptable: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1929/files
In that case I guess this could be a compilation/runtime flag. We worked in parallel on making the
wait
instruction interruptable: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1929/files
i guess per host function flag makes more sense than a compilation/runtime flag.
i guess per host function flag makes more sense than a compilation/runtime flag.
What do you mean by host function flag?
i guess per host function flag makes more sense than a compilation/runtime flag.
What do you mean by host function flag?
eg. add a member to WASMFunctionImport to specify if it's safe to terminate the function that way.
From https://github.com/bytecodealliance/wasm-micro-runtime/pull/1869#discussion_r1067624306:
for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
I wonder if that's something we could actually implement that at least for the poll_oneof()
syscall - that function is being used for things like sleep in wasi-libc
so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.
From #1869 (comment):
for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
I wonder if that's something we could actually implement that at least for the
poll_oneof()
syscall - that function is being used for things like sleep inwasi-libc
so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.
do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main target doesn't have signals.)
do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
Yes, I was a bit confused when you said it's not approperiate approach for WAMR, but looks like it was just a misunderstanding, and we're on the same page. We'll implement that in addition to signals then.
From #1869 (comment):
for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
I wonder if that's something we could actually implement that at least for the
poll_oneof()
syscall - that function is being used for things like sleep inwasi-libc
so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main target doesn't have signals.)
Sounds reasonable, for libc-wasi and internal code, we might change the implementation.
But for user developed native library, it makes me a little confused: should runtime interrupt the thread running into native API by signal-like functionality, will it be not so friendly to developer? Should the developer be responsible for his behaviors? Or we just make this an option, well document it, and let developer choose whether to enable it or not?
do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions?
Yes, I was a bit confused when you said it's not approperiate approach for WAMR, but looks like it was just a misunderstanding, and we're on the same page. We'll implement that in addition to signals then.
i said so about making "all i/o non-blocking". but what you are talking about is something less extreme, right?
From #1869 (comment):
for another runtime i made all i/o non-blocking to handle thread termination requests. but i don't think it's an appropriate approach for WAMR.
I wonder if that's something we could actually implement that at least for the
poll_oneof()
syscall - that function is being used for things like sleep inwasi-libc
so even though not all blocking operations are covered, we can already satisfy a lot of usecases for platforms where signal-based implementation is not available. @yamt @wenyongh I'd like to know your thoughts and concerns about that.do you mean to make these sleep-like functionalities wake up periodically internally so that it can check the extra termination conditions? i suppose it's the only choice for platforms where signal-like functionalities are not available. (that's why i did it that way for another runtime, where one of its main target doesn't have signals.)
Sounds reasonable, for libc-wasi and internal code, we might change the implementation.
But for user developed native library, it makes me a little confused: should runtime interrupt the thread running into native API by signal-like functionality, will it be not so friendly to developer? Should the developer be responsible for his behaviors? Or we just make this an option, well document it, and let developer choose whether to enable it or not?
it's reasonable to make it an option.
should_terminate()
)but what you are talking about is something less extreme, right?
Yes, I'm only talking about the poll_oneof
function (at least for now)
Or we just make this an option, well document it, and let developer choose whether to enable it or not?
Yeah, I think we discussed it a few comments above; make it optional sounds reasonable to me.
is anyone still working on this?
is anyone still working on this?
Not currently, on my side at least. We started the effort on this branch bytecodealliance:dev/interrupt_block_insn. So, at the moment, the new tests from the proposal wouldn't pass.
is anyone still working on this?
Not currently, on my side at least. We started the effort on this branch bytecodealliance:dev/interrupt_block_insn. So, at the moment, the new tests from the proposal wouldn't pass.
ok. thank you for an update. are you (or someone) still willing/planning to work on this?
are you (or someone) still willing/planning to work on this?
Unfortunately, not in the short term, I don't have any time scheduled to work on it. Feel free to pick it up if you want.
With #1889,
proc_exit
and traps are properly propagated to all the threads in the process. Once a thread traps or callsproc_exit
, all the other threads are stopped.This is not true if the thread is executing a blocking operation (e.g.
sleep
,atomic.wait
). In that case, the blocking operation is not interrupted by the exception propagation process.Partial discussion here: https://github.com/bytecodealliance/wasm-micro-runtime/pull/1869#discussion_r1066524792