evaluate cancelled but async jobs continue

TelephoneTan commented 3 months ago

Currently, the evaluate can cooperate with coroutine cancel, it will return when the current coroutine is cancelled, this is good.

BUT the async jobs triggered by evaluate including their callbacks will last forever and continue to run after evaluate has been cancelled, this is bad. We can say that the async jobs are leaked.

The root cause is here:

private val coroutineScope = CoroutineScope(jobDispatcher + exceptionHandler)
// ...
internal actual fun invokeAsyncFunction(
        args: Array<Any?>,
        block: suspend (bindingArgs: Array<Any?>) -> Any?,
    ) {
        ensureNotClosed()
        val (resolveHandle, rejectHandle) = promiseHandlesFromArgs(args)
        val job = coroutineScope.launch {
            try {
                val result = block(args.sliceArray(2..<args.size))
                jsMutex.withLock {
                    // Call resolve() on JNI side
                    invokeJsFunction(
                        context = context,
                        globals = globals,
                        handle = resolveHandle,
                        args = arrayOf(result)
                    )
                }
            } catch (e: Throwable) {
                jsMutex.withLock {
                    // Call reject() on JNI side
                    invokeJsFunction(
                        context = context,
                        globals = globals,
                        handle = rejectHandle,
                        args = arrayOf(e)
                    )
                }
            }
            jsMutex.withLock {
                // The job is completed, see what we can do next:
                // - Execute subsequent Promises
                // - Cancel all jobs and fail, if rejected and JS didn't handle it
                do {
                    val executed = executePendingJob(context, globals)
                } while (executed)
            }
        }
        jobsMutex.withLockSync { asyncJobs += job }
        job.invokeOnCompletion {
            jobsMutex.withLockSync { asyncJobs -= job }
        }
    }

invokeAsyncFunction uses a independent CoroutineScope instance to launch async jobs, but this CoroutineScope instance doesn't has the coroutine Job of evaluate's current coroutine as its scope Job, so all async jobs escape out of the "structured concurrency" of the evaluate's current coroutine and won't get cancelled if evaluate is cancelled.

The solution is quite simple: inject the Job of evaluate that triggers async jobs to the coroutine scope of those async jobs, then every async job triggered by an evaluate will listen to this evaluate's Job and get cancelled automatically when this evaluate is cancelled:

internal actual fun invokeAsyncFunction(
        // ###################################################################
        // inject the Job either through parameters or through outside context
        evaluateJob: Job,
        // ###################################################################
        args: Array<Any?>,
        block: suspend (bindingArgs: Array<Any?>) -> Any?,
    ) {
        ensureNotClosed()
        val (resolveHandle, rejectHandle) = promiseHandlesFromArgs(args)
        // ###################################################################
        // use the evaluate Job as the parent Job of this async job
        val job = coroutineScope.launch(evaluateJob) {
        // ###################################################################
            // ............
        }
        jobsMutex.withLockSync { asyncJobs += job }
        job.invokeOnCompletion {
            jobsMutex.withLockSync { asyncJobs -= job }
        }
    }

Now our async jobs will get cancelled automatically, so we can wait for them in evaluate even if we are in cancelled state:

private suspend fun awaitAsyncJobs() {
        jsMutex.withLock {
            do {
                // Execute JS Promises, putting this in while(true) is unnecessary
                // since we have the same loop after every asyncFunction call
                val executed = executePendingJob(context, globals)
            } while (executed)
        }
        // ######################################################################
        // This is important.
        //
        // We have to wait for all async jobs to get completed (even if they are
        // cancelled) before we return, so the code here must continue to run
        // even if we are in cancelled state.
        withContext(NonCancellable){
        // ######################################################################
            while (true) {
                val jobs = jobsMutex.withLock { asyncJobs.filter { it.isActive } }
                if (jobs.isEmpty()) {
                    // No jobs to run
                    break
                }
                jobs.joinAll()
            }
        }
    }

Additionally, even when evaluate is cancelled, the sync JS code must continue to run to push the JS code state to the correct final state so we can make sure no pending jobs are leaked, and JS code is not in a strange middle state that could probably causes strange bugs. All the clean-up steps must run as usual too:

internal actual fun invokeAsyncFunction(
        // ###################################################################
        // inject the Job either through parameters or through outside context
        evaluateJob: Job,
        // ###################################################################
        args: Array<Any?>,
        block: suspend (bindingArgs: Array<Any?>) -> Any?,
    ) {
        ensureNotClosed()
        val (resolveHandle, rejectHandle) = promiseHandlesFromArgs(args)
        // ###################################################################
        // use the evaluate Job as the parent Job of this async job
        val job = coroutineScope.launch(evaluateJob) {
        // ###################################################################
            try {
                val result = block(args.sliceArray(2..<args.size))
                // ###################################################################
                // JS sync code
                withContext(NonCancellable){
                // ###################################################################
                    jsMutex.withLock {
                        // Call resolve() on JNI side
                        invokeJsFunction(
                            context = context,
                            globals = globals,
                            handle = resolveHandle,
                            args = arrayOf(result)
                        )
                    }
                }
            } catch (e: Throwable) {
                // ###################################################################
                // JS sync code
                withContext(NonCancellable) {
                // ###################################################################
                    jsMutex.withLock {
                        // Call reject() on JNI side
                        invokeJsFunction(
                            context = context,
                            globals = globals,
                            handle = rejectHandle,
                            args = arrayOf(e)
                        )
                    }
                }
            }
            // ###################################################################
            // JS sync code
            withContext(NonCancellable) {
            // ###################################################################
                jsMutex.withLock {
                    // The job is completed, see what we can do next:
                    // - Execute subsequent Promises
                    // - Cancel all jobs and fail, if rejected and JS didn't handle it
                    do {
                        val executed = executePendingJob(context, globals)
                    } while (executed)
                }
            }
        }
        jobsMutex.withLockSync { asyncJobs += job }
        job.invokeOnCompletion {
            jobsMutex.withLockSync { asyncJobs -= job }
        }
    }

private suspend fun awaitAsyncJobs() {
        // ###################################################################
        // JS sync code
        withContext(NonCancellable) {
        // ###################################################################
            jsMutex.withLock {
                do {
                    // Execute JS Promises, putting this in while(true) is unnecessary
                    // since we have the same loop after every asyncFunction call
                    val executed = executePendingJob(context, globals)
                } while (executed)
            }
        }
        // ######################################################################
        // This is important.
        //
        // We have to wait for all async jobs to get completed (even if they are
        // cancelled) before we return, so the code here must continue to run
        // even if we are in cancelled state.
        withContext(NonCancellable){
        // ######################################################################
            while (true) {
                val jobs = jobsMutex.withLock { asyncJobs.filter { it.isActive } }
                if (jobs.isEmpty()) {
                    // No jobs to run
                    break
                }
                jobs.joinAll()
            }
        }
    }

private suspend fun evalAndAwait(evalBlock: suspend () -> Any?): Any? {
        ensureNotClosed()
        evalException = null
        // ###################################################################
        // NonCancellable to avoid loadModules() stops at a middle state
        withContext(NonCancellable) {
        // ###################################################################
            loadModules()
        }
        val result = jsResultMutex.withLock {
            jsMutex.withLock {
                // ###################################################################
                // NonCancellable to avoid evalBlock() stops at a middle state
                withContext(NonCancellable) {
                // ###################################################################
                    evalBlock()
                }
            }
            // ###################################################################
            // JS sync code
            withContext(NonCancellable) {
            // ###################################################################
                awaitAsyncJobs()
            }
            // ###################################################################
            // Clean-up run as usual
            withContext(NonCancellable) {
            // ###################################################################
                jsMutex.withLock { getEvaluateResult(context, globals) }
            }
        }
        handleException()
        return result
    }

dokar3 commented 3 months ago

This is quite right, it does not cancel jobs if the parent job of evaluate cancels.

Using "NonCancellable" is a bit of overkill in my opinion. Also, this may break the cancellation semantics: if you cancel "evaluate", but it only cancels the async job, it will continue to execute your JS code, what if it starts another job or calls something that I want to stop?

In my opinion, if evaluate gets canceled, we should stop everything. If you need to stop only the jobs but keep the JS code running, you can throw in the binding function and handle rejection in JS. It’s a bit cumbersome to implement now, so I would probably expose a function or the async jobs for the cancellation.

TelephoneTan commented 3 months ago

As the kotlin coroutines' document states, coroutine cancel is cooperate, you can not cancel something from outside and force it to stop, it will cause damage.

Kotlin coroutines code listens to the isActive state of current coroutine, and returns as soon as possible if the coroutine has been cancelled.

This is similar in JS, asyncFunction will throw CancellationException when the coroutine is cancelled, and the JS code will continue to do its job, handling this exception, releasing/cleaning the resources, etc, and finally return as soon as possible.

If we want the sync JS code can be cancelled, the only way is to expose an "isActive" variable to the JS code, so that the JS code can listen to the cancel and respond to it. Discarding the JS code without executing it is like a "force stop" and it will break the logic and cause damage.

There is no "force stop" in kotlin coroutines, there is only "cooperated cancel", so as it should be in QuickJS and JS world.

dokar3 commented 3 months ago

I didn't see how canceling evaluate but it does not stop the JS execution making the cancellation cooperative, on the contrary, it makes "evaluate()" harder to cancel.

IMO, canceling evaluate is kind of like calling process.exit() in Node, you should expect it to stop your execution. If you want to break the execution nicely you should better run the cleanup before the cancellation.

TelephoneTan commented 3 months ago

There is a difference, after you call process.exit(), the whole process is destroyed and all variables and states in the process are not available any more.

But after you cancel the evaluate, the QuickJS instance is still there, not destroyed. If you make some changes in the JS world in one previous evaluate, these changes are visible to the next evaluate, the JS state is remained. That's why I said if you force stop an evaluate, it will leave the JS world in a strange middle state.

So cancelling evaluate is not like calling process.exit(), instead, quickJS.close() plays the same role as process.exit().

TelephoneTan commented 3 months ago

Sure, cancel is not cooperative now, if you want it to be cooperative, just expose an isActive variable or something to the JS world, and listen to it in the JS side. This is also how kotlin coroutines do.

like this:

while(true || isActive){// loop infinitely until evaluate is cancelled
    console.log('I am alive')
}
console.log('evaluate cancelled, I quit now')

TelephoneTan commented 3 months ago

As for asyncFunction invoke, asyncFunction will throw a CancellationException(raised by kotlin coroutines), thus JS code can catch this exception and know it is time to quit.

TelephoneTan commented 3 months ago

This is how JS code "cooperate" with an evaluate-cancel.

dokar3 commented 3 months ago

So cancelling evaluate is not like calling process.exit(), instead, quickJS.close() plays the same role as process.exit().

Interesting point, this is also quite right, if you have multiple evaluate calls that do not depend on each other. Suppose we use the runtime to execute a single script, there may be a series of evaluate calls, but if one fails, you probably don't want to use the polluted runtime, this is also some kind of exit().

if you want it to be cooperative, just expose an isActive variable or something to the JS world, and listen to it in the JS side

This is weird, we should not put the responsibility of cancellation handling on JS code. When writing JS, I don't think anyone cares what happens if the runtime decides to cancel their code at some point.

TelephoneTan commented 3 months ago

Yes, they are two totally different use cases.

My use case is I implement part of my app logic using JS, so I can update the app logic remotely from server side, without requiring my users to update the app on their phone.

So I create and reuse a QuickJS instance globally, providing a lot of functions in JS, call evaluate every time I need to invoke the logic writen in JS. My JS functions don't depend on any global state at all, so I don't need to worry the JS runtime is "polluted".

Using multiple QuickJS instances is also another solution, but I worry that the QuickJS instance is too heavy that creating a QuickJS instance every time will impact the performance.

dokar3 commented 3 months ago

This makes sense. As I always use it for executing one-shot scripts, so this library hasn't focused on long-live runtime.

I haven't benchmarked the runtime overhead, but if you're primarily concerned with time, on a 2019 high-end Android phone, "QuickJs.create()" will take about 5 milliseconds. I think that's acceptable for most non-critical use cases.

TelephoneTan commented 3 months ago

Besides QuickJS.create(), I also has to eval a long webpacked javascript for every newly created QuickJS instance to register all JS functions in their runtime. That's horrible if I can't reuse the QuickJS instance.

dokar3 commented 3 months ago

Yeah, reusing is currently possible but getting it to work correctly and work well is kinda challenging, like something discussed in #108.

dokar3 / quickjs-kt

evaluate cancelled but async jobs continue #109