Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
13.09k stars 1.86k forks source link

Lifecycle handling #47

Closed cbruegg closed 7 years ago

cbruegg commented 7 years ago

I was trying to use coroutine Jobs for Android lifecycle handling together with the UI dispatcher, but I've stumbled upon an interesting issue where basically the following code can cause a crash:

var job: Job? = null

fun foo() {
  job = launch(UI) {
    val bar = download().await()
    // Cancel could have been called now, but the following code is posted to a Handler and executed anyway
    performSomeWorkOnTheUI(bar) // Causes a crash after onPause
  }
}

override fun onPause() {
  job?.cancel()
}

A more elaborate example that consistently crashes due to the IllegalStateException thrown in onResume can be found here. The repo contains a complete executable project.

elizarov commented 7 years ago

Having studied this problem, I became convinced that it is not directly related to coroutines. Properly handling activity lifecycle on Android is hard even without asynchronous operations. See http://stackoverflow.com/questions/31432014/onclicklistener-fired-after-onpause for example. In short, you should never assume that nothing is going to happen after onPause. Quite to the contrary, even click listeners in your UI might get dispatched after onPause.

When you set isResumedCompat = false in your onPause you cannot assume that it is not going to be false in your UI code. There never is (and never was) a guarantee that onPause is the last notification that your are getting. It is just a notification that activity is being paused.

cbruegg commented 7 years ago

The advice in the StackOverflow answer was to unhook listeners in onPause though, which is exactly what I intended by calling job?.cancel() in onPause of the MainActivity. It is at least unintuitive that this doesn't stop the coroutine when the message is already in the message queue of the Android UI thread. Nobody will like having to check something like isResumedCompat after each suspending call, which is also easily missed and leads to subtle bugs.

mykola-dev commented 7 years ago

probably this could help: http://stackoverflow.com/questions/8040280/how-to-handle-handler-messages-when-activity-fragment-is-paused

also nice article about similar issue: http://www.androiddesignpatterns.com/2013/08/fragment-transaction-commit-state-loss.html

elizarov commented 7 years ago

@cbruegg Would the approach with message queuing presented in the stack overflow answer be acceptable for your use-case? The key question here is why you actually want to cancel your asynchronous activity on pause? Would not it better if async actions continued to run while activity is paused, but their notifications were queued until activity is resumed, so that they can be processed to update UI appropriately?

@deviant-studio Thanks for nice links. Let me quote the piece I find the most relevant from the article:

In general, the best way to avoid the exception is to simply avoid committing transactions in asynchronous callback methods all together. Google engineers seem to agree with this belief as well. According to this post on the Android Developers group, the Android team considers the major shifts in UI that can result from committing FragmentTransactions from within asynchronous callback methods to be bad for the user experience.

Hazer commented 7 years ago

@elizarov Maybe you are right about not canceling it to present on resume, but, what if I want to:

override fun onPause() {
    if (isFinishing()) {
        job?.cancel()
    }
}

Then would be nice that using cancel really works, because our Activity no longer will be back and we know it. There are some checks that could be handled by the coroutine. metalabdesign/AsyncAwait, for example, do some checks and have different execution options, one safe that do some checks, another one that leave it do the developer.

elizarov commented 7 years ago

These kind of checks can definitely be implemented as an option directly in dispatcher. It is quite easy to write your own application-specific dispatcher that includes such a check. You can cut-and-paste implementation of kotlinx.coroutines UI dispatcher and apply your project-specific checks into the dispatching logic there: https://github.com/Kotlin/kotlinx.coroutines/blob/master/ui/kotlinx-coroutines-android/src/main/kotlin/kotlinx/coroutines/experimental/android/HandlerContext.kt#L43

I'd appreciate if you report on your experience.

The tricky question is how to turn such code into a universally reusable project-agnostic library that is composable with arbitrary code (including 3rd party libs). To better understand a problem, consider a slight modification of topic-starting code by @cbruegg

  job = launch(UI) {
    val bar = download().await() // suspending call
    saveDownloadedDataToCache(bar) 
    performSomeWorkOnTheUI(bar) 
  }

How the logic that handles/schedules suspending calls can know that we still want to execute saveDownloadedDataToCache despite the fact that application is already finishing, but should not be executing performSomeWorkOnTheUI?

Hazer commented 7 years ago

@elizarov My bet, it doesn't really need to know. I know if we already downloaded, we should cache it and so on, but if my activity is finishing, maybe, trying to save it to cache can actually be harmful, what if saving to cache leads to a corrupt state? And actually, if I am calling cancel directly, I would expect it try to cancel as soon it can. If cancelling will stop me from saving to cache, then its my error as developer, not the library fault.

Android has the AsyncTask class, doing the same sample, using it, we got:

   @Override
   protected String doInBackground(String... params) {
      return download();
   }

   @Override
   protected void onPostExecute(T result) {
      super.onPostExecute(result);
      saveDownloadedDataToCache(result);
      performSomeWorkOnTheUI(result);
   }

onPostExecute(_:) method documentations states:

Runs on the UI thread after doInBackground(Params...). The specified result is the value returned by doInBackground(Params...). This method won't be invoked if the task was cancelled.

As you can see, no cache either in this sample, but there are more callbacks in an AsyncTask, even one for cancelled tasks:

    @Override
    protected void onCancelled(@Nullable T result) {
      if (result != null) {
         saveDownloadedDataToCache(result);
      }
    }

Maybe some onCancelled, onError, finally handler can be used for it? Maybe another kind of flow control? metalabdesign/AsyncAwait, again, as example, had some experiments on it, like .awaitWithProgress(_:) that somehow can handle progress, or an chainable launch(_:) with onError and finally.

So trying to imagine something, could be:

  job = launch(UI) {
    val bar = download().await(finally={
      saveDownloadedDataToCache(it)
    })
    performSomeWorkOnTheUI(bar)
  }

OR

  job = launch(UI) {
    download().await().also {
      performSomeWorkOnTheUI(it)
    }
  }.finally {
     it?.let { saveDownloadedDataToCache(it) }
  }

I was looking through the coroutines guide, and there are some nice samples like the Run non-cancellable-block, maybe we could create different kinds of context like NonCancellable but for other uses, like for Updating UI, Data Flow, and so on.

  job = launch(UI) {
    val bar = download().await() // suspending call, and this can be made `NonCancellable`?
    saveDownloadedDataToCache(bar) 
    run(UISafe) { // UI call, cancellable-on-beginning, with safety-checks.
        performSomeWorkOnTheUI(bar)
    }
  }

Yet another option, maybe is too suggest, that for UI updates, we should have a suspending function, that somehow can use the UI Pool, and we can call it by, for example, .awaitOnUI().

  job = launch(UI) {
    val bar = download().await() // suspending call, and this can be made `NonCancellable`?
    saveDownloadedDataToCache(bar) 
    performSomeWorkOnTheUI(bar).awaitOnUI() // UI 'suspending' call, cancellable-on-beginning, with safety-checks.
  }

OR

  job = launch(UI) {
    val bar = download().await() // suspending call, and this can be made `NonCancellable`?
    saveDownloadedDataToCache(bar) 
    performSomeWorkOnTheUI(bar).await(UI) // UI 'suspending' call, cancellable-on-beginning, with safety-checks.
  }

Or yet, leave this one as it is, call it as Working-As-Intended, and add some AsyncTask wrapper, or just some implementation that can be chainable with an AsyncTask-like flow:

  launch(Android) {
    download().await().also {
      saveDownloadedDataToCache(it) 
      performSomeWorkOnTheUI(it)
    }
  }.onCancelled { 
     it?.let { saveDownloadedDataToCache(it) }
  }

Of course, maybe I am going too deep, maybe you are right, and we should create this ourselves for the projects we are working on, doing so in a generic way can be really tricky and we don't know how it will be used, but if any suggestion I made feel useful or interesting to you, maybe we can look further?

PS: Its late, I am tired, sorry if any inconsistency or redundancy in my text... Thank you for your time.

mykola-dev commented 7 years ago

I think we need some kind of PausableHandler to implement ourselves or with Roman's help. This handler have to be aware of activity|fragment lifecycle. See SO post i provided early. I guess this is the only way to avoid "conditional race". So me agree, this is not a coroutines lib issue itself, but android framework flaw|feature. Roman, is there any chance we can get pausable handler behavior in the kotlinx.coroutines?

cbruegg commented 7 years ago

@elizarov I agree with @Hazer here. If you cancel the job even if you want to cache something regardless of the UI state, that's not an issue with the coroutine library. What I would expect however is that calling job.cancel() actually cancels the coroutine and doesn't just swallow the cancellation when the message has been posted to the queue, but hasn't actually resumed yet. If it did that correctly, there would be no need to have the HandlerContext know anything about the UI state, which is a bad idea anyway, since there'd need to be a separate HandlerContext per Activity and Fragment.

I've tried to find a way to modify the HandlerContext to do something like:

override fun dispatch(context: CoroutineContext, block: Runnable) {
  handler.post {
    if (<job has been cancelled>) <Resume continuation with the exception passed to job.cancel()>
    else block()
  }
}

Unfortunately I don't really know how to implement the parts in <>. (The same would have to be done for the other overridden methods of course.)

mykola-dev commented 7 years ago

@cbruegg try this one:

 if (context[Job.Key]?.isCompleted ?: false)
               Log.w("#", "already completed")                
            else
               block.run()

not sure if this code is ok, but crash has been gone

cbruegg commented 7 years ago

Yes, thanks, but it's only a workaround since it completely suppresses the cancelation exception that's supposed to be thrown.

Mykola D notifications@github.com schrieb am Fr., 21. Apr. 2017, 22:37:

@cbruegg https://github.com/cbruegg try this one:

if (context[Job.Key]?.isCompleted ?: false) block.run() else Log.w("#", "already completed")

not sure if this code is ok, but crash has been gone

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Kotlin/kotlinx.coroutines/issues/47#issuecomment-296300032, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKtPbvxYcV2oV6OB4zkSfVi8i_raTHjks5ryROkgaJpZM4Mr60Z .

elizarov commented 7 years ago

The CoroutineDispatcher abstract class is explicitly defined in a way that prevents its implementation from swallowing results of suspending functions (I'll explain reasons in more detail later), so if you want to experiment with that idea, then you'll need to start with your own implementation of ContinuationInterceptor. I've put the corresponding code into this gist: https://gist.github.com/elizarov/5a96e695e60c267f1fee934ebd22de5f

Now, using CancellableHandlerContext class from that gist you can define val CancellableUI = CancellableHandlerContext(Handler(Looper.getMainLooper())). However, you should be careful when using this CancellableUI as it is only safe with certain subset of suspending functions.

As an example from unsafe subset, consider ReceiveChannel.receive suspending function. Let me quote a piece of its documentation:

Cancellation of suspended receive is atomic – when this function throws CancellationException it means that the element was not retrieved from this channel.

However, if you use it is the above CancellableUI context, this atomicity guarantee is violated, e.g. receive can retrieve an element from the channel and throw CancellationException, completely loosing whatever value that was retrieved.

Fundamentally, I believe that the desire of cancelling coroutines on pause, for example, is a symptom of some architectural problems in application. Asynchronous actions (coroutines) that are bound to UI lifecycle is a design smell, if I may say so.

If an application is built with MVVM or a similar architecture, then all asynchrony and coroutines are part of application Models and ViewModels, which work quite independently of UI (View) lifecycle. E.g, when screen is rotated, then views should be destroyed and rebuilt, but all the models should continue to operate and to execute their asynchronous activities.

cbruegg commented 7 years ago

Thanks, I'll give that gist a try later today!

Asynchronous actions (coroutines) that are bound to UI lifecycle is a design smell, if I may say so.

I disagree. In the commonly used MVP architecture, binding asynchronous actions to the UI lifecycle is frequently done. For example, take a look at the CountriesPresenter in this example of the popular Mosby library. It fetches something from the network and then displays it in the UI, which is fine from a UX perspective if a progress indicator was shown before. It's also fine from a lifecycle POV since it checks isViewAttached() before accessing the UI in the callbacks.

This could be similarly achieved using RxJava, but without having to check isViewAttached():

var subscription: Subscription? = null
fun foo() {
  subscription = downloader.download()
                            .subscribeOn(Schedulers.io())
                            .observeOn(AndroidSchedulers.mainThread())
                            .subscribe { downloaded -> ... }
}
fun onPause() {
  subscription?.cancel()
}

I may be biased, but I've seen this pattern in a lot of apps already. Replacing it with the snippet from my initial post here seemed natural, but currently does not work as expected.

MVVM is an architecture that might work better with the current coroutine implementations in this library, but that I don't think architectures like MVP with asynchronous actions are a design smell. I'd argue that this library should ideally work well with all architectures.

E.g, when screen is rotated, then views should be destroyed and rebuilt, but all the models should continue to operate and to execute their asynchronous activities.

As mentioned before this doesn't apply when the application is finishing / shutting down. In that case, we're forced to either check isViewAttached() (or something similar, but after every suspending call, which is very error-prone) or have a reliable way of cancelling Jobs.

I hear your concern about channels, but I'm not familiar enough with the internals of this library to think of a solution. Thanks already for taking the time to investigate this!

mykola-dev commented 7 years ago

@elizarov please add this CancellableHandlerContext to the lib ;)

elizarov commented 7 years ago

Thanks for the Mosby - MVP link. I you are into MVP architecture, then you can still get the benefit of coroutines by replacing your callbacks with coroutines. True, you cannot get rid of having to check isViewAttached() in your UI code with MVP, but that is not the only problem with MVP.

I'm not yet sold on the necessity of adding something like CancellableHandlerContext into the core lib. I think a better investment would be into helper classes for an architecture where this is not an issue at all.

I confess that I'm not at all a fan of the code where there are explicit getView().showSomething() invocations. Reading that article was quite depressing for me, as it reminds me of the way we wrote apps in Delphi 20 years ago. I much prefer the code where views just subscribe to the state changes and update themselves, so instead of getView().showSomething(xxx) you do state.something = xxx and view just observes and reflects this change (Rx is now popular for that kind of thing, which is unfortunate, since it adds a lot of unnecessary noise and has quite a learning curve, but this kind of architecture can be based on a much simpler framework than Rx)

ilya-g commented 7 years ago

I've played with this problem a bit and found the way to add the required cancellation behavior with a custom continuation interceptor. See this gist: https://gist.github.com/ilya-g/314c50369d87e0740e976818d2a09d3f

elizarov commented 7 years ago

@ilya-g solution is better than mine custom CancellableHandlerContext as it is composable with an arbitrary dispatcher. Thanks.

Hazer commented 7 years ago

@elizarov As you told about reactive approach and with @ilya-g solution, there's also this UI Guide, now I also understand your concerns, do you have any reactive pattern in mind? I really like to see some proposed composable approach to this issue, and I'm not really happy with Rx also, so I am kind of avoiding it, but coroutines seem to fit nice. Somehow, I believe, even for MVP, we could try some kind of MVVM Presenter and there's a nice way of doing and avoiding those manual checks, without violating coroutines, maybe turning the Presenter into some kind of actor "controller" and composing the checks we want like the @ilya-g example for cancellation? Would be really nice to refactor an old style MVP for a more reactive MVP, instead of a huge refactor to some MVVM approach.

I am really not too familiar with the coroutine concepts to think in an example, I will do some experimentation, but maybe someone could guide some useful APIs or take your time sharing one example?

It has been a really nice thread for me.

cbruegg commented 7 years ago

By the way, even the official docs mention job.cancel() in a big example to be used for lifecycle handling. Without the interceptor this will also unexpectedly allow code to be executed on the main thread after onDestroy.

Thus, it would be great to see the interceptor in the library.

elizarov commented 7 years ago

Is there any problem to allow the code to be excecuted on the main thread after onDestroy per se? The problem, as I see it, is with the code that tries to use that state that was already destroyed in an implementation of onDestroy.

I am not yet convienced that the idea of directly modifing UI state from the coroutine is a sound arhitectural pattern that we shall endorce by providing the corresponding utility functions. I feel that there is a need for some indirection between asynchronous code in coroutines that updates some state and UI views that reflect those updates and that connection/link/bond shall be structured in such a way, that it is severed on the corresponding lifecycle notification without having to rely on coroutine cancellation.

elizarov commented 7 years ago

Having given the above comment, I must stress that this is still an issue. It has to be solved. Official docs also have to be fixed to provide a better guidance on how to deal with UI lifecycle.

cbruegg commented 7 years ago

Is there any problem to allow the code to be excecuted on the main thread after onDestroy per se?

Actually I believe this issue isn't limited to the main thread or even onDestroy. I'd say no matter what thread, it is at least unintuitive that a coroutine can resume execution after the corresponding job has been cancelled.

The problem, as I see it, is with the code that tries to use that state that was already destroyed in an implementation of onDestroy.

First, I think we need to to define what a resumption is. Is it the process of posting coroutine code to a queue or is it the actual start of execution? I'd argue for the latter definition since it doesn't depend on the internals of the dispatcher and thus may be what users intuitively think when they don't specifically observe the behavior of the coroutine library, which isn't very easy to grasp.

Any program that assumes that code following a suspension point of call to a cancellable function will not be executed after the job is cancelled even on the same thread is broken right now. In the following example the call to bar() will cause a crash due to state modification that happened before resumption, even though it happened on the same thread and the job was cancelled already. If I didn't know better, I wouldn't expect this to happen since I'd expect bar not to run if it didn't yet, setting didBarRun = true in the process.

coroutine

I am not yet convienced that the idea of directly modifing UI state from the coroutine is a sound arhitectural pattern that we shall endorce by providing the corresponding utility functions.

Are you sure people won't do this anyway even without proper library support? It compiles and seemingly works, except it will cause hard-to-debug crashes every once in a while, which will frustrate developers. I think the best you can do is provide appropriate tools to let them produce correct, though potentially architecturally unsound code.

Having given the above comment, I must stress that this is still an issue. It has to be solved.

Thanks for the acknowledgement and this discussion! 🙂

elizarov commented 7 years ago

The "compiles and seemingly works" in the best argument here. Of course, people will write whatever is best/easier to write. Unfortunately, there are tons of other tricky edge-cases with cancellation here, which might invalidate the whole idea of including cancellation support unless we can find a solution.

Take a related problem of resource cleanup (finally sections). It may not be immediately obvious, but cancellation does not stop execution of the code. It is just a signal to stop execution and the cancelled coroutine has a chance to execute whatever cleanup code it wanted to. Take this code for example:

val job = launch(someContext) {
    try {
       doSomething()
    } finally { 
       cleanupResources() // close files, etc
   }
}
// somewhere later
job.cancel()
// note that cleanupResources will be invoked afterwards!

(That was just thinking aloud)

elizarov commented 7 years ago

The solution that I'm inclining to implement is along the following lines:

  1. Specially mark suspending functions that has to be cancelled atomically (like Channel.send and Channel.receive). Add a special remark to their docs that they will continue to execute normally if they are cancelled after their atomic action of sending/receiving had already taken effect.
  2. Leave a default of non-atomic cancellation for all other cancellable suspending functions
  3. Provide stronger cancellation guarantees for non-atomic suspending functions (default ones), e.g a default suspending function will throw CancellationException after its job's cancellation, so invocations like download.await() will never continue normal execution after the corresponding job is cancelled, thus fixing the problem once and for all.
cbruegg commented 7 years ago

Sounds like a good solution which would definitely fix the troubles with my use case. 👍

elizarov commented 7 years ago

I've pushed this solution to develop branch for those of you impatient enough to build your own version to try it. It also fixes default start mode for launch and async, so that they can be cancelled before they have a chance to start the execution (which waiting in dispatch queue).

cbruegg commented 7 years ago

I gave it a try using the sample project that caused a crash before and it seems like cancellation is working as expected now. 🙂

elizarov commented 7 years ago

Thank you for spending time to confirm. I'm glad to hear it works for you. I'm closing this issue.