Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
13.05k stars 1.85k forks source link

Attempt to invoke interface method 'java.lang.Object kotlinx.coroutines.flow.Flow.collect(kotlinx.coroutines.flow.FlowCollector, kotlin.coroutines.Continuation)' on a null object reference #2692

Closed ScottPierce closed 2 years ago

ScottPierce commented 3 years ago

Kotlin 1.4.31 kotlinx.coroutines 1.4.2

I saw this crash come in on the crash reporter for Android. The crash doesn't appear to come from my code though. I cannot reproduce it at will. I'm unsure what's causing it.

One potential hint is that the crash happened very quickly (~300 millis) after the app launched.

Here is the stacktrace:

java.lang.NullPointerException: Attempt to invoke interface method 'java.lang.Object kotlinx.coroutines.flow.Flow.collect(kotlinx.coroutines.flow.FlowCollector, kotlin.coroutines.Continuation)' on a null object reference
        at kotlinx.coroutines.flow.FlowKt__TransformKt$filterNotNull$$inlined$unsafeTransform$1.collect(SafeCollector.common.kt:114)
        at kotlinx.coroutines.flow.DistinctFlowImpl.collect(Distinct.kt:87)
        at com.scottpierce.repository.DeviceActiveManager$1.invokeSuspend(DeviceRepository.kt:348)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
        at java.lang.Thread.run(Thread.java:761)

Here is the class that it looks like it's coming from. It appears to be originating from the .filterNotNull() call chain. That's the only place in the file that calls filterNotNull which is clearly referenced in the stacktrace.

internal class DeviceActiveManager(
    private val scope: CoroutineScope,
    private val deviceRepository: DeviceRepository,
    private val timeout: Duration,
    private val clock: Clock = Clock.System
) {
    companion object {
        private val logger = Logger<DeviceActiveManager>()
    }

    private val _isDeviceActiveStateFlow = MutableStateFlow(true)
    val isDeviceActiveStateFlow: StateFlow<Boolean> get() = _isDeviceActiveStateFlow

    private var checkConnectivityJob: Job? = null
    private var lastConnectionInstant: Instant = clock.now()

    private val requestDelay = timeout / 2.0
    private var timeoutJob: Job? = null

    init {
        scope.launch {
            deviceRepository
                .isDeviceLinkedStateFlow
                .filterNotNull()
                .distinctUntilChanged()
                .collect { isLinked ->
                    checkConnectivityJob?.cancel()
                    checkConnectivityJob = null

                    if (isLinked) {
                        checkConnectivityJob = scope.launch { doWhileLinked() }
                    } else {
                        timeoutJob?.cancel()
                        timeoutJob = null
                        _isDeviceActiveStateFlow.value = true
                    }
                }
        }
    }

    suspend fun retry() {
        deviceRepository.isDeviceLinked()
        startTimeout()
        logger.i { "Device link was retried successfully. Device marked active." }
        _isDeviceActiveStateFlow.value = true
    }

    private fun startTimeout() {
        logger.i { "Starting timeout" }
        timeoutJob?.cancel()
        timeoutJob = scope.launch {
            delay(timeout)
            logger.i { "Device link timed out. Device is now inactive." }
            _isDeviceActiveStateFlow.value = false
        }
    }

    private suspend fun doWhileLinked() {
        startTimeout()

        while (true) {
            retryOnServiceException(
                logger = logger,
                startRetryDelay = 5.seconds,
                maxRetryDelay = 1.minutes,
            ) {
                deviceRepository.isDeviceLinked()
                lastConnectionInstant = clock.now()
            }

            startTimeout()

            logger.i { "Device link validated. Device is active." }
            _isDeviceActiveStateFlow.value = true
            delay(requestDelay)
        }
    }
}
qwwdfsad commented 3 years ago

Could you please specify how reproducible (e.g. how often crash reporter reports it) this crash? Is it bound to a specific Android device?

stkent commented 3 years ago

We're seeing a crash with a similar message but a slightly different trace and operator chain. We're returning a flow that uses filterNotNull from a flatMapLatest transform, and see crashes like:

java.lang.NullPointerException: Attempt to invoke interface method 'java.lang.Object kotlinx.coroutines.flow.Flow.collect(kotlinx.coroutines.flow.FlowCollector, kotlin.coroutines.Continuation)' on a null object reference
        at {redacted}$$special$$inlined$flatMapLatest$1.kotlinx.coroutines.flow.FlowKt__CollectKt.emitAll$$forInline(Merge.kt:131)
        at {redacted}$$special$$inlined$flatMapLatest$1.invokeSuspend(Merge.kt:223)
        at {redacted}$$special$$inlined$flatMapLatest$1.invoke(Unknown:12)
        at kotlinx.coroutines.flow.internal.ChannelFlowTransformLatest$flowCollect$3$invokeSuspend$$inlined$collect$1$lambda$1.invokeSuspend(Merge.kt:34)
        at kotlinx.coroutines.flow.internal.ChannelFlowTransformLatest$flowCollect$3$invokeSuspend$$inlined$collect$1$lambda$1.invoke(Unknown:10)

This is with Kotlin 1.4.32, coroutines 1.4.3.

For us this crash is not limited to a single device; so far we've seen examples on Galaxy S9, Galaxy S10, Pixel 3, Pixel 4, Pixel 4 XL at least. It does not appear to reproduce 100% of the time - we have initially been unable to reproduce locally, for example - but we suspect this is a timing issue in relation to other parts of our app startup, rather than wholly device-specific. About 3-5% of our small testing pool is seeing this crash, maybe more.

We plan to try to reproduce locally by expanding the duration in which filterNotNull is blocking the flow from emitting any value. We have not yet attempted to implement workarounds, but we will be testing replacing the null upstream values with a non-null instance indicating absence so that we can filterIsInstance rather than filterNotNull.

Hope this info helps and has not hijacked the original report 🤞🏻

ScottPierce commented 3 years ago

Could you please specify how reproducible (e.g. how often crash reporter reports it) this crash? Is it bound to a specific Android device?

I can't reproduce it on demand. I've only seen it once in production, but I saw it again while debugging on Friday.

qwwdfsad commented 3 years ago

My guess is that the Android runtime treats the initialization order slightly differently than regular JVM.

I.e. the same code is actually incorrect (but works due to implementation details on JVM), because it leaks not fully constructed object into a different thread (init created a coroutine that captures this.deviceRepository) and it is allowed to observe null in that field (but such race should be really rare and is not observable on x86). Another guess is that isDeviceLinkedStateFlow is mutable concurrently with the constructor and coroutine observes null at some point.

dshatz commented 3 years ago

I have noticed the same issue while building my Android app. Adding delay(50) in the beginning of init block helps, so the theory about leaking not fully constructed object seems plausible. I really don't like this solution though. What is the recommended way to start a coroutine right after the object have been initialized?

qwwdfsad commented 3 years ago

What is the recommended way to start a coroutine right after the object have been initialized?

Doing it in any application-specific way that suits you, just not in the init block itself

PhanVanLinh commented 2 years ago

I have some similar crash on production with coroutine version 1.6.0-RC. I don't see this crash with previous version.

Fatal Exception: java.lang.NullPointerException
       at com.example.MyViewModel$special$$inlined$flatMapLatest$1.invokeSuspend(Merge.kt:228)
       at com.example.MyViewModel$special$$inlined$flatMapLatest$1.invoke(Merge.kt:1)
       at kotlinx.coroutines.flow.internal.ChannelFlowTransformLatest$flowCollect$3$1$2.invokeSuspend(Merge.kt:34)
       at kotlinx.coroutines.flow.internal.ChannelFlowTransformLatest$flowCollect$3$1$2.invoke(Merge.kt:2)
       at kotlinx.coroutines.intrinsics.UndispatchedKt.startCoroutineUndispatched(UndispatchedKt.java:55)
       at kotlinx.coroutines.CoroutineStart.invoke(CoroutineStart.java:112)
       at kotlinx.coroutines.AbstractCoroutine.start(AbstractCoroutine.kt:126)

I don't start coroutine in ViewModel$init, I started it inside viewLifecycleOwner.lifecycleScope.launchWhenResumed in Fragment$onViewCreated.

private val refreshItemsTrigger = MutableStateFlow(0)

fun screenItems(): Flow<List<ScreenItem>> {
    return combine(
        screenItems,
        refreshMenuTrigger
            .flatMapLatest { getNewItems() }
    ) { screenItems, newScreenItems -> screenItems + newScreenItems }
}

fun refreshItems() {
    refreshItemsTrigger.value = ++refreshItemsTrigger.value
}
dkhalanskyjb commented 2 years ago

Does this happen on 1.6.0?

kfadli commented 2 years ago

I'm fighting with this issue last days, i tried to upgrade to coroutines-android:1.6.0 and the issue continue to throw.

Actually I can reproduce it once every 20 launchs of the application furthermore I only have this stacktrace without any more informations on the origin of the flow invoked (used massively in this application).

 java.lang.NullPointerException: Attempt to invoke interface method 'java.lang.Object kotlinx.coroutines.flow.Flow.collect(kotlinx.coroutines.flow.FlowCollector, kotlin.coroutines.Continuation)' on a null object reference
        at kotlinx.coroutines.flow.FlowKt__CollectKt.collect(Collect.kt:30)
        at kotlinx.coroutines.flow.FlowKt.collect(Unknown Source:1)
        at kotlinx.coroutines.flow.FlowKt__CollectKt$launchIn$1.invokeSuspend(Collect.kt:50)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:39)
        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
qwwdfsad commented 2 years ago

Do you have a reproducing snippet/project that you can share? Otherwise, it is really hard to tell what exactly is wrong here

kfadli commented 2 years ago

Actually not, but i will try to isolate and reproduce it to provide more information about it (Flow is used massively and that's why it's really hard to know where the exception is thrown)

Legion2 commented 2 years ago

I had a similar problem with val stateFlow = flow {...}.stateIn(coroutineScope, SharingStarted.Eagerly, ...). When the flow builder lambda accesses a member which is initialized after stateFlow sometimes the member is null. I think using SharingStarted.Eagerly results in the same problem as using init {} to launch coroutines. There is no easy way to launch all StateFlow coroutines after the object is fully constructed. What are the best practice for this case and is is even guaranteed that all members are initialized in definition order, so we can prevent null pointer exceptions by ordering the members?

codewithvikas commented 2 years ago

I had same issue in my project. And i solved it be reordering objects.

private val dataResult = MutableStateFlow<Result<MyData>>(Result.Loading()) // if it is after repository object get same issue

private val repository: MyRepository = mock {
    on { myData }.thenReturn(dataResult)
}
PhpXp commented 11 months ago

I had a similar bug, very similar stack trace:

Fatal Exception: java.lang.NullPointerException:
       at kotlinx.coroutines.flow.FlowKt__CollectKt$launchIn$1.invokeSuspend(Collect.kt:50)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
       at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
       at kotlinx.coroutines.EventLoop.processUnconfinedEvent(EventLoop.java:68)
       at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith(DispatchedContinuation.kt:375)
       at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
       at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable$default(Cancellable.kt:25)
       at kotlinx.coroutines.CoroutineStart.invoke(CoroutineStart.java:110)
       at kotlinx.coroutines.AbstractCoroutine.start(AbstractCoroutine.kt:126)
       at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch(BuildersKt__Builders_common.kt:56)
       at kotlinx.coroutines.BuildersKt.launch(Builders.kt:1)
       at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(BuildersKt__Builders_common.kt:47)
       at kotlinx.coroutines.BuildersKt.launch$default(Builders.kt:1)
       [redacted]

Turns out that Firebase Crashlytics doesn't send the full stack trace! Like 10 function calls at the top were missing. The NPE was in my code and I could see it clearly in debug builds.