MayakaApps / Kache

Kache is a lightweight Kotlin Multiplatform caching library that supports both in-memory and persistent caches and supports different eviction strategies (LRU, FIFO, MRU, and FILO).
https://mayakaapps.github.io/Kache/latest
Apache License 2.0
124 stars 5 forks source link

InMemoryKache::getOrPut fails in high concurrency context #239

Open vfsfitvnm opened 3 months ago

vfsfitvnm commented 3 months ago

Hi :smile_cat: I noticed I was getting NullPointerExceptions when using SHA256KeyHasher. I could narrow it down to InMemoryKache - the following test fails:

    @Test
    fun concurrency() = runTest {
        val cache = InMemoryKache<String, String>(maxSize = 1)

        (0..100000).map(Int::toString).forEach {
            launch { cache.getOrPut(it) { it }!! }
        }
    }

Stack trace:

java.lang.NullPointerException
    at com.mayakapps.kache.InMemoryKacheTest$concurrency$1$2$1.invokeSuspend(InMemoryKacheTest.kt:36)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
    at kotlinx.coroutines.test.TestDispatcher.processEvent$kotlinx_coroutines_test(TestDispatcher.kt:24)
    at kotlinx.coroutines.test.TestCoroutineScheduler.tryRunNextTaskUnless$kotlinx_coroutines_test(TestCoroutineScheduler.kt:99)
    at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$2$1$workRunner$1.invokeSuspend(TestBuilders.kt:322)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:277)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:95)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at kotlinx.coroutines.test.TestBuildersJvmKt.createTestResult(TestBuildersJvm.kt:10)
    at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0(TestBuilders.kt:310)
    at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0(Unknown Source)
    at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0(TestBuilders.kt:168)
    at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0(Unknown Source)
    at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0$default(TestBuilders.kt:160)
    at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0$default(Unknown Source)
    at com.mayakapps.kache.InMemoryKacheTest.concurrency(InMemoryKacheTest.kt:32)
paulrs commented 1 month ago

We are seeing similar NPEs in this code.

Uncaught Kotlin exception: kotlin.NullPointerException
    at 0   AppEcc                              0x10587ad97        kfun:kotlin.Throwable#<init>(){} + 95 
    at 1   AppEcc                              0x105874167        kfun:kotlin.Exception#<init>(){} + 87 
    at 2   AppEcc                              0x105874387        kfun:kotlin.RuntimeException#<init>(){} + 87 
    at 3   AppEcc                              0x1058745a7        kfun:kotlin.NullPointerException#<init>(){} + 87 
    at 4   AppEcc                              0x1058af41f        ThrowNullPointerException + 131 
    at 5   AppEcc                              0x105d8f4d7        kfun:com.mayakapps.kache.SHA256KeyHasher.$transformCOROUTINE$0.invokeSuspend#internal + 783 
    at 6   AppEcc                              0x1059afd1f        kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 67 

SHA256KeyHasher has the following code:

    override suspend fun transform(oldKey: String): String = hashedCache.getOrPut(oldKey) {
        oldKey.encodeUtf8().sha256().hex().uppercase()
    }!! // Since our creation function never returns null, we can use not-null assertion

Notice the comment at the end.

Now look at the following:

    override suspend fun getOrPut(key: K, creationFunction: suspend (key: K) -> V?): V? {
        get(key)?.let { return it }

        creationMutex.withLock {
            if (creationMap[key] == null && map[key] == null) {
                @Suppress("DeferredResultUnused")
                internalPutAsync(key, creationFunction) // <!--- ASYNC
            }
        }

        return get(key) // <!-- DID ASYNC ABOVE FINISH?
    }

I know nothing about coroutines in Kotlin, but I see an async call that I'm guessing is not guaranteed to finish before the return get call at the end of the method, resulting in a nasty race condition where sometimes you'll get a value and other times you will not. This seems extremely bad.

djehrlich commented 1 month ago

I was able to replicate the crash using the unit test provided above:

    @Test
    fun concurrency() = runTest {
        val cache = InMemoryKache<String, String>(maxSize = 1)

        (0..100000).map(Int::toString).forEach {
            launch { cache.getOrPut(it) { it }!! }
        }
    }

I found that by replacing the second call to get(key) in the getOrPut(…) function with a new function I called unsafeGet(key: K): V, the problem goes away. This is my unsafeGet(…) implementation:

private suspend fun unsafeGet(key: K): V = (getFromCreation(key) ?: getIfAvailable(key))!!

Since unsafeGet(…) is a copy of get(…), but with evictExpired() removed (and a force unwrap added), this suggests to me that evictExpired() may be triggering a race condition. However, my Kotlin experience is limited, so my confidence in this analysis is not particularly high.

As an aside, by calling get(…) twice, evictExpired() is being called twice. Is this really necessary in a single call to getOrPut(…)? Calling evictExpired() twice seems superfluous and possibly a performance issue.