Kotlin / multik

Multidimensional array library for Kotlin
https://kotlin.github.io/multik/
Apache License 2.0
632 stars 39 forks source link

"Fail to find engine" leading to crashes in production - sometimes #155

Closed mreichelt closed 1 year ago

mreichelt commented 1 year ago

Hello!

We started using multik in production for matrix .dot and .inv methods. It's working smoothly for some Android users, but we noticed lots of crashes happening in production for others:

Caused by java.lang.IllegalStateException: Fail to find engine. Consider to add one of the following dependencies: 
 - multik-default
 - multik-kotlin
 - multik-openblas
       at org.jetbrains.kotlinx.multik.api.EngineKt.enginesProvider(Engine.kt:13)
       at org.jetbrains.kotlinx.multik.api.Engine.<clinit>(Engine.kt:68)
       at org.jetbrains.kotlinx.multik.api.Multik.getLinalg(Multik.kt:49)
       at org.jetbrains.kotlinx.multik.api.linalg._linalgKt.dotDefMMNumber(_linalg.kt:21)
       ...

We're simply using these two dependencies in our KMM module, which we then include as JAR in our Android project:

commonMain {
    dependencies {
        implementation("org.jetbrains.kotlinx:multik-core:0.2.0")
        implementation("org.jetbrains.kotlinx:multik-kotlin:0.2.0")
    }
}

The line in multik where the service is loaded is this one: https://github.com/Kotlin/multik/blob/master/multik-core/src/jvmMain/kotlin/org/jetbrains/kotlinx/multik/api/Engine.kt#L9

This is calling ServiceLoader.load(Engine::class.java).toList(), which uses The Java ServiceLoader.

The class that should be loaded here is KEEngine, which is referenced in the META-INF/services folder here: https://github.com/Kotlin/multik/blob/master/multik-kotlin/src/jvmMain/resources/META-INF/services/org.jetbrains.kotlinx.multik.api.Engine

What's weird about this is that it either should or should not work. But for our affected users, the app crashes one time - and after that it seems to work fine. At first I thought that the class is optimized away by R8 - but that wouldn't explain why it works some of the times.

Does anyone else have this issue?

mreichelt commented 1 year ago

To add one more note: we suspect a threading issue could be the cause. The ServiceLoader Android docs explicitly say:

Instances of this class are not safe for use by multiple concurrent threads.

We'll investigate more into that direction. That would mean that using multik is currently not thread-safe.

mreichelt commented 1 year ago

More info: when logging what Thread our calculation is running on, this is what we see:

thread = DefaultDispatcher-worker-13
thread = DefaultDispatcher-worker-22
thread = DefaultDispatcher-worker-35
thread = DefaultDispatcher-worker-12
thread = DefaultDispatcher-worker-37
thread = DefaultDispatcher-worker-41
thread = DefaultDispatcher-worker-41
... // lots more

But since this might be a regular use-case for multik we should check that calling it from multiple threads is supported!

devcrocod commented 1 year ago

Instances of this class are not safe for use by multiple concurrent threads.

I think you're right. This explains why the error only appears the first time and then doesn't reproduce. I'll try to reproduce this behavior myself to make sure that the problem is only with that. However, even without that, this part requires thread safety.

I'm planning to fix this issue and release a new artifact soon. Unfortunately, due to the multiplatform and infrastructure issues with Apple silicon, it's already delayed, and I can't say exactly when it will happen.

Also, I want to note that version 0.2.1 is available.