Kotlin / kotlinx.collections.immutable

Immutable persistent collections for Kotlin
Apache License 2.0

Iterable.intersect is very slow with PersistentList #64

Open yuriykulikov opened 4 years ago

yuriykulikov commented 4 years ago

Iterable.intersect(other: Iterable) takes a very long time to complete when called with a PersistentList as the parameter. The same function is much faster with other iterables such as List and Set: it takes minutes with a PersistentList and milliseconds with a List.

I couldn't find the exact reason, but it seems that Collection.retainAll does something with the persistent list that takes ages to complete.

Here are some examples:

    (0..147853).toList().intersect((0..147853).toList()) // takes milliseconds
    (0..147853).toList().intersect((0..147853).toPersistentList()) // takes minutes
    (0..147853).toList().intersect((0..147853).toPersistentList().toSet()) // takes milliseconds

    (0..147853).toMutableList().retainAll((0..147853).toPersistentList()) // takes minutes
    (0..147853).toMutableList().retainAll((0..147853).toPersistentList().toList()) // takes milliseconds

GuillaumeEveillard commented 4 years ago

Hello,

retainAll calls Collection.contains(). The complexity of contains() is O(1) or O(log n) for sets and O(n) for lists.
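
To make the cost concrete, here is a minimal sketch that counts equality checks when a retainAll-style loop calls contains() on a list-like collection. CountingList and retainAllViaContains are hypothetical names written for this illustration, not stdlib or library code, and the loop only approximates what a real retainAll does.

```kotlin
// Hypothetical stand-in for a list whose contains() is a linear scan.
// It counts every equality check so the total work can be inspected.
class CountingList<T>(private val items: List<T>) : AbstractList<T>() {
    var comparisons = 0L
        private set

    override val size: Int get() = items.size

    override fun get(index: Int): T = items[index]

    override fun contains(element: T): Boolean {
        for (item in items) {
            comparisons++
            if (item == element) return true
        }
        return false
    }
}

// Assumed shape of a contains()-based retainAll:
// one contains() call on `elements` per element of `target`.
fun <T> retainAllViaContains(target: MutableList<T>, elements: Collection<T>): Boolean {
    var changed = false
    val iterator = target.iterator()
    while (iterator.hasNext()) {
        if (!elements.contains(iterator.next())) {
            iterator.remove()
            changed = true
        }
    }
    return changed
}
```

With n elements on both sides and every element present, the scan for the i-th element stops after i comparisons, so the total is n(n+1)/2: 500,500 for n = 1,000, and on the order of 10^10 for the 147,854-element ranges in the report. A HashSet argument would need roughly one hash lookup per element instead.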

So, to be honest, the slow result with a list argument is the one I would expect.

But the implementation of MutableCollection.retainAll(elements: Iterable) tries to be smart: in some cases, 'elements' is converted to a set and retainAll is applied using this set. It explains why the test case with two lists is so fast.

This behavior is handled by the following code from Iterables.kt:

    /** Returns true when it's safe to convert this collection to a set without changing contains method behavior. */
    private fun <T> Collection<T>.safeToConvertToSet() = size > 2 && this is ArrayList

    /** Converts this collection to a set, when it's worth so and it doesn't change contains method behavior. */
    internal fun <T> Iterable<T>.convertToSetForSetOperationWith(source: Iterable<T>): Collection<T> =
        when (this) {
            is Set -> this
            is Collection ->
                when {
                    source is Collection && source.size < 2 -> this
                    else -> if (this.safeToConvertToSet()) toHashSet() else this
                }
            else -> toHashSet()
        }

When 'this' is a persistent list, it is a Collection but not an ArrayList, so safeToConvertToSet() returns false and no conversion to a hash set takes place. retainAll then calls contains() on the persistent list for every element.

This is only an analysis; I don't have a solution for now.
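
One caller-side workaround, suggested by the fast .toSet() example in the report, is to build a hash set from the argument before intersecting. The sketch below is only that, not a fix for the underlying behavior: intersectViaSet and demoIntersect are hypothetical names, and Kotlin's ArrayDeque stands in for any Collection that is not an ArrayList (such as a PersistentList) so that the snippet needs nothing beyond the stdlib.

```kotlin
// Hypothetical helper: not part of the stdlib or kotlinx.collections.immutable.
// Copies `other` into a HashSet unless it already is a Set, so that each
// membership check during the intersection is a hash lookup, not a list scan.
fun <T> Iterable<T>.intersectViaSet(other: Iterable<T>): Set<T> {
    val lookup: Set<T> = when (other) {
        is Set -> other
        else -> other.toHashSet()
    }
    return this.intersect(lookup)
}

// Usage sketch: ArrayDeque is a Collection but not an ArrayList, so it stands in
// for a PersistentList here (an assumption made to avoid a library dependency).
fun demoIntersect(): Set<Int> {
    val left = (0..9).toList()
    val right = ArrayDeque((5..14).toList())
    return left.intersectViaSet(right)
}
```

The conversion costs one extra pass over the argument and O(m) additional memory for an argument of size m, which is usually a reasonable trade when both sides are large.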