Kotlin / kotlinx.collections.immutable

Immutable persistent collections for Kotlin
Apache License 2.0
1.12k stars 56 forks source link

Faster union operations (fix) #89

Closed belyaev-mikhail closed 3 years ago

belyaev-mikhail commented 3 years ago

https://github.com/Kotlin/kotlinx.collections.immutable/pull/86#discussion_r490582688

Note that this is a conservative fix: we cannot use map/set fields cause they may be outdated, we also cannot use node fields because they allow undetectable concurrent modifications. Alternative: just call build(), but commiting the build result of an argument seems not right to me.

qurbonzoda commented 3 years ago

val otherNode = (elements as? PersistentHashSet)?.node ?: (elements as? PersistentHashSetBuilder)?.node should be okay. PersistentHashSet.node is final field. PersistentHashSetBuilder is not supposed to be used in concurrent environment, it is a mutable set.

belyaev-mikhail commented 3 years ago

But what about the following scenario?

val builder1: PersistentHashsetBuilder = ...
val builder2: PersistentHashsetBuilder = ...

// at this point builder2.node may be owned by builder2

builder1.addAll(builder2) // builder2.node is somewhere inside builder1.node

builder2.add("anotherValue") // builder2.node potentially mutably modified inside both builder2 (which is ok) and builder1 (which is not ok)
belyaev-mikhail commented 3 years ago

It is not about truly concurrent environments, but potential ownership violation

qurbonzoda commented 3 years ago

Oh, now I understand the problem. But modified builder argument is quite frequent occurrence to ignore. I think calling build() is not that bad idea considering the fact that Iterable<T>.toPersistentHashSet() is implemented:

fun <T> Iterable<T>.toPersistentHashSet(): PersistentSet<T>
    = this as? PersistentHashSet
        ?: (this as? PersistentHashSetBuilder<T>)?.build()
        ?: PersistentHashSet.emptyOf<T>() + this
qurbonzoda commented 3 years ago

After calling build() modification operations on builder argument may become less efficient as it doesn't own nodes any longer, however, after merging two sets the argument is usually throwed away (my observation). Alternatively we can copy relevant nodes from otherNode entirely if they have ownedBy != null.

belyaev-mikhail commented 3 years ago

Alternatively we can copy relevant nodes from otherNode entirely if they have ownedBy != null

I don't like this particularly because the typical a + b + c with current implementation can potentially produce lots of unnecessary copying as every node created during addAll will be marked by our temporary builder and will not have ownedBy == null. Calling build() seems ok to me if it's ok with you.

belyaev-mikhail commented 3 years ago

Applied. But I stumbled upon the following inconsistency: build() for map builder returns PersistentHashMap, while build() for set builder returns PersistentSet. Is this intentional?

ilya-g commented 3 years ago

I suppose it's just a matter of a missing covariant override. This class is internal, so the override return type can be made more specific without trouble. Probably the same for PersistentHashSet.builder() method.

qurbonzoda commented 3 years ago

Could you please add some tests to make sure the issue is solved?

belyaev-mikhail commented 3 years ago

Done