Questionable PersistentList time complexity specification

mcpiroman commented 2 years ago

As far as I see the current implementation of collections, specifically PersistentList, is that of described in the proposal, which said that the complexity of PersistentList is:

get(index), set(index, element) - O(log₃₂N), where N is the size of the instance the operations are applied on.
add(index, element), removeAt(index), remove(element) - O(N).
- add(element) and removeAt(size - 1) are O(1)
addAll(elements) - O(N).
addAll(index, elements), removeAll(elements), removeAll(predicate) - O(N M), optimizable to O(N+M), where M is the number of elements to be inserted/removed.
Iterating elements - O(N).

To my understanding, an implementation by simple delegation to an array that is copied on each operation would have the following:

🟩 get(index), set(index, element) - O(1)
🟨 add(index, element), removeAt(index), remove(element) - O(N).
- 🟥 add(element) and removeAt(size - 1) - O(N)
🟥 addAll(elements) - O(N + M)
🟨/🟩 addAll(index, elements), removeAll(elements), removeAll(predicate) - O(N+M)
🟨 Iterating elements - O(N).

This is, the current tire-based approach is only better at appending elements, which is very common operation for sure, but otherwise is either worse or unnecessary complicated compared to naive coping.

Do I understand it correctly? I assume the actual performance, including other aspects, like GC-pressure or data locality, does overweight raw complexity metrics, am I right?

mcpiroman commented 2 years ago

OK, I have read more about bit-mapped trie structure and now I understand that it is, generally speaking, optimal. (Funny enough it appears that I have invented this struct myself, before I read about it).

I suggest (better) documenting the current implementations' performance characteristic, e.g. that O(log32N) is in practice very little, but middle insert and remove are actually worse than O(N) - so much that it's the reason why .NET initially rejected moving from AVL tree to tire implementation.

For the letter I have some ideas how to improve the implementation, however I have a question: Why are all buffers always preallocated with 32 size? My only guess is that it helps runtime reuse the memory region in case it can prove that some buffer is not referenced anymore. Would it be a problem if I sometimes used 33 element buffers?

vlsi commented 1 year ago

set(index, element) is O(N) in case you copy the array on modifications, so it is way worse than the trie

Kotlin / kotlinx.collections.immutable

Questionable PersistentList time complexity specification #125