vorner opened this issue 5 years ago
I believe Scala's Vector type is an rrb-tree. Looking at their docs this is how they describe it:
> The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.
So it's effectively constant time, not amortized constant time.
source: https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
I don't know what algorithm lives inside the Scala implementation, whether the docs are correct, or what exactly "effectively" means there.

Nevertheless, if it is true in the example that `push_back` is O(1) most of the time, but every 64th operation is O(log n), then from every practical sense I can see, `push_back` is O(log n) ‒ worst case, average case, amortized… if you use it in an algorithm a lot, the O(log n / 64) thing will just dominate.
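To make that concrete, here is a toy cost model in Rust (just a sketch of the argument, not im's actual implementation; the `CHUNK` size of 64 and the log2 bound per expensive push are assumptions):

```rust
// Toy cost model of the argument above (not im's actual code):
// most pushes cost 1 unit, every CHUNK-th push costs log2(n) units.
// The average per-push cost then grows like log2(n) / CHUNK, i.e. O(log n).
fn main() {
    const CHUNK: u64 = 64; // assumed branching factor / tail size

    for &n in &[1_000u64, 1_000_000, 1_000_000_000] {
        let expensive = n / CHUNK; // pushes that walk the tree spine
        let cheap = n - expensive; // pushes that only fill the tail
        let total = cheap as f64 + expensive as f64 * (n as f64).log2();
        println!("n = {:>13}: avg cost per push ~ {:.3}", n, total / n as f64);
    }
}
```

The average keeps creeping up as n grows, which is exactly the O(log n / 64) behaviour: the constant is tiny, but it is not bounded.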
I don't know how accurate this is but: https://stackoverflow.com/questions/20612729/how-does-scalas-vector-work
It appears, on the surface, that im-rs has a width of 64 and Scala's Vector type has a width of 32, making them nearly the same data type, despite the difference in name.

If my understanding is correct, im-rs is not amortized time, it's effective time, which does have its own classification. A hash table has an effective insert time of O(1), for example, not an amortized one. A C++ vector (dynamic array) has an amortized insert time of O(1).

edit: std::deque is a similar data type to an rrb-tree; it's more like an rb-tree (not relaxed, I believe). (For more information: https://en.cppreference.com/w/cpp/container/deque) On 64-bit hardware std::deque defaults to a width of 64, like im-rs.

Looking up std::deque big-O, most sites list it as O(1) time, not mentioning effective, just outright O(1). But they also say O(1) for unordered_map (hash table). ¯\_(ツ)_/¯
I believe we are each talking about a different thing.
The linked Stack Overflow answer argues that it is a tree with branching factor 32 (64 in our case) and that you can fit only so many elements into RAM anyway (let's say 2^45 ‒ 48 bits is the best CPUs can do in terms of virtual addresses, they discard the rest, and you need at least 8 bytes per pointer; let's not get into what happens with ZSTs), so you get a maximum depth of the tree (which would be 8 here). Such an argument is kinda slippery, because then you could also say that bubblesort is O(1): there are only so many elements you can fit into an ordinary vector to sort, so we have an upper limit on how long it runs. On the other hand, that limit of 8 is reasonably low, so from a practical point of view claiming it to be O(1) is probably fine, which is what mostly everyone does, even if every computational complexity professor must grit their teeth at it.
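For what it's worth, the depth bound in that answer checks out under those assumptions; a quick back-of-envelope sketch (48-bit virtual addresses, 8 bytes per pointer, branching factor 64; none of it measured from im):

```rust
// Back-of-envelope check of the depth bound (all numbers are the
// assumptions from the paragraph above, nothing measured from im):
// 48-bit virtual addresses and >= 8 bytes per element pointer give at
// most 2^45 elements; with 64-way branching that needs ceil(45 / 6) levels.
fn main() {
    let address_bits: u32 = 48;
    let ptr_bytes_log2: u32 = 3; // 8 bytes per pointer
    let max_elems_log2 = address_bits - ptr_bytes_log2; // 2^45 elements at most
    let branch_log2: u32 = 6; // branching factor 64 = 2^6

    let depth = (max_elems_log2 + branch_log2 - 1) / branch_log2; // ceiling division
    println!("maximum tree depth ~ {}", depth); // prints 8
}
```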
However, that is not what I was pointing out. I was pointing out that the two claims can't both be true at the same time:

These two claims directly contradict each other. If you want to make the above argument about the limited depth of the tree, then even the every-64th push takes O(1) ‒ you have to build the argument from the bottom up.
That's what I'm trying to point out: having these two claims next to each other is confusing.
Hello
From the docs in 12.3:
This is AFAIK wrong. Amortized means that while one operation may be more expensive than the advertised complexity, n operations in a sequence must be bounded by n times the advertised complexity. For example, the classical std::vector / Vec has O(1) amortized push ‒ while a single push can take O(n) to reallocate and grow to twice the size, a reallocation that copies n elements happens only once per roughly n pushes, so the per-push cost sums to something like ~2 ‒ the trick there is that the bigger the reallocations, the rarer they happen.
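To illustrate the difference, here is a minimal sketch of that doubling behaviour (a hypothetical buffer, not the real Vec / std::vector code) that counts the element copies caused by reallocations over n pushes; the total stays under 2n, so the average per push is a constant:

```rust
// Minimal sketch of the doubling argument (hypothetical buffer, not the
// real Vec / std::vector code): count how many element copies the
// reallocations cause over n pushes. The total stays under 2n, so the
// average cost per push is a constant.
fn main() {
    let n: u64 = 1_000_000;
    let mut capacity: u64 = 1;
    let mut len: u64 = 0;
    let mut copies: u64 = 0;

    for _ in 0..n {
        if len == capacity {
            copies += len; // a reallocation moves every existing element
            capacity *= 2; // grow to twice the size
        }
        len += 1; // the push itself is O(1)
    }
    println!(
        "{} pushes caused {} copies ({:.2} copies per push)",
        n,
        copies,
        copies as f64 / n as f64
    );
}
```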
On the other hand, an O(log n) operation every constant number of elements still amortizes to O(log n).