Closed by @benoist 3 years ago
Well, doesn't your implementation leak memory? That is, the array will never shrink again?
However, I agree we should only move after some threshold, that is, move if there's X free space in front of the buffer, and also check that when pushing.
In any case, check whether `Deque` doesn't fit better for your usage pattern.
Duplicate of #573
We should probably document that `Array#shift` is O(n).
Well, `Deque` has the fast shift implementation, but I need other functions from `Array` first, before shifting.
@benoist As to why your implementation doesn't work: when you later need to grow the array, you've lost the reference to the original pointer to invoke `realloc` on.
Alternatively we could store an offset inside the array, but maybe that's a big penalty for just this use case.
Converting to `Deque` before the shift has some overhead of course, but it's already a 160x speedup:

```
                       user     system      total        real
shift              1.600000   0.000000   1.600000 (  1.606989)
each               0.000000   0.000000   0.000000 (  0.000606)
fast shift         0.000000   0.000000   0.000000 (  0.000649)
convert to deque   0.010000   0.000000   0.010000 (  0.001799)
```
In Ruby they do increase the base pointer... I don't know how they can later reallocate. So it's worth investigating how they do it; it might be worth doing the same in our case.
If I'm reading the Ruby implementation correctly, it seems like they are holding off the memmove until inserts, and they double the capacity upon the insert. That would give you the move penalty only once, when you actually need to reallocate. If the array is shifted until empty, it would never need the reallocation. Just need to make sure it doesn't leak memory, like @jhass pointed out.
But how do they retain the original pointer and the current pointer?
If the array is not shared when shift is called, it will call the `ary_make_shared` function. I think this makes a copy of the original pointer and freezes it. This pointer is used in `ary_modify` to determine the shift.
Well, what functions do you need that `Deque` doesn't have? I would have thought that all `Array` functions could work on `Deque` once implemented. If `Deque` is the right data structure to use here, then you should use it.
Another option is `reverse!` then `pop`, to use just `Array`.
I'm currently using `uniq` and `sort` from `Array`, which might be a lot slower to do in `Deque`, as those functions are not just pushes and shifts, for which `Deque` is optimized. That's just an assumption though; I might be wrong here.
Have you tried it? Benchmarked it? Don't work on assumptions.
You can always convert the array into a deque rather easily.
Or use `reverse!` and `pop` to get O(1) element removal from the ends in `Array`. It just matters which end you remove from.
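For example, a quick sketch of that pattern (written in plain Ruby here purely for illustration; the Crystal calls are the same):

```ruby
# Consume an array front-to-back without paying Array#shift's per-call
# memmove cost: reverse once (O(n) total), then pop from the end (O(1) each).
queue = ["a", "b", "c", "d"]
queue.reverse!                             # ["d", "c", "b", "a"]
consumed = []
consumed << queue.pop until queue.empty?
consumed # => ["a", "b", "c", "d"], the original front-to-back order
```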
There's a bit of a problem: `Deque.sort!` does not exist, because the sorting implementation is hardcoded in `Array`.
@oprypin which is exactly why I asked @benoist what functions he needed from array that were not on deque...
@RX14 Yes, I know I can convert it to a deque or use reverse and pop, but that's not really the issue. If we can make `Array#shift` a lot faster, isn't that worth investigating? It's not very intuitive to do all these workarounds because `Array#shift` is slow. The reason I'm making assumptions for the other parts now is that putting the effort into testing and verifying still leaves this problem unsolved.
If the general conclusion is that Array#shift is as fast as it's going to get for now, then I have no problem if this issue gets closed. :-)
> It's not very intuitive to do all these workarounds because Array#shift is slow.

Yeah, exactly. The non-workaround is `Deque`.
I would still consider that a workaround, but if that's just me, I can live with that :-)
@benoist Using the correct data structure for the job, with the correct algorithmic complexity, is a workaround?
If `Deque` is the only data structure that should do shifts, then `shift` should be removed from `Array`. But I don't think that should be the case. If the overall operations on the same data are faster within one data structure, then it makes no sense to change. I think `Array#shift` can be made faster.
```crystal
start = Time.now
a = Array.new(100_000, "a")
a.size.times do
  a.shift
end
puts Time.now - start
```
Crystal: 1.6069080
Ruby: 0.006131
Ruby implements an optionally shared buffer for the Array. Some Array instances own the memory they use, and they will themselves reallocate it and free it when needed. But some Array instances are shared, in which case they keep a reference to an external reference-counted buffer and maintain an offset on it. This shared Array will never touch the buffer unless it is the only array referencing it. This also has copy-on-write semantics: when a shared Array tries to modify some data, it will first allocate its own buffer and copy everything to it. This is all done without any external impact, so the user of an Array cannot tell the difference, except with timing.
Here is some proof of it, taking @benoist's sample:
```crystal
start = Time.now
a = Array.new(100_000, "a")
a.size.times do
  a[0] = "b" # Modifying the array will force the array to own its own memory.
  a.shift
end
puts Time.now - start
```
Running on my computer (I did not run Crystal with release optimizations!):

```
Ruby without changing array:    0.007283187
Ruby changing array:            3.364405281
Crystal without changing array: 2.3464860
Crystal changing array:         2.3259410
```
This kind of optimization is nice because it makes some things faster and you only pay the costs if you need to. But it also brings inexplicable slowdowns in functions that shouldn't ever be slow. Who could tell that `a[0] = "b"` would take time proportional to the length of the array?
Refs:
- `rb_ary_shift` calls `ary_make_shared` if the Array is not yet shared.
- Everything that may modify the array calls `rb_ary_modify`, which will make shared Arrays not shared (by allocating memory and copying/moving).
- It all happens here: https://github.com/ruby/ruby/blob/trunk/array.c
@lbguilherme thank you for this explanation! Just to be sure, what would be your suggestion for the current `Array#shift` implementation: leave it as is, or change it into something similar to Ruby's?
I'm not sure in which direction Crystal wants to go here. We could:

1. Leave it as it is now. `shift` is popping from the front of a container and it is slow. Document it, so that it becomes the user's burden to adapt to another structure (like copying the data into a `Deque` first), or to pay the cost if they know the array is small. The nice thing about this is that each operation has a clear cost that can be explained in documentation.
2. Apply extensive optimizations in all sorts of places. This would have an impact on how data is stored and on pretty much all functions. The end result would be that everything is faster for the average user, but this makes the code more complicated to maintain, and it also makes behavior less predictable. Why did writing to a single byte of memory cause my program to slow down 462 times? That is hard to explain.

Still on point 2, there are simpler optimizations, like keeping a buffer and an offset and only reallocating when they differ too much. This would have much less impact than shared buffers, but still.
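To make that simpler optimization concrete, here is a hedged sketch (plain Ruby, with invented names; Crystal's real internals use a raw pointer and `memmove`, not a Ruby Array):

```ruby
# Sketch of a buffer-plus-offset array: shift only advances the offset,
# and the buffer is compacted only when the dead space in front grows
# past a threshold, amortizing the O(n) move over many O(1) shifts.
class OffsetArray
  COMPACT_RATIO = 2 # compact when dead slots exceed twice the live elements

  def initialize(elements)
    @buffer = elements.dup
    @offset = 0
  end

  def size
    @buffer.size - @offset
  end

  def shift
    return nil if size == 0
    value = @buffer[@offset]
    @offset += 1
    compact if @offset > size * COMPACT_RATIO
    value
  end

  private def compact
    # One O(n) copy reclaims all the dead slots at once.
    @buffer = @buffer[@offset..-1]
    @offset = 0
  end
end
```

Usage is the same as a plain array: `OffsetArray.new(%w[a b c]).shift` returns `"a"` without moving the remaining elements.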
Just to be clear, there are possible optimizations for many other functions as well, not just `shift`. Taking a slice could benefit too, for example.
I don't dislike either solution. Maybe a speed-focused container could come as a shard, so not everybody would have to fear unpredictable performance. Or maybe it should be in the standard Array itself, so that everybody can benefit from the performance. I particularly like optimizing for the average user, even with surprising behavior.
I would like to hear from @asterite on this.
From a user perspective, I always liked how Array, Hash and String are super generally optimized data structures in Ruby. They can share data with other instances, they are mutable and can be made immutable, they have generally fast operations, etc. Of course that comes at the cost of implementing all of that. Maybe in Ruby it makes more sense, because it's a dynamic language and implementing other data structures is inefficient unless done in C. In most (all?) compiled languages you have different data structures: in Java, for example, you have ArrayList, LinkedList, Deque, etc. That's nice, but it's more cumbersome for the user, because she has to pick a data structure.
So... I don't know. If you need to `shift` a lot, maybe just use another data structure? If `sort!` is missing in `Deque`, maybe we should implement that? Maybe you can just use `reverse!` and then `pop` afterwards? It's hard to know without knowing what problem you are trying to solve, @benoist.
Maybe adding an offset to Array is acceptable, I don't know. Maybe `String` could also have shared memory with parent strings to form views. I think all of that falls in the field of optimization, and right now that's not important, because it can always be done later without changing the external API.
This discussion should happen later once optimizations like this come to the top of our agenda, instead of simply features and stability. At that time we'll have a lot more data on array performance in practice.
@asterite I've written a simple column-storage database with encoding, and to convert the columns back to rows I was shifting the values to form the rows. I'm using iterators because the column storage is not always aligned to values that belong to the same row, due to compression.
I'm keeping an offset now and iterate through the array instead of shifting. This allows me to use the sort function without an extra conversion to Deque.
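That workaround can be sketched like this (plain Ruby, illustrative only):

```ruby
# Instead of destructively shifting each value off the sorted array,
# keep an offset and read through it. The array is never mutated,
# so no per-element memmove cost is paid, and Array#sort still works.
rows = ["c", "a", "b"].sort
offset = 0
out = []
while offset < rows.size
  out << rows[offset]
  offset += 1
end
out # => ["a", "b", "c"]
```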
What I find interesting is that I can read more about how Ruby implements arrays here on the Crystal tracker than I can find on the Ruby issue tracker or whatnot. Sorry for the distraction. ;-)
With regards to the discussion of tradeoffs: using a circular buffer like in `Deque` only incurs a slowdown when the `Array` has `unshift`ed or `shift`ed elements. The added overhead is a `memcpy` of a subset less than half the size of the whole, which is much faster than the `realloc` that would have to be done every time an element is `unshift`ed or `shift`ed in the current implementation.
For anyone interested, this is the implementation used by `NSArray` in Cocoa:
http://ciechanowski.me/blog/2014/03/05/exposing-nsmutablearray/
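A minimal sketch of the circular-buffer idea (plain Ruby, with a fixed capacity for brevity; a real implementation, like Crystal's `Deque`, grows the buffer instead of raising):

```ruby
# Minimal circular-buffer queue: push and shift are both O(1),
# because no element is ever moved; only the start index wraps around.
class RingBuffer
  def initialize(capacity)
    @buffer = Array.new(capacity)
    @start = 0
    @size = 0
  end

  attr_reader :size

  def push(value)
    raise "full" if @size == @buffer.size # a real impl would grow here
    @buffer[(@start + @size) % @buffer.size] = value
    @size += 1
    self
  end

  def shift
    raise "empty" if @size == 0
    value = @buffer[@start]
    @buffer[@start] = nil # release the slot
    @start = (@start + 1) % @buffer.size
    @size -= 1
    value
  end
end
```

A balanced stream of pushes and shifts never moves an element, which is where the O(1) guarantee comes from.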
There's also the thing that passing an Array to C is an O(1) operation right now. If we change Array to use a circular buffer, that's not true anymore. I guess when passing it to C we'd have to rearrange its elements so that the first item is at the first position, and then we can pass a pointer. But once that's done there's no need to do it again, unless there's a subsequent `shift`. And since most array instances won't be passed to C, maybe that's a good compromise, because the `shift` penalty is only paid when you pass it to C.

Then the other penalty is having each Array instance be 4 bytes bigger. Maybe that's not a big issue. We could probably try to remove Deque and let Array be implemented like that. It would simplify the choice a user has to make when picking a data structure, and Array would be efficient as a stack and as a queue (like in Ruby).
Just a note: Array being a few bytes larger is probably not an issue. It could even be an optimization to make Array even larger (say 12 or 16 bytes more) to hold short arrays inline, as short arrays are common. This would need a benchmark, of course.
> There's a bit of a problem: Deque.sort! does not exist, because sorting implementation is hardcoded in Array.
Couldn't you sort in reverse order? Then you could use `pop`.
Or you can find the index of the minimum, swap it with the last element, and then `pop`.
Or you may make a binary heap, and pop the minimum/maximum as you go.
Do you really need `sort!` + `shift`, or is it just a "quick and dirty way to remove the smallest element"?
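The swap-with-last suggestion can be sketched like this (plain Ruby; `pop_min!` is an invented helper name, not a standard method):

```ruby
# Remove the smallest element without shifting: find the minimum (O(n)),
# swap it to the back, and pop it off the end (O(1) removal, no memmove).
def pop_min!(arr)
  i = arr.each_index.min_by { |j| arr[j] }
  arr[i], arr[-1] = arr[-1], arr[i]
  arr.pop
end

data = [5, 1, 4, 2, 3]
sorted = []
sorted << pop_min!(data) while data.any?
sorted # => [1, 2, 3, 4, 5]
```

Repeated to exhaustion this is O(n^2) like selection sort, so it only pays off when you remove a few minima; for many removals the binary-heap variant is the better fit.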
> Do you really need sort! + shift, or it is just "quick and dirty way to remove smallest element"?
Well, for my use case it was the most readable version of what it needs to do. I don't really care about actually removing the elements from the array; it just saves having to keep an index and checking the index against the size. In Ruby the speed was the same; in Crystal it was different. That is not a problem, just a bit surprising. As shifting in its current state is too slow for my use case, I've changed it to just looping through the elements in sorted order. But if Array gets implemented like a Deque and shifting is fast again, I might change it back for readability.
Looks like sorting in reverse order and `pop` will also do the job for you.
Yes that would also work :-)
@asterite there is no need for a circular buffer. Just move the array's contents to the beginning of the allocation when the end of the allocation is reached. That is actually the way Ruby's Array works; the "shared array" is just an implementation trick. Crystal can use just a pointer to the beginning of the allocation, the size of the allocation, a pointer to the first element, and the size. In other words, only a pointer to the first element (or a pointer to the beginning of the allocation) needs to be added.
@funny-falcon Yes, that's a possibility. But if you do `shift` + `push` in a loop, doesn't it always grow the array infinitely? The current pointer would be increased but never decreased.
If Array is to be used as a Deque, then there should be some amortization room, so the move doesn't happen too often.
I did a fix for Ruby's Array for exactly this scenario (i.e. Ruby's Array as a Deque).
@asterite
> But if you do shift + push in a loop doesn't it always grow the array infinitely?
As I've said, when `push` reaches the bound of the capacity, and there is room caused by shifts, the elements are moved to the beginning of the allocation.
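A hedged sketch of that proposal (plain Ruby, names invented for illustration; the real thing would operate on a raw pointer): shift only advances an offset, and the move back to the start of the allocation happens lazily, only when a push hits the end of the capacity while shifts have freed room in front.

```ruby
# shift advances an offset; push reclaims the shift-freed front slots
# (one amortized move) before ever growing the allocation.
class SlidingArray
  def initialize(capacity)
    @buffer = Array.new(capacity)
    @offset = 0 # index of the first live element in @buffer
    @size = 0
  end

  attr_reader :size

  def push(value)
    if @offset + @size == @buffer.size # hit the end of the allocation
      if @offset > 0
        # room freed by shifts: move live elements to the front
        @size.times { |i| @buffer[i] = @buffer[@offset + i] }
        @offset = 0
      else
        @buffer.concat(Array.new(@buffer.size)) # no room: double capacity
      end
    end
    @buffer[@offset + @size] = value
    @size += 1
    self
  end

  def shift
    return nil if @size == 0
    value = @buffer[@offset]
    @offset += 1
    @size -= 1
    value
  end
end
```

With this scheme a balanced `shift` + `push` loop never grows the allocation, which answers the question quoted above.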
If I'm not persona non grata here, I can make a PR for this.
@funny-falcon Sure! No one is non-grata here.
I still have my doubts about this, though: adding four bytes to all arrays for just one method that's maybe not used all the time. And, for example, in the OP's use case there was really no need for a shift.
Haven't we already mentioned that it's too early for this kind of optimization? Besides, this is nothing more than a workaround.
Bleh, nevermind; not having to use `Deque` would be tempting.
(Uh, nevermind on the nevermind. See my next comment.)
Just a quick question: what downsides are there to a deque over an array with a start offset from the allocation start? Apart from the obvious `to_unsafe` one, which I don't think is a big deal. I wouldn't have thought that array accesses would be slowed down much by the tiny bit of extra maths.
@asterite, the single benefit of implementing this is getting rid of Deque, and being closer to Ruby.
All other issues could be solved with programmer discipline.
@RX14 the simplest way is to not change the usage of `@buffer`, just move it with `shift` (and, possibly, with `unshift`). There is only a need to save a pointer to the allocation, or the number of elements `@buffer` has been shifted from the allocation (i.e. an offset). In the latter case the allocation start is calculated as `@buffer - @offset`.
I'd prefer to store a pointer to the allocation.
First, about the upsides of `Deque` and the downsides of this suggested approach (which, frankly, has NOT been clearly described so far).

This kind of workaround cannot achieve the same asymptotic complexity. `Deque` guarantees O(1) complexity when using a balanced number of `push`+`shift` or `unshift`+`pop`. This kind of shifted array would instead occasionally (with a constant factor!) cause an O(N) operation. So the improvement is by a constant factor (or, as some would say, no improvement) over the basic `Array`.

Sure, it improves the situation of many consecutive `shift` calls (which `Deque` also covers very well), but why all the complexity just for this case? And now I'm sure that it definitely cannot replace `Deque`.
The downside of a `Deque` is, hmm, I don't know, but it probably destroys various kinds of caches so is unforgivably slower at typical operations you do with it.
I really don't think that `Deque` will be any slower than `Array` in typical operations. The only downsides I can think of are that the prefetcher will get confused when wrapping, and that the branch prediction might fail too. Hopefully LLVM can compile it to a conditional move to avoid having to stall the pipeline.

In actual fact, I think that 95% of arrays will never wrap. Adding elements to the end is by far the most common array mutation op, and so I think most arrays won't ever wrap around.
In fact, instead of this suggestion, why not actually mix the implementations of `Array` and `Deque`? Have a shortcut scenario for when it's aligned. Re-align it to zero when required (`to_unsafe`) and before expensive operations (`sort!`).
@oprypin what do you mean by a "shortcut scenario"? The check that `offset + i < size` will always be true, and it'll just act like an array?
I was using array shift a lot, but I found out it's pretty slow compared to using an index lookup.
This is the current implementation:
If I change this function to:
It is a lot faster. Shifting `Array(String).new(100_000, "a")` shows the following speed improvement:
The question is, can it really be implemented like this, or is there an important reason the buffer move and clear are required for this operation? I've made the change in the current master branch and all tests pass.