apple / swift-collections

Commonly used data structures for Swift

`Deque` lacks `capacity` #308

Open · glbrntt opened 1 year ago

glbrntt commented 1 year ago

Deque is often used as a buffer. However, you can't always control how much data you accept. In these cases it usually makes sense to reclaim storage space if the capacity grows beyond some limit to avoid holding on to too much memory unnecessarily.

This isn't possible with Deque because its capacity is treated as an implementation detail.

In this thread @lorentey suggested letting Deque shrink on removal, along with adding `init(minimumCapacity:persistent:)`. I think this would be sufficient, although as described it has a minor drawback: it wouldn't lazily grow up to a capacity limit.
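
For concreteness, here's a minimal sketch of the pattern in question, written against Array (which does expose capacity) since Deque can't express the check today. The type, method names, and limit are illustrative only, not a proposed API.

```swift
// A minimal sketch of the capacity-bounded buffering pattern described above,
// written against Array because Array exposes `capacity`; the type name,
// `maxRetainedCapacity`, and `flush(to:)` are illustrative only.
struct ByteBufferQueue {
    private var storage: [UInt8] = []
    private let maxRetainedCapacity: Int

    init(maxRetainedCapacity: Int) {
        self.maxRetainedCapacity = maxRetainedCapacity
    }

    mutating func append(contentsOf bytes: [UInt8]) {
        storage.append(contentsOf: bytes)
    }

    /// Hands the buffered bytes to `body` and then empties the buffer.
    /// The existing allocation is reused only while it stays under the limit,
    /// so a temporary spike in input doesn't pin a large allocation forever.
    mutating func flush(to body: ([UInt8]) -> Void) {
        body(storage)
        storage.removeAll(keepingCapacity: storage.capacity < maxRetainedCapacity)
    }
}

var queue = ByteBufferQueue(maxRetainedCapacity: 4096)
queue.append(contentsOf: [1, 2, 3])
queue.flush { bytes in print(bytes.count) }  // prints "3"
```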

lorentey commented 1 year ago

Yes, I'm very intentionally omitting Deque.capacity until I see a valid use case for it.

All code I've seen that tries to make decisions about shrinking storage based on capacity is overly sensitive to malloc behavior. Deque is using malloc_size to make use of every byte that it managed to allocate -- and if it happens to be given enough "free" space to put its capacity beyond some shrinking threshold naively implemented by Deque's client, then every operation may trigger the shrinkage, which would be quite bad.

The example code in the discussion you linked to was especially alarming to me:

```swift
self.buffers.removeAll(keepingCapacity: self.buffers.capacity < 16) // don't grow too much
```

16 seemed like a rather low value -- depending on the Element type and the whims of the system allocator, malloc may sometimes give us 16 items' worth of bytes even if we only ask for just a couple of them.
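
To make the rounding concrete (the exact numbers are platform-, allocator-, and Element-dependent, so treat this only as a sketch): even Array, which does publish its capacity, routinely ends up with more space than it asked for, and as noted above Deque additionally claims whatever slack malloc_size reports.

```swift
import DequeModule

// Allocator rounding in practice; the printed value varies by platform,
// allocator, and Element size, which is exactly why a fixed threshold
// such as `capacity < 16` is fragile.
var bytes: [UInt8] = []
bytes.reserveCapacity(3)
print(bytes.capacity)  // at least 3, and often a full malloc bucket's worth

// Deque has no public `capacity`, but as described above it uses malloc_size
// to claim every byte malloc returned, so even a three-element deque may be
// sitting on "16 elements' worth" of storage.
let tiny: Deque<UInt8> = [1, 2, 3]
_ = tiny
```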

Given this, I'd prefer to figure out a more direct way to detect and recover from temporary allocation spikes.

I generally dislike the idea of having the core resizing logic live in client code -- it ought to be part of the container implementation, as it needs to evolve with it. Making Deque automatically shrink itself is one way to achieve this; another (possibly less disruptive) idea would be to expose an explicit operation to shrink the container to be near a certain target size.

To do this, I think Deque would need to either give up on using malloc_size, or it would need to keep track of how much extra space malloc gave us. (Either of these would be doable.)
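
In the meantime, about the only thing client code can do is rebuild the deque into fresh storage, which roughly approximates the effect of an explicit shrink operation. A rough sketch under that assumption; `compact()` and the high-water-mark bookkeeping are hypothetical, not an existing or proposed API:

```swift
import DequeModule

// Client-side approximation of the "shrink toward a target size" idea:
// rebuilding the deque copies its elements into fresh storage sized for the
// current contents, releasing whatever slack the old allocation held on to.
// Without a public `capacity`, the client has no way to check whether the
// rebuild is actually worthwhile, which is the gap this issue describes.
extension Deque {
    mutating func compact() {
        self = Deque(self)
    }
}

// Example: shrink after draining a spike that our own bookkeeping noticed.
var inbound: Deque<Int> = []
var highWaterMark = 0

inbound.append(contentsOf: 0..<200)       // a burst of input
highWaterMark = max(highWaterMark, inbound.count)
inbound.removeAll(keepingCapacity: true)  // drained, storage retained

if inbound.isEmpty && highWaterMark > 128 {
    inbound.compact()                     // drop the oversized allocation
    highWaterMark = 0
}
```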

I'm very much open to adding direct support for shrinking. (PRs are welcome, if you have time to experiment!) I'd need a bit more convincing to understand why exposing a public capacity would be a good idea. 😉

glbrntt commented 1 year ago

> Yes, I'm very intentionally omitting Deque.capacity until I see a valid use case for it.
>
> All code I've seen that tries to make decisions about shrinking storage based on capacity is overly sensitive to malloc behavior. Deque is using malloc_size to make use of every byte that it managed to allocate -- and if it happens to be given enough "free" space to put its capacity beyond some shrinking threshold naively implemented by Deque's client, then every operation may trigger the shrinkage, which would be quite bad.

Oh I see, that makes sense. Thanks for shedding some light on why capacity was omitted 🙂.

> Given this, I'd prefer to figure out a more direct way to detect and recover from temporary allocation spikes.
>
> I generally dislike the idea of having the core resizing logic live in client code -- it ought to be part of the container implementation, as it needs to evolve with it. Making Deque automatically shrink itself is one way to achieve this; another (possibly less disruptive) idea would be to expose an explicit operation to shrink the container to be near a certain target size.
>
> To do this, I think Deque would need to either give up on using malloc_size, or it would need to keep track of how much extra space malloc gave us. (Either of these would be doable.)
>
> I'm very much open to adding direct support for shrinking. (PRs are welcome, if you have time to experiment!) I'd need a bit more convincing to understand why exposing a public capacity would be a good idea. 😉

Yeah, that seems absolutely fair; I'm more interested in having a way to shrink the storage. The lack of capacity only stops us from reimplementing CircularBuffer in terms of Deque, which isn't an issue, as we can just switch to Deque in the next major version and remove CircularBuffer. We tend to reach for Deque in most places now anyway.

I'll try to find some time to experiment with the less disruptive explicit shrink operation.