etschannen opened this issue 5 years ago
There is at least one case I know of where `std::deque` had been chosen intentionally over `Deque`, as they have different trade-offs. It was quite some time ago now, though, and I don't remember the specific details of this case.
I remember that being true for the deque in the TLog, because it has less memory overhead and we try to track the TLog's memory explicitly with ratekeeper. I am not totally convinced by that argument, though, and still think it is worth reviewing.
I think one potential concern with `Deque` was that it doesn't shrink, and so its overhead isn't necessarily a function of what is stored in it. In this particular case, if a storage server is removed, the queue associated with the tag of the removed storage server would be wasted memory. That particular problem could be solved with something that periodically shrinks empty queues.
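For what it's worth, the periodic shrinking could be as simple as move-assigning a fresh container over an empty one. A minimal sketch, assuming the queue type is default-constructible and move-assignable (this is not actual TLog code):

```cpp
#include <deque>

// Release the capacity held by an empty queue by replacing it with a freshly
// constructed one. This works for std::deque directly; Deque would need an
// equivalent move assignment or a swap-with-empty.
template <class Queue>
void shrinkIfEmpty(Queue& q) {
    if (q.empty())
        q = Queue(); // the old buffer is freed here
}
```

A periodic task could then walk the per-tag queues and apply this to any that have gone cold.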
In general it is a tradeoff between memory allocations and potentially wasted memory, and for high-traffic queues like this one (where the size of the queue is limited by the number of unspilled versions a TLog could have), a relatively small amount of extra memory could provide a large performance improvement.
I think it's not just a problem when a queue goes cold, though. It basically means that the size of the queue will always be as large as is necessary to accommodate whatever was stored in it at its largest point, even if it is still collecting data.
Another property we should be cognizant of when evaluating potential places to convert is that `std::deque` supports iterators, and pushing or popping at the ends doesn't invalidate references to existing elements (though pushes do invalidate iterators). Our `Deque` supports random access, which would be affected by modifying the front of the queue.
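To make that concrete, here's a small illustration (my own example, not FDB code) of what survives a push or pop with `std::deque`, versus how an index-based view like `Deque`'s `operator[]` shifts:

```cpp
#include <cassert>
#include <deque>

int main() {
    std::deque<int> d = {1, 2, 3};

    int& second = d[1];  // reference to the element holding 2
    d.push_back(4);      // pushing at either end never moves existing elements,
    assert(second == 2); // so references remain valid

    d.pop_front();       // but with index-based access, every index
    assert(d[1] == 3);   // now names a different element: d[2] became d[1]
}
```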
I guess my point is that the system already had to handle not running out of memory when the queue was at its largest point anyway. I think we could agree that leaving 10MB of memory in queues that could be shrunk would be worth it if it made the TLog 5% faster overall.
I do not know the bound on the amount of wasted memory, and I do not know how much faster it will make the TLog, but it seems worth doing the investigation.
One other idea would be to make a new deque implementation that reserves a specific size with `Deque` and then uses an `std::deque` whenever the first queue gets larger than the reserved size. This would still support iterators, and it would bound the amount of wasted memory.
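A rough sketch of what that hybrid could look like, with a plain ring buffer standing in for the reserved `Deque` storage (the names `HybridDeque` and `ReservedSize` are illustrative, not from the codebase):

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Sketch only: a FIFO with a fixed reserved ring buffer for the common case,
// spilling into std::deque when the reservation is exceeded. Requires T to be
// default-constructible; iterators are omitted but could be layered on top of
// operator[].
template <class T, std::size_t ReservedSize>
class HybridDeque {
    std::vector<T> ring = std::vector<T>(ReservedSize); // reserved storage, never shrinks
    std::size_t head = 0, ringCount = 0;                // FIFO window into the ring
    std::deque<T> overflow;                             // newest items once the ring is full

public:
    void push_back(const T& v) {
        // Only use the ring while the overflow is empty, so ring elements
        // always precede overflow elements in FIFO order.
        if (ringCount < ReservedSize && overflow.empty()) {
            ring[(head + ringCount) % ReservedSize] = v;
            ++ringCount;
        } else {
            overflow.push_back(v); // this memory is freed again as it drains
        }
    }

    void pop_front() {
        if (ringCount > 0) {
            head = (head + 1) % ReservedSize;
            --ringCount;
            // Pull one element back into the fast path as the overflow drains.
            if (!overflow.empty()) {
                ring[(head + ringCount) % ReservedSize] = overflow.front();
                overflow.pop_front();
                ++ringCount;
            }
        } else {
            overflow.pop_front();
        }
    }

    T& operator[](std::size_t i) {
        return i < ringCount ? ring[(head + i) % ReservedSize]
                             : overflow[i - ringCount];
    }

    std::size_t size() const { return ringCount + overflow.size(); }
    bool empty() const { return size() == 0; }
};
```

With something like this, the permanently reserved memory is bounded by `ReservedSize` per queue, while the overflow only holds memory while it is actually in use.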
I'm not trying to argue that it isn't worth investigating, I'm just pointing out that this topic has come up previously in a particular situation where we considered the options and chose not to use `Deque`. Knowing the factors influencing our previous decision helps us to do a better job when investigating this case, and it gives us a better picture of the tradeoffs when looking at others.
> already had to handle not running out of memory when the queue was at its largest point anyway
We have to be careful with this logic, though, because one way we handle running out of memory is by slowing down the transaction rate. It would be bad if, for example, a saturating workload reduced our future throughput in any significant way. If the wasted memory did end up being on the order of 10MB, though, that seems a pretty low cost.
This came up in the context of https://github.com/apple/foundationdb/pull/1988, in which `std::queue` was chosen because they did not know `Deque` existed. I am sure there are other places where not a lot of thought was put into the decision.
I think Neelam was also going to investigate more places where `std::map` could be replaced with a deque.
I did the math, and for a cluster with 700 storage servers in one region, 100MB would give enough memory for each tag to store 10,000 versions, which would be enough for 10 seconds of commits if every single commit in those 10 seconds had a mutation for a specific storage server.
If we ended up seeing a performance benefit, it would probably be worth the hybrid deque design to ensure each tag could not waste more than 10,000 versions, which would then bound the memory to 100MB at most.
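As a sanity check on those numbers (the ~14 bytes per queued version below is my inference from 100MB / (700 × 10,000), not a figure from the codebase):

```cpp
#include <cstddef>

constexpr std::size_t storageServers  = 700;    // tags in one region
constexpr std::size_t versionsPerTag  = 10'000; // reserved per-tag capacity
constexpr std::size_t bytesPerVersion = 14;     // assumed per-entry size

// 700 * 10,000 * 14 = 98,000,000 bytes, just under 100MB
constexpr std::size_t worstCase = storageServers * versionsPerTag * bytesPerVersion;
static_assert(worstCase < 100 * 1000 * 1000, "stays within the ~100MB bound");
```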
> 10,000 versions, which would be enough for 10 seconds
Just to confirm, are these the right numbers? In general 10 seconds corresponds to 10 million versions, but I wouldn't be surprised to find that there was some other factor involved here. Just wanted to check.
Because of batching, clusters generally give out about 1 version every millisecond. It is possible to give out more, particularly in clusters with a lot of proxies, but if every version contains a commit to the same storage server the cluster would have a huge hot shard.
10,000 should be enough for almost all normal use cases, but there could be scenarios where it goes above this number, so we should use an `std::queue` to handle overflow in those rare cases.
Assigning to @etschannen since he's driving the conversation.
The `Deque` implementation in `Deque.h` is more efficient than `std::deque` for most use cases. We should review the tradeoffs on a case-by-case basis.