apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0
14.5k stars 1.31k forks

Review uses of std::deque and replace with our custom Deque #2226

Open etschannen opened 5 years ago

etschannen commented 5 years ago

The Deque implementation in Deque.h is more efficient than std::deque for most use cases. We should review the tradeoffs on a case-by-case basis.

ajbeamon commented 5 years ago

There is at least one case I know of where std::deque had been chosen intentionally over Deque, as they have different trade-offs. It was quite some time ago now, though, and I don't remember the specific details of this case.

etschannen commented 5 years ago

I remember that being true for the deque in the TLog because it has less memory overhead, and we try to track the TLog memory explicitly with ratekeeper. I am not totally convinced by that argument though, and still think it is worth reviewing.

ajbeamon commented 5 years ago

I think this might be it:

https://github.com/apple/foundationdb/blob/36b124705482f700e4651fd359ce8202e1929594/fdbserver/TLogServer.actor.cpp#L363

ajbeamon commented 5 years ago

I think one potential concern with Deque was that it doesn't shrink, and so its overhead isn't necessarily a function of what is stored in it.
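
To make the "doesn't shrink" property concrete, here is a minimal, hypothetical grow-only ring buffer sketch (not FoundationDB's actual Deque.h, whose details may differ): capacity only ever grows, so popping elements moves the head pointer but never returns memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a grow-only ring-buffer deque. Illustrates the
// property discussed above: capacity() never decreases, so a queue that
// was once large keeps its peak allocation even after being drained.
template <class T>
class GrowOnlyDeque {
    std::vector<T> buf;  // storage; its size is the capacity and never shrinks
    size_t head = 0;     // index of the front element
    size_t count = 0;    // number of live elements
public:
    size_t size() const { return count; }
    size_t capacity() const { return buf.size(); }

    void push_back(const T& v) {
        if (count == buf.size()) grow();
        buf[(head + count) % buf.size()] = v;
        ++count;
    }
    void pop_front() {  // frees no memory, just advances the head
        head = (head + 1) % buf.size();
        --count;
    }
    // O(1) random access, computed relative to the current head
    T& operator[](size_t i) { return buf[(head + i) % buf.size()]; }

private:
    void grow() {  // double the capacity and compact elements to the front
        size_t newCap = buf.empty() ? 8 : buf.size() * 2;
        std::vector<T> next(newCap);
        for (size_t i = 0; i < count; ++i) next[i] = (*this)[i];
        buf.swap(next);
        head = 0;
    }
};
```

After filling and then fully draining such a queue, `size()` is zero but `capacity()` still reflects the high-water mark, which is exactly the overhead being discussed.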

etschannen commented 5 years ago

In this particular case, if a storage server is removed, the queue associated with the tag of the removed storage server would be wasted memory. That particular problem could be solved with something that periodically shrinks empty queues.

In general it is a tradeoff between memory allocations and potentially wasted memory, and for high traffic queues (like this one) a relatively small amount of extra memory (in this case the size of the queue is limited by the amount of unspilled versions a TLog could have) could provide a large performance improvement.

ajbeamon commented 5 years ago

I think it's not just a problem when a queue goes cold, though. It basically means that the size of the queue will always be as large as is necessary to accommodate whatever was stored in it at its largest point, even if it is still collecting data.

ajbeamon commented 5 years ago

Another property we should be cognizant of when evaluating potential places to convert is that std::deque guarantees that pointers and references to existing elements stay valid when pushing or popping at either end (pushes do invalidate iterators, but not references). Our Deque supports random access, which would be affected by modifying the front of the queue.
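
The stability guarantee can be demonstrated directly; this small sketch takes a pointer to an element and then pushes heavily at both ends. With a std::vector the pointer would dangle after reallocation, but std::deque never relocates existing elements on end insertions:

```cpp
#include <deque>

// Returns true if a pointer taken before many end-insertions still
// refers to the same element afterwards. This is guaranteed for
// std::deque: pushing at either end allocates new blocks but leaves
// existing elements in place, so pointers/references remain valid
// (iterators, by contrast, are invalidated by insertions).
bool referencesStableAfterPush() {
    std::deque<int> dq{1, 2, 3};
    int* p = &dq[1];  // pointer into the middle of the deque
    for (int i = 0; i < 10000; ++i) {
        dq.push_back(i);
        dq.push_front(i);
    }
    return *p == 2;  // still reads the original element
}
```

A ring-buffer Deque cannot offer this, since growing it copies every element into a new allocation.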

etschannen commented 5 years ago

I guess my point is that the system already had to handle not running out of memory when the queue was at its largest point anyways. I think we could agree that leaving 10MB of memory in queues that could be shrunk would be worth it if it made the TLog 5% faster overall.

I do not know the bound on the amount of wasted memory, and I do not know how much faster it will make the TLog, but basically it seems like it is worth doing the investigation.

etschannen commented 5 years ago

One other idea would be to make a new deque implementation that reserves a specific size with Deque and then uses a std::deque whenever the first queue grows larger than the reserved size. This would still support iterators, and bound the amount of wasted memory.
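
A rough sketch of that hybrid (names and layout are illustrative, not an actual implementation): the reserved ring buffer serves the common case with no allocations, and a std::deque absorbs anything past the reserved capacity, so wasted memory is bounded by the reservation.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical hybrid deque: a fixed reserved ring buffer plus a
// std::deque for overflow. Invariant: the overflow is non-empty only
// while the ring is full, which preserves FIFO order.
template <class T>
class HybridDeque {
    std::vector<T> ring;     // reserved storage, allocated once up front
    size_t head = 0, ringCount = 0;
    std::deque<T> overflow;  // used only past the reserved capacity
public:
    explicit HybridDeque(size_t reserved) : ring(reserved) {}

    size_t size() const { return ringCount + overflow.size(); }

    void push_back(const T& v) {
        if (ringCount < ring.size() && overflow.empty())
            ring[(head + ringCount++) % ring.size()] = v;
        else
            overflow.push_back(v);  // spill past the reserved capacity
    }
    void pop_front() {
        if (ringCount > 0) {
            head = (head + 1) % ring.size();
            --ringCount;
            // Refill the ring from the overflow so the hot path stays cheap.
            if (!overflow.empty()) {
                ring[(head + ringCount++) % ring.size()] = overflow.front();
                overflow.pop_front();
            }
        } else {
            overflow.pop_front();
        }
    }
    const T& operator[](size_t i) const {
        return i < ringCount ? ring[(head + i) % ring.size()]
                             : overflow[i - ringCount];
    }
};
```

Once the overflow drains, the std::deque releases its blocks, so sustained memory use is bounded by the reserved ring regardless of past spikes.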

ajbeamon commented 5 years ago

I'm not trying to argue that it isn't worth investigating, I'm just pointing out that this topic has come up previously in a particular situation where we considered the options and chose not to use Deque. Knowing the factors influencing our previous decision helps us to do a better job when investigating this case, and it gives us a better picture of tradeoffs when looking at others.

ajbeamon commented 5 years ago

> already had to handle not running out of memory when the queue was at its largest point anyways

We have to be careful with this logic, though, because one way we handle running out of memory is by slowing down the transaction rate. It would be bad if, for example, a saturating workload reduced our future throughput in any significant way. If the wasted memory did end up being in the magnitude of 10MB, though, that seems pretty low cost.

etschannen commented 5 years ago

This came up in the context of https://github.com/apple/foundationdb/pull/1988 in which std::queue was chosen because they did not know Deque existed. I am sure there are other places where not a lot of thought was put into the decision.

I think Neelam was also going to investigate more places where std::map could be replaced with a deque.

etschannen commented 5 years ago

I did the math, and for a cluster with 700 storage servers in one region, 100MB would give enough memory for each tag to store 10,000 versions, which would be enough for 10 seconds of commits if every single commit in those 10 seconds had a mutation for a specific storage server.

If we ended up seeing a performance benefit it would probably be worth the hybrid deque design to ensure each tag could not waste more than 10,000 versions which would then bound the memory to 100MB at most.

ajbeamon commented 5 years ago

> 10,000 versions, which would be enough for 10 seconds

Just to confirm, are these the right numbers? In general 10 seconds corresponds to 10 million versions, but I wouldn't be surprised to find that there was some other factor involved here. Just wanted to check.

etschannen commented 5 years ago

Because of batching, clusters generally give out about 1 version every millisecond. It is possible to give out more, particularly in clusters with a lot of proxies, but if every version contains a commit to the same storage server the cluster would have a huge hot shard.

10,000 should be enough for almost all normal use cases, but there could be scenarios where it goes above this number, so we should use an std::queue to handle overflow for the rare cases.
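
For reference, the arithmetic behind these figures checks out as follows (the ~14 bytes per buffered version is inferred by dividing the 100MB figure by the entry count; it is not stated explicitly in the thread):

```cpp
#include <cstdint>

// Worked check of the numbers above, under the thread's assumptions:
// ~1 version handed out per millisecond (due to batching), 700 storage
// server tags, and 10,000 buffered versions per tag.
constexpr int64_t versionsPerSecond = 1000;    // 1 version per ms
constexpr int64_t versionsPerTag = 10'000;     // proposed per-tag bound
constexpr int64_t tags = 700;                  // storage servers in one region

// 10,000 versions at 1 version/ms covers 10 seconds of commits.
constexpr int64_t secondsCovered = versionsPerTag / versionsPerSecond;

// 700 tags x 10,000 versions = 7,000,000 buffered entries total.
constexpr int64_t totalEntries = tags * versionsPerTag;

// 100 MB spread over those entries implies ~14 bytes per entry.
constexpr int64_t bytesPerEntry = 100'000'000 / totalEntries;
```

So the 10-second figure follows from batching, not from the 1-version-per-commit rate a saturated cluster could theoretically reach.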

sfc-gh-amotivala commented 5 years ago

Assigning to @etschannen since he's driving the conversation.