apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Fast Restore: Interference from pipeline-processing future version batches hurts performance #3594

Open xumengpanda opened 4 years ago

xumengpanda commented 4 years ago

When fast restore (FR) pipeline-processes multiple version batches, loaders can work on a future version batch even while work for the current in-progress version batch is still pending.

For example, suppose the current in-progress (minimum) version batch index is 4, and FR is asking loaders to send mutations to appliers for version batch 4. That send-mutations work can be delayed by the work for version batches 5 through 7, which asks loaders to parse backup files and send mutations.

The interference may waste resources by leaving the FDB cluster idle. It exists because FR does not differentiate the priorities of actors of the same type across version batches. For example, the actors that parse backup files on loaders have the same priority for all version batches.

Challenge: A version batch's actor priority must change when FR finishes processing an earlier version batch. For example, VB 7 may have the lowest priority while FR is processing VB 4, but once FR finishes processing VB 6, VB 7 will have the highest priority. We need a way to assign priority to actors dynamically.

Possible solution: Each restore role knows the largest finished version batch index. When a restore actor starts, we assign its priority based on the finished version batch index. Whenever the actor is unblocked and runs, we recalculate the priority it should have and compare it with the priority it started with. If the two do not match, we assign the new priority to the actor and yield.
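A minimal sketch of the recalculate-and-compare step, in plain C++ rather than Flow actor code. Everything here is hypothetical and not part of FoundationDB: the names `effectivePriority`, `RestoreActorState`, `refreshPriority`, and `kPriorityBand`, and the assumption that each actor type owns a band of 10 priority values above its base, so a boost of 0 to 9 never reaches the next actor type's base.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: each actor type owns a band of 10 consecutive priority values,
// so adding at most 9 keeps it below the next actor type's base priority.
constexpr int kPriorityBand = 10;

// Hypothetical helper: the in-progress batch (finishedBatchIndex + 1) gets the
// largest boost; batches further in the future get a smaller one.
int effectivePriority(int basePriority, int batchIndex, int finishedBatchIndex) {
    int distance = std::max(0, batchIndex - (finishedBatchIndex + 1));
    int boost = (kPriorityBand - 1) - std::min(distance, kPriorityBand - 1);
    return basePriority + boost;
}

// Hypothetical per-actor bookkeeping.
struct RestoreActorState {
    int basePriority;      // priority of the actor type, e.g. 2200
    int batchIndex;        // version batch this actor works on
    int assignedPriority;  // priority assigned when the actor last (re)started
};

// Called whenever the actor is unblocked. Returns true if the priority changed,
// meaning the caller should re-assign the priority and yield.
bool refreshPriority(RestoreActorState& actor, int finishedBatchIndex) {
    int wanted = effectivePriority(actor.basePriority, actor.batchIndex, finishedBatchIndex);
    if (wanted == actor.assignedPriority) {
        return false;
    }
    actor.assignedPriority = wanted;
    return true;
}
```

With these assumptions, an actor for VB 7 with base priority 2200 gets 2206 while VB 4 is in progress and 2209 once VB 6 has finished, so it never outranks an actor of the next type at 2210.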

This ensures that (1) actors in future version batches do not block actors in the current version batch, and (2) nodes are not left idle when they have pending work.

Another solution: Evan suggested we could also use a priority queue to queue the requests, with our own logic to dispatch them based on version batch number. The reference code is https://github.com/apple/foundationdb/blob/release-6.1/fdbserver/MasterProxyServer.actor.cpp#L131
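The queue-based idea could look roughly like the sketch below, in standard C++. It is not taken from the referenced file; `RestoreRequest`, `LaterBatchFirst`, and `BatchOrderedQueue` are hypothetical names, and the tie-break by arrival order within a batch is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request record; the real restore requests carry more state.
struct RestoreRequest {
    int batchIndex;           // version batch the request belongs to
    uint64_t seq;             // arrival order, used to keep FIFO within a batch
    std::string description;  // placeholder for the actual payload
};

// std::priority_queue pops the "largest" element, so this comparator treats
// later batches (and later arrivals within a batch) as smaller.
struct LaterBatchFirst {
    bool operator()(const RestoreRequest& a, const RestoreRequest& b) const {
        if (a.batchIndex != b.batchIndex) {
            return a.batchIndex > b.batchIndex;
        }
        return a.seq > b.seq;
    }
};

class BatchOrderedQueue {
public:
    void push(int batchIndex, std::string description) {
        queue_.push(RestoreRequest{batchIndex, nextSeq_++, std::move(description)});
    }

    bool empty() const { return queue_.empty(); }

    // Precondition: !empty(). Returns the request with the lowest batch index.
    RestoreRequest pop() {
        RestoreRequest next = queue_.top();
        queue_.pop();
        return next;
    }

private:
    std::priority_queue<RestoreRequest, std::vector<RestoreRequest>, LaterBatchFirst> queue_;
    uint64_t nextSeq_ = 0;
};
```

A dispatcher would push every incoming request and pop only when it has capacity, so a request for VB 4 that arrives after requests for VB 5 through VB 7 is still dispatched first.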

Update: Based on offline discussion with @dongxinEric, I removed the priority inversion wording from the issue description because it did not correctly describe the problem.

dongxinEric commented 4 years ago

Can you give more details about the priority inversion? Like roughly how the priority works and how an inversion happens?

xumengpanda commented 4 years ago

> Can you give more details about the priority inversion? Like roughly how the priority works and how an inversion happens?

I assume the question is more about the definition of priority, since the issue description gave an example of the priority inversion.

The priority is Flow's task priority. Each actor, when it waits, has a priority assigned. Each endpoint also has a priority assigned. The priorities are defined in TaskPriority in the code. For FR, the values are:

    RestoreApplierWriteDB = 2310,
    RestoreApplierReceiveMutations = 2300,
    RestoreLoaderFinishVersionBatch = 2220,
    RestoreLoaderSendMutations = 2210,
    RestoreLoaderLoadFiles = 2200,

Note: priority inversion does not necessarily cause deadlock.
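To illustrate why these values alone cannot separate version batches, here is a standalone sketch. The enum copies the values listed above but is not the real Flow header; `PendingTask` and `runsBefore` are hypothetical, and the comparison assumes the scheduler runs the task with the larger priority value first.

```cpp
#include <cassert>

// Values copied from the listing above; standalone illustration only.
enum class TaskPriority : int {
    RestoreApplierWriteDB = 2310,
    RestoreApplierReceiveMutations = 2300,
    RestoreLoaderFinishVersionBatch = 2220,
    RestoreLoaderSendMutations = 2210,
    RestoreLoaderLoadFiles = 2200,
};

// Hypothetical ready task: the batch index is known to the restore code,
// but it plays no part in the scheduling decision below.
struct PendingTask {
    TaskPriority priority;
    int batchIndex;
};

// Assumption: a larger priority value is scheduled first.
bool runsBefore(const PendingTask& a, const PendingTask& b) {
    return static_cast<int>(a.priority) > static_cast<int>(b.priority);
}
```

Under this assumption, a send-mutations task outranks a load-files task regardless of batch, but two load-files tasks for VB 4 and VB 5 tie, so nothing makes the scheduler prefer the current batch.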