Why does a bad layout reduce performance on YCSB?

mihir-b-shah commented 1 year ago

Hello,

Sorry to bother you. I was wondering why the "worst" layout reduces performance on YCSB in Figure 16. (My intuition was that YCSB has no dependencies, so the layout should not matter.) I was looking at the source code (specifically make_txn() in benchmarks/ycsb/txn/switch.cpp), and if I understand correctly, the multipass/lock flag is set if we want to modify two separate records on the same stage?

Specifically, as I understand it, if a txn wants to modify different records on stages {S0, S0, S1, S1} independently, it would set the vector 'accesses' in the code to {(id=0, stage=0), (id=1, stage=0), (id=0, stage=1), (id=1, stage=1)}, and the current algorithm would set is_conflict upon seeing the third element in the vector? (Sorry if I am misunderstanding something here.)

I was wondering why this is the behavior. Can we use the switch's MAU resources to write multiple registers in one stage, without needing to loop back?

Thank you!

mjasny commented 1 year ago

Hi,

First, we group 8 YCSB operations together to simulate transactions. The layout still matters: if, for example, 2 operations access tuples that are located in the same physical register array in a MAU stage of the Tofino, they cannot both happen in a single pipeline pass. That's because a register instance can only be accessed once per packet pass, so this transaction would need to recirculate to complete the second operation.

P4DB uses its "declustered layout" to minimize the number of transactions that need to access a register multiple times, and also transactions that access registers in an order different from the pipeline order. This way we get as many single-pass transactions as possible, which is good because we don't need to recirculate.
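To make the "once per pass" rule concrete, here is a minimal sketch (not the actual P4DB code). It assumes the operations in a transaction are independent and can be reordered freely, and it ignores the pipeline-order constraint mentioned above. The `Access` struct and `passes_needed()` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical type for illustration; not the struct used in P4DB.
struct Access {
    uint32_t stage;  // MAU stage that holds the register array
    uint32_t reg;    // register instance within that stage
};

// Assumption: operations are independent and may be reordered freely.
// Each register instance can serve one access per pass, so the
// most-contended instance determines how many passes are needed.
inline uint32_t passes_needed(const std::vector<Access>& accesses) {
    std::map<std::pair<uint32_t, uint32_t>, uint32_t> hits;
    uint32_t passes = accesses.empty() ? 0 : 1;
    for (const Access& a : accesses) {
        // Count this access and track the largest per-instance count so far.
        passes = std::max(passes, ++hits[{a.stage, a.reg}]);
    }
    return passes;
}
```

Under these assumptions, 8 operations that each land in a different register instance need 1 pass, while 8 operations that all land in the same instance need 8 passes (7 recirculations). That is the intuition behind why a layout that clusters hot tuples in one register array hurts throughput even without data dependencies.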

Your example is correct if id=0 and id=1 are different register instances. You can have multiple register instances per stage, but this number is limited. So id=0 and id=1 could be accessed in parallel. In other words, you can write multiple register instances in a single stage, but it is not possible to access (write/read/...) the same register instance multiple times within a single pass.

Notice that in the switch.cpp code the "instructions/accesses" for a switch transaction are reordered such that we require the minimal number of recirculations. When is_conflict is set to 1, the P4 dataplane stops parsing/executing the instructions after this flag and leaves them for the next pass through the pipeline, which is achieved by recirculation. In the second pipeline pass, instructions that have already been executed are skipped and execution in the packet continues where it stopped previously.
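The reordering idea can be sketched roughly as follows. This is an illustration only, not the code in switch.cpp: `Instr` and `schedule()` are hypothetical names, the operations are assumed to be independent, and placing `is_conflict` on the first instruction deferred to the next pass is an assumption about the encoding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical instruction type for illustration; not P4DB's own.
struct Instr {
    uint32_t stage;    // MAU stage of the target register instance
    uint32_t reg;      // register instance within that stage
    bool is_conflict;  // assumed: marks where the next pass begins
};

// Reorder independent instructions so that each pass walks the
// register instances in pipeline order and touches each at most once.
inline std::vector<Instr> schedule(std::vector<Instr> instrs) {
    std::sort(instrs.begin(), instrs.end(),
              [](const Instr& a, const Instr& b) {
                  return std::tie(a.stage, a.reg) < std::tie(b.stage, b.reg);
              });
    std::vector<Instr> out;
    std::vector<bool> used(instrs.size(), false);
    while (out.size() < instrs.size()) {
        bool first_in_pass = true;
        const Instr* prev = nullptr;  // last instruction taken in this pass
        for (std::size_t i = 0; i < instrs.size(); ++i) {
            if (used[i]) continue;
            // Sorted input keeps duplicates adjacent, so comparing with the
            // previously taken instruction detects a repeated instance.
            if (prev && prev->stage == instrs[i].stage &&
                prev->reg == instrs[i].reg) continue;
            used[i] = true;
            Instr x = instrs[i];
            // Flag the first instruction of every pass after the first.
            x.is_conflict = first_in_pass && !out.empty();
            out.push_back(x);
            first_in_pass = false;
            prev = &instrs[i];
        }
    }
    return out;
}
```

With the accesses from the question, written here as (stage, reg) pairs (0,0), (0,1), (1,0), (1,1), no flag is set and everything fits in one pass. If two instructions target (0,0), the second one is moved to the end and flagged, so it runs after one recirculation.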

Is it now clear?

mihir-b-shah commented 1 year ago

Yes, I understand now, thank you!

DataManagementLab / p4db
