rdma-core limits the number of UARs per context to 16 by default. After
creating 16 QPs, XLIO receives duplicate BlueFlame registers for
each subsequent QP. As a result, the BlueFlame doorbell method can write
WQEs to a shared register concurrently without serialization, which
leads to data corruption.
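
A minimal sketch of the register sharing described above. This is a toy model, not rdma-core or XLIO code: the round-robin assignment and the names `kBfRegsPerContext`, `bf_reg_for_qp`, and `count_shared_qps` are assumptions made for illustration only. The only fact taken from this PR is the default limit of 16.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a context owns a fixed pool of BlueFlame registers.
// The value 16 is the default limit mentioned in this PR.
constexpr int kBfRegsPerContext = 16;

// Assumed round-robin policy (illustration only): returns the register
// index handed to the qp_index-th QP created on the context (0-based).
int bf_reg_for_qp(int qp_index)
{
    assert(qp_index >= 0);
    return qp_index % kBfRegsPerContext;
}

// Counts the QPs that share their BlueFlame register with at least one
// other QP. Any such QP can race with its peer unless writes are serialized.
int count_shared_qps(int num_qps)
{
    std::vector<int> users(kBfRegsPerContext, 0);
    for (int qp = 0; qp < num_qps; ++qp) {
        ++users[bf_reg_for_qp(qp)];
    }
    int shared = 0;
    for (int n : users) {
        if (n > 1) {
            shared += n;
        }
    }
    return shared;
}
```

Under this model, 16 QPs each get a private register, while the 17th QP lands on the same register as the first. Two threads posting WQEs on those two QPs would then copy into one register with no lock between them, which is the corruption scenario this PR removes.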
BlueFlame can hurt throughput because the copy to the BlueFlame
register is expensive. It can improve latency in some low-latency
scenarios; however, XLIO targets high traffic/PPS rates.
Removing the BlueFlame method slightly improves performance in some
scenarios.
BlueFlame can be brought back in the future to improve low-latency
scenarios; however, it will need some rework to avoid the data
corruption.
What
Remove the BlueFlame doorbell method.
Why?
Fix data corruption for multi-threaded applications with over 16 QPs. Minor throughput/RPS improvement in some scenarios.
Change type
What kind of change does this PR introduce?
[x] Bugfix
[ ] Feature
[ ] Code style update
[ ] Refactoring (no functional changes, no api changes)
[ ] Build related changes
[ ] CI related changes
[ ] Documentation content changes
[ ] Tests
[ ] Other
Check list
[ ] Code follows the style de facto guidelines of this project
[ ] Comments have been inserted in hard to understand places