bobzhuyb / ns3-rdma

NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including implementations of DCQCN, TIMELY, PFC, ECN, and a shared-buffer switch
GNU General Public License v2.0

As for the DCQCN on RDMA READ flows #7

fbz-ict closed this issue 7 years ago

fbz-ict commented 7 years ago

As I understand it, DCQCN depends on ECN to detect network congestion and uses marked ACKs to notify the sender to reduce its sending rate. However, for RDMA READ operations, both the payload and the acknowledgment are carried in the READ response messages, and the IB transport defines no further ACKs for READ responses. So how does DCQCN control the rate of RDMA READ flows? Does the DCQCN NP implement an additional ACK mechanism beyond the original IB transport? Thanks.
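For context, the congestion-point (CP) side of that loop, where the ECN mark originates, can be sketched as a RED-style marking curve. The threshold names `k_min`, `k_max`, and `p_max` follow the usual DCQCN notation; any concrete values are placeholders. Note that this only produces the CE mark in the switch; the mark still has to travel back to the data sender, which is exactly the step in question for READ.

```python
import random
from typing import Callable


def ecn_mark_probability(queue_bytes: int, k_min: int, k_max: int, p_max: float) -> float:
    """RED-style marking curve at the congestion point.

    Below k_min nothing is marked, above k_max every packet is marked,
    and in between the probability rises linearly from 0 to p_max.
    """
    if queue_bytes <= k_min:
        return 0.0
    if queue_bytes > k_max:
        return 1.0
    return p_max * (queue_bytes - k_min) / (k_max - k_min)


def should_mark(queue_bytes: int, k_min: int, k_max: int, p_max: float,
                rng: Callable[[], float] = random.random) -> bool:
    """Decide whether to set CE on the packet currently being enqueued."""
    return rng() < ecn_mark_probability(queue_bytes, k_min, k_max, p_max)


# Placeholder thresholds: 5 KB / 205 KB, at most 1% marking below k_max.
p = ecn_mark_probability(105_000, k_min=5_000, k_max=205_000, p_max=0.01)
```

The random source is injectable only so the decision can be made deterministic in tests.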

bobzhuyb commented 7 years ago

DCQCN relies on ACKs and can't work for verbs that do not generate them. There is ongoing research on this, but I don't know when we can present a production-quality solution.
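Why a missing ACK path disables the control loop can be shown with a simplified reaction point (RP). This is a minimal sketch, assuming the congestion signal reaches the sender piggybacked on returning packets; it models only DCQCN's rate-decrease step and alpha decay (the rate-increase stages are omitted), and `g = 1/256` and the 40 Gbps line rate are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass
class ReactionPoint:
    """Simplified DCQCN reaction point at the data sender (decrease path only)."""
    line_rate: float                # Gbps
    g: float = 1.0 / 256            # alpha gain (placeholder)
    alpha: float = 1.0
    current_rate: float = field(init=False)   # Rc
    target_rate: float = field(init=False)    # Rt

    def __post_init__(self) -> None:
        self.current_rate = self.line_rate
        self.target_rate = self.line_rate

    def on_congestion_feedback(self) -> None:
        """A returning packet carried the congestion signal: cut the rate."""
        self.target_rate = self.current_rate
        self.current_rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        """No feedback seen in this window: let alpha decay."""
        self.alpha = (1 - self.g) * self.alpha


def simulate(congested_rounds: int, receiver_sends_acks: bool) -> float:
    """Return the sender's rate after several rounds in which the switch marked packets."""
    rp = ReactionPoint(line_rate=40.0)
    for _ in range(congested_rounds):
        # The switch marked packets this round, but the sender only learns
        # about it if something flows back from the receiver.
        if receiver_sends_acks:
            rp.on_congestion_feedback()
        else:
            rp.on_alpha_timer()
    return rp.current_rate


with_acks = simulate(3, receiver_sends_acks=True)      # rate halves each round
without_acks = simulate(3, receiver_sends_acks=False)  # rate never drops
```

With feedback the rate falls from 40 to 5 Gbps in three rounds; without it, the sender (the responder, in the READ case) keeps transmitting at line rate no matter how many packets the switch marks.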

fbz-ict commented 7 years ago

Thanks. So, in your opinion, what is the biggest challenge in providing additional "fake" ACKs for those operations?

bobzhuyb commented 7 years ago

It's about practical constraints and the trade-off between performance and cost. There are several options:

  1. Implement an additional ACK mechanism in hardware
  2. Implement ACK in software (e.g., hack libibverbs)
  3. Avoid using RDMA READ
  4. Run RDMA READ without E2E congestion control
  5. Implement a feedback mechanism in switches, like QCN, but one that works for L3 networks
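Option 2 could take the shape of a requester-side notification point that watches incoming READ responses and sends feedback toward the responder. This is a hypothetical sketch: `ReadResponse`, `SoftwareFeedbackNP`, and `send_feedback` are illustrative names, not libibverbs APIs, and it assumes the software layer can observe the CE mark of each response packet, which is itself an assumption. Feedback is paced to at most one message per QP per `interval_us`, mirroring the per-flow pacing of DCQCN's NP (50 µs is a placeholder).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReadResponse:
    """Hypothetical view of one incoming RDMA READ response packet."""
    qp_number: int
    ce_marked: bool
    arrival_us: float


@dataclass
class SoftwareFeedbackNP:
    """Hypothetical requester-side NP that emits software feedback for READ flows."""
    send_feedback: Callable[[int], None]   # how feedback reaches the responder is out of scope
    interval_us: float = 50.0
    _last_sent: Dict[int, float] = field(default_factory=dict, init=False, repr=False)

    def on_read_response(self, pkt: ReadResponse) -> None:
        if not pkt.ce_marked:
            return
        last = self._last_sent.get(pkt.qp_number)
        # Send at most one feedback message per QP per interval.
        if last is None or pkt.arrival_us - last >= self.interval_us:
            self._last_sent[pkt.qp_number] = pkt.arrival_us
            self.send_feedback(pkt.qp_number)


# Usage: collect feedback into a list instead of sending it on the wire.
sent: List[int] = []
feedback_np = SoftwareFeedbackNP(send_feedback=sent.append)
for t, marked in [(0, True), (10, True), (30, False), (60, True), (70, True)]:
    feedback_np.on_read_response(ReadResponse(qp_number=7, ce_marked=marked, arrival_us=t))
```

Of the four marked packets, only those at 0 µs and 60 µs trigger feedback. The pacing keeps the extra reverse traffic bounded, which is one of the costs the trade-off below is about.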

Clearly, each of these options has its price. The real question is which solution is the most cost-effective, given the application and setup (and any future applications you can anticipate).

fbz-ict commented 7 years ago

Thanks a lot. The five options you gave are very constructive, and I think I need some time to think them through thoroughly.