NVIDIA / NVFlare

NVIDIA Federated Learning Application Runtime Environment
https://nvidia.github.io/NVFlare/
Apache License 2.0

Enhance comm scalability Part 1 #3047

Closed yanchengnv closed 1 month ago

yanchengnv commented 1 month ago

Fixes # .

Description

This is the first PR in a series to enhance the communication scalability of NVFlare.

Currently, the SP is the communication bottleneck: every CP and CJ connects to it, and it serves as the relay for all CJ/SJ communications.

This PR makes it possible to use many separate relay processes for CJ/SJ communications, and removes the need for CPs and CJs to connect to the SP directly. Instead, they only need to connect to the relays.

Furthermore, relays can be organized hierarchically so that a very large number of CPs and CJs can be supported.
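
To illustrate why a hierarchy helps, here is a minimal Python sketch of a relay tree. It is not NVFlare code: the Node class, build_tree, path_to_root, and the fan-out of 10 are all hypothetical and chosen only for illustration. It shows that with two relay levels the SP holds 10 direct connections while 1000 CPs are reachable through the relays.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A process in the (hypothetical) relay tree: SP, relay, or CP."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def attach(self, child: "Node") -> "Node":
        # A child connects only to its parent, never to the SP directly.
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Optional[Node]) -> List[str]:
    """Return the hop sequence a message takes from a node up to the root."""
    hops = []
    while node is not None:
        hops.append(node.name)
        node = node.parent
    return hops


def build_tree(root: Node, fan_out: int, relay_levels: int) -> List[Node]:
    """Add relay_levels layers of relays under root, then fan_out CPs per leaf relay."""
    frontier = [root]
    for _ in range(relay_levels):
        frontier = [
            parent.attach(Node(f"{parent.name}/R{i}"))
            for parent in frontier
            for i in range(fan_out)
        ]
    return [
        relay.attach(Node(f"{relay.name}/CP{i}"))
        for relay in frontier
        for i in range(fan_out)
    ]


sp = Node("SP")
cps = build_tree(sp, fan_out=10, relay_levels=2)
print(len(sp.children))       # direct connections held by the SP
print(len(cps))               # CPs reachable through the relays
print(path_to_root(cps[0]))   # hops from one CP up to the SP
```

In this toy model the number of connections per process stays at the fan-out, while the number of reachable CPs grows as fan_out ** (relay_levels + 1).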

To achieve these goals, this PR addresses the following basic issues:

One added benefit of identity-based (rather than connection-based) authentication is that customers can build their own connectivity mechanism however they like, as long as the CP can reach the SP.
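
The idea behind identity-based authentication can be sketched as follows. This is a generic illustration, not the mechanism this PR implements: the HMAC shared secret, the message layout, and the names sign, verify, and "site-1" are assumptions made for the example. The point is that the receiver checks who produced the message rather than which connection delivered it, so the same message verifies whether it arrives directly or through any number of relays.

```python
import hashlib
import hmac
from typing import Dict


def sign(identity: str, payload: bytes, key: bytes) -> str:
    """Bind the payload to the sender's identity with an HMAC-SHA256 tag."""
    return hmac.new(key, identity.encode() + b"|" + payload, hashlib.sha256).hexdigest()


def verify(message: dict, keys: Dict[str, bytes]) -> bool:
    """Accept a message based on its signed identity, ignoring the delivery path."""
    key = keys.get(message["identity"])
    if key is None:
        return False  # unknown identity
    expected = sign(message["identity"], message["payload"], key)
    return hmac.compare_digest(expected, message["signature"])


# Hypothetical key registry held by the receiver (stand-in for real credentials).
keys = {"site-1": b"site-1-secret"}

msg = {"identity": "site-1", "payload": b"register"}
msg["signature"] = sign(msg["identity"], msg["payload"], keys["site-1"])

# The same message, forwarded through two relays; the path is not part of the check.
relayed = dict(msg, via=["relay-a", "relay-b"])
tampered = dict(msg, payload=b"register-as-admin")

print(verify(msg, keys), verify(relayed, keys), verify(tampered, keys))
```

A real deployment would more likely rely on certificates or signed tokens than on a shared secret; the shared secret is used here only to keep the sketch within the standard library.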

Note that this PR does not address the issue of how to manage the relay system, and how to assign a CP to a relay dynamically. These issues will be addressed in later PRs.

This PR's solution is backward compatible - no relays are required.

Types of changes

yanchengnv commented 1 month ago

/build