This is the first PR to enhance the communication scalability of NVFLARE.
Currently, the SP is the communication bottleneck, because all CPs and CJs connect to it and it serves as the relay for all CJ/SJ communications.
This PR makes it possible to use many separate relay processes for CJ/SJ communications, and removes the need for CPs and CJs to connect to the SP directly. Instead, they only need to connect to the relays.
Furthermore, relays can be organized hierarchically, so that a very large number of CPs and CJs can be supported.
To achieve these goals, this PR addresses the following basic issues:
FQCN of CP. Currently we use the client name as the FQCN of the CP. This works only when the CP is directly connected to the SP. When the CP is connected to a relay instead, it becomes a child of that relay, and its FQCN changes accordingly. Many places in the code had to be modified to support this.
SP/CP authentication. Currently we rely on the direct SSL connection between the CP and the SP for mutual authentication. Since they may no longer be directly connected, we can no longer rely on SSL authentication. Instead, the SP and the CP explicitly authenticate each other through a challenge-response protocol. This protocol explicitly validates the Identity (i.e., the Common Name) of the SP, regardless of how the connectivity is established.
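To illustrate the FQCN change described in the first item above, here is a minimal sketch. It assumes an FQCN is a dot-separated path from the top-level ancestor down to the cell itself; the separator, the helper names, and the relay names (`relay-a`, `relay-a1`) are all hypothetical and are not taken from this PR's code.

```python
# Sketch only: the separator and helper names are assumptions for illustration.
SEPARATOR = "."


def make_fqcn(ancestors, name):
    """Build an FQCN from the chain of ancestor names plus the cell's own name."""
    return SEPARATOR.join([*ancestors, name])


def parent_of(fqcn):
    """Return the FQCN of the parent cell, or None if there is no ancestor in the path."""
    head, sep, _ = fqcn.rpartition(SEPARATOR)
    return head if sep else None


# CP connected directly to the SP: the FQCN is just the client name.
direct = make_fqcn([], "site-1")

# CP connected through a two-level relay hierarchy: the FQCN carries the relay path.
via_relay = make_fqcn(["relay-a", "relay-a1"], "site-1")
```

In this sketch, any code that assumed "FQCN equals client name" would break for `via_relay`, which is the kind of assumption the PR had to remove in many places.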
One added benefit of the Identity-based (instead of connection-based) authentication is that customers can build their own connectivity mechanism any way they want, as long as the CP can reach the SP.
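The general shape of an identity-based challenge-response exchange can be sketched as follows. This is not the protocol implemented in this PR: to stay self-contained it substitutes an HMAC over a shared secret for whatever certificate-based proof the real protocol uses, and the key store, function names, and message layout are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical key store mapping an identity (Common Name) to its secret.
# A real deployment would more likely verify a signature against a certificate.
KEYS = {"server": secrets.token_bytes(32)}


def new_challenge() -> bytes:
    """Verifier side: create a fresh random nonce for each authentication attempt."""
    return secrets.token_bytes(16)


def respond(identity: str, key: bytes, challenge: bytes) -> bytes:
    """Prover side: bind the claimed identity to the nonce using the prover's key."""
    message = identity.encode() + b"|" + challenge
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(expected_identity: str, challenge: bytes, response: bytes) -> bool:
    """Verifier side: accept only if the response matches the expected identity's key."""
    key = KEYS.get(expected_identity)
    if key is None:
        return False
    expected = respond(expected_identity, key, challenge)
    return hmac.compare_digest(expected, response)


# One direction of the exchange: the CP checks that its peer really is "server".
nonce = new_challenge()
good = respond("server", KEYS["server"], nonce)
forged = respond("server", secrets.token_bytes(32), nonce)
```

Because the check depends only on the nonce and the identity's key, it gives the same answer whether the messages travel over a direct connection or through any number of relays. Mutual authentication would run the same exchange in the opposite direction as well.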
Note that this PR does not address how to manage the relay system or how to assign a CP to a relay dynamically. These issues will be addressed in later PRs.
This PR's solution is backward compatible: no relays are required.
Types of changes
[x] Non-breaking change (fix or new feature that would not break existing functionality).
[ ] Breaking change (fix or new feature that would cause existing functionality to change).
[ ] New tests added to cover the changes.
[ ] Quick tests passed locally by running ./runtest.sh.
Fixes # .