faa-swim / sds20

SWIM Discovery Service Discussions

Proposed New Feature: Support "Request Fan-out" Scenario #1

Open wznira opened 2 years ago

wznira commented 2 years ago

Augment or replace the existing request sequence solution with an approach that would allow a requester to simultaneously send requests to all known SDSs. Using the requesting instance to combine multiple responses should improve the overall response time for users. For example, in response to a client query request, a discovery service X could forward requests to two other discovery services Y and Z and return combined results from all three services (X, Y, Z) to the user.

mkaplun commented 2 years ago

This picture should help in understanding the difference between the "Request Chain" scenario described in the specification and the proposed "Request Fan-out."

[image]

The advantage of the "Fan-out" approach is that the aggregation of peers' responses is implemented only once, while in the "Chain" scenario the aggregation is implemented n-1 times, increasing overall response time and computational cost.

zeroreloaded commented 2 years ago

What about the following case? (Sorry for the poor quality, but it is more realistic.)

[image]

I can't find any difference between Chain and Fan-out. In my opinion, the scenario needs some rules:

  1. The first requester should have responsibility for the aggregation.
  2. The other SDSs, except for the requester, are only responsible for providing their own information.

So, I still believe that the chain request/response is not a good architecture and that there should be a mechanism to find other SDSs, like the /peers operation I suggested.

Thanks.

TaeYoung Shin from KAC(tyshin0@airport.co.kr)

mkaplun commented 2 years ago

@zeroreloaded wrote:

I can't find any difference between Chain and Fan-out

The difference between "chain" and "fan-out" is that the former is a sequence of one-to-one interactions where each node (except the last one) is responsible for aggregating the responses before returning them to the requester. In contrast, the latter is a simultaneous one-to-many interaction, where only the initial requester performs integration of all received responses.

This vision appears consistent with the rules proposed in @zeroreloaded's response.

It is also correct that the originator of the request should first obtain information about all available SDSs before "fanning out" the request. Therefore, this operation can possibly be preceded by some combination of GetPeers and GetDiscoveryService operations, which can also be fanned out.

luxdlzu commented 2 years ago

For the "Request Chain", if "X" knows more than one other peers, which one should be selected? In addition, with the number of peers increasing, the response time will increase. For "Request Fan-out", each peer should know all other peers that will increase the maintains cost.

As the operational requirements are different at the local, regional, and global levels, maybe a federated or hierarchical architecture is more efficient for SWIM service discovery. Moreover, as the International Aviation Trust Framework (IATF) has been considered for SWIM-based information exchange, it is better for us to consider how to use this framework for the requester's identification and authentication.

TianYungang commented 2 years ago

For the "Request Chain", if the peer Z knows two available peers X and M, see figure below, it’s difficult to ensure that Z selects M instead of X.

[image]

Hence, in the "Request Fan-out" scenario, when the other peers respond to the requester only with their own information, the redundant requests and responses in the figure above can be avoided. The details of our opinion are shown in the figure below.

[image]

In this scenario, how to maintain and update the peer list owned by X may be the topic we need to discuss in the next step.

mkaplun commented 2 years ago

RE: comment by @luxdlzu

At the time the "Request Chain" scenario was defined in the SDS spec, there were only two instances of SDS (KAC and FAA), with some possibility that one of them might know about some other SDS/registry implementation.

Now the situation has changed. Every participant in the SDS virtual network is aware of all other SDSs/registries. The list of all instances of SDS can be compiled by each peer by sending (fanning out) GetPeers requests and storing the result locally (caching it). After that, each peer can send requests (e.g., GetServices) to all other peers as needed.

Such an approach, together with the approach discussed in response to the comment by @zeroreloaded (single aggregation vs. consecutive aggregations), should decrease the overall cost and improve the performance of an SDS virtual network.

Regarding:

the operational requirements are different at local, regional and global levels, maybe the federated architecture or hierarchy architecture is more efficient for SWIM service discovery

More specific suggestions on this matter would be appreciated.

Also agree that

it is better for us to consider how to use IATF for the requester's identification and authentication

Apparently, more specifics are necessary for implementing it in the context of the SDS network. Suggest creating (and managing) this as a separate issue. Alternatively, it can be discussed in conjunction with issue faa-swim/sds20#3.

mkaplun commented 2 years ago

Following on the comment by @TianYungang, and the comments by @luxdlzu, and @zeroreloaded, we conclude that the "Request Fan-out" approach is preferable to "Request Chain." Therefore, in the specification, the "Request Chain" will be deprecated or entirely replaced by the "Request Fan-out" scenario. All members of this GitHub area are invited to comment with either consent or disagreement (including justification). Please feel free to submit any specific suggestions for changes to the specification.

mkaplun commented 2 years ago

@wznira wrote in an internal email

I was thinking about the Fan Out feature as discussed in Proposed New Feature: Support "Request Fan-out" Scenario. I want to propose that we make this operation asynchronous because the discovery service receiving the initial request will have to get results from each discovery service it knows. This is likely to result in delay in returning the consolidated service list to the client.

The diagram below is what I am thinking -

We will have a new endpoint /sds/fan-out-services. When discovery service X receives the request, it immediately returns a 202 (Accepted) HTTP response with a token for the client to retrieve the results later. This is an example of what the initial 202 response could be like -- HTTP Status 202 (Accepted) (restfulapi.net). The client then retrieves the consolidated list from Discovery Service X.

[image]
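
For illustration only, a minimal client-side sketch of the proposed asynchronous pattern, assuming a hypothetical /sds/fan-out-services endpoint that answers 202 Accepted with a status URL/token; the endpoint path, field names, and polling interval are illustrative and not part of the current specification.

```python
import time
import requests

BASE = "https://discovery-x.example.gov/sds"   # hypothetical base URL of discovery service X

# 1. The client asks X to fan the query out; X immediately answers 202 Accepted.
resp = requests.post(f"{BASE}/fan-out-services", params={"category": "weather"})
assert resp.status_code == 202
status_url = resp.json()["statusUrl"]          # hypothetical token/URL for retrieving results later

# 2. The client polls until X has consolidated the responses from its peers.
while True:
    poll = requests.get(status_url)
    if poll.status_code == 200:                # consolidated list is ready
        services = poll.json()
        break
    time.sleep(5)                              # still aggregating (e.g., 202 again)

print(f"Received {len(services)} services")
```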

I would like to offer a sequence diagram that reflects my understanding of "fan-out." It appears to me that an asynchronous operation doesn't add anything to the result while making the implementation unnecessarily harder.

[image: SDS sequence fan-out]

The diagram shows a very straightforward realization of a REST service operation, simply repeated in sequential or parallel order as many times as there are participants. And if we start deploying a cache (which should be very effective considering the rarely changing nature of the data), the cost of the asynchronous approach will be very hard to justify.

swang-nira commented 2 years ago

Excellent sequence diagrams, Mark and Wen.

At the same time, Mark, I have several questions and we can discuss tomorrow afternoon.

  1. Your sequence diagram seems more like a request chain instead of a fan-out. If I understand correctly, for fan-out, the discovery service will send requests to all the peer services at the same time, to save time and make the overall system more reliable.
  2. Could you confirm U, K, C, J? If I understand correctly, U is the same as X (discovery service X) in Wen's sequence diagram. K, C, J are the peers' services: K is U's peer, C is K's peer, and J is C's peer.
  3. If that's the case, the service U (or X) should have the peers' information, instead of the User. Otherwise, the user could directly call each individual peer service.
  4. As for the asynchronous API call, the main purpose and motivation is to avoid timeouts. Especially for the case demoed in your sequence diagram, after the user sends a request to U, it may take a long time to get the result back. Without an asynchronous API call, the user request will time out.
luxdlzu commented 2 years ago

In my understanding, to achieve request fan-out, it is necessary to get the information of all peers first. There are two approaches. One is to ask your known peers. The other is to ask someone, or a broker, who knows all peers. I think that in your proposal, "X" acts as a broker in the community. However, how to manage the information of all peers is another topic that we should consider.

mkaplun commented 2 years ago

Due to the number of discussions (online and offline) generated by this comment, we feel that further clarification is required. The second diagram presented in the comment consists of two use cases that can be reviewed separately. These are 1) Locate all known SDSs (peers) and 2) Retrieve all available services from all known peers. We will examine each separately to simplify understanding.

UC1 Locate all known SDSs (peers)

Actors: User - a person or a process that initiates a request. U, K, C, J - instances of an SDS.

Precondition: U is aware of and capable of connecting with K. U is unaware that either C or J operates an SDS.

Scenario:

  1. User requests U to generate a list of all known instances of SDS.
  2. The only peer on U's list at this point is K.
  3. U sends a request to K for peers known to K.
  4. K is aware only of C and responds to U with information about C.
  5. U adds C to its list. (At this point, U's list has two addresses: K and C.)
  6. U sends a request to C for peers known to C.
  7. C is aware only of J and responds to U with information about J.
  8. U adds J to its list. (At this point, U's list has three addresses: K, C, and J.)
  9. U sends a request to J for peers known to J.
  10. J is unaware of any other peers and responds to U with "no known peers."
  11. U returns the list with all known peers (K, C, and J) to the User.

Notes: 1) This scenario is a "happy day" scenario. If K is not available in Step 3 (e.g., it times out), the whole process fails and should be repeated later. 2) Using a cache for such rarely changing information as a list of known SDSs may significantly reduce the overall response time.

[image]
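
For illustration, a minimal "happy day" sketch of UC1 in Python, assuming a hypothetical GET /peers endpoint on each SDS that returns a JSON array of peer base URLs; error handling and caching (the two Notes above) are omitted.

```python
import requests

def locate_all_peers(own_url: str, seed_peers: list[str]) -> list[str]:
    """Walk the network as in UC1: ask every newly learned peer for its peers."""
    known = list(seed_peers)                 # U's list initially holds only K
    i = 0
    while i < len(known):                    # keep asking peers learned along the way
        peer = known[i]
        resp = requests.get(f"{peer}/peers", timeout=10)   # hypothetical GetPeers request
        for candidate in resp.json():                      # e.g., K answers with C's address
            if candidate not in known and candidate != own_url:
                known.append(candidate)
        i += 1
    return known                             # for UC1: [K, C, J]
```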

UC2 Retrieving all available services from all known peers.

Actors: The same as in UC1.

Precondition: U has the list generated in UC1, which contains the addresses of K, C, and J.

Scenario:

  1. User requests U to provide a combined list of all services from all known peers.
  2. U, using the list generated in UC1, requests the list of services from every instance of SDS on the list. All calls are synchronous and executed in parallel. Note: the diagram uses a standard formalism (a "par" fragment) for representing activities running in parallel and communicating and synchronizing.
  3. U collects responses from K, C, and J.
  4. U appends the list of its own services with the lists received from the peers.
  5. U returns the combined list to the User.

Note: Both Notes in UC1 are also applicable to UC2. If one of the peers is unavailable (e.g., down or responding slowly), it will not prevent U from providing the list of services, although with less information than in the "happy day" scenario. Caching by every SDS of its respective list of services may mitigate such issues.

[image]
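
And a minimal sketch of UC2 under the same assumptions (a hypothetical GET /services endpoint returning a JSON list); the parallel calls correspond to the "par" fragment in the diagram, and peer failures are not handled here.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

def get_all_services(own_services: list[dict], peers: list[str]) -> list[dict]:
    """U requests services from K, C, and J in parallel and appends them to its own list."""
    def fetch(peer_url: str) -> list[dict]:
        resp = requests.get(f"{peer_url}/services", timeout=10)   # hypothetical GetServices
        return resp.json()

    combined = list(own_services)
    with ThreadPoolExecutor(max_workers=max(len(peers), 1)) as pool:
        for peer_services in pool.map(fetch, peers):   # synchronous calls executed in parallel
            combined.extend(peer_services)
    return combined                                    # returned to the User as one list
```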

luxdlzu commented 2 years ago

Thank you for your clarification, Mark.

In the UC1, the Request Chain is applied to get the list of all peers. I think the following points should be clarified or considered to complete the task of this step.

  1. The assumption is that the user can get the list of all SDSs from any peer. If this assumption is true, J should respond with its known peers (C, K, or U) to U. If not, it cannot be ensured that a user can get the list of all SDSs from J. However, how to ensure that any peer can get the list of all SDSs should be considered.
  2. In this scenario, U gets the list of all peers in UC1. As you said, if this information can be cached, it will reduce the cost of subsequent requests. Moreover, if this information (the list of all SDSs) can be returned in response to GetPeers(U) requests from other peers, or shared with other peers, it will be more efficient. However, how to ascertain the liveness status of the peers in the list should be considered.
  3. Furthermore, we should consider how to improve the response time and availability for SDS requests at the system level. Maybe a federated or hierarchical architecture is a possible option.
mkaplun commented 2 years ago

RE: the comment posted by @luxdlzu

  1. The assumption is that the user can get the list of all SDSs from any peer. If this assumption is true, J should respond with its known peers (C, K, or U) to U. If not, it cannot be ensured that a user can get the list of all SDSs from J. However, how to ensure that any peer can get the list of all SDSs should be considered.

The primary notion behind UC1 is to demonstrate the discovery of network nodes in a non-centric environment. Initial discussions about service registries in ICAO SWIM environments (both IMP and APAC) started with ideas of developing SWIM registries at a global or regional level, i.e., a central point of information and management for all local (state) registries. For various reasons, this idea never worked out. As we started working on the SDS architecture, we realized that we needed to deploy a decentralized model, which would allow every participant to find out about all peers by "asking" other peers and, in turn, to provide similar information to other peers when asked. We understand that this approach also has its limitations, correctly pointed out by @luxdlzu.
We are open to the idea of creating a new issue in GitHub to expand further the notion of getting a list of all peers in an SDS network.

  2. In this scenario, U gets the list of all peers in UC1. As you said, if this information can be cached, it will reduce the cost of subsequent requests. Moreover, if this information (the list of all SDSs) can be returned in response to GetPeers(U) requests from other peers, or shared with other peers, it will be more efficient. However, how to ascertain the liveness status of the peers in the list should be considered.

The goal of UC1 is rather to illustrate the concept of a decentralized list of peers. In reality, all participants of today's SDS network have the same list of all known peers (four participating states). We agree that the approach described in UC1 can and should be improved. We invite all participants to contribute to a new and improved design of the GetPeers (or a similar) operation. All suggestions, preferably with diagrams and detailed descriptions, may be posted in the GitHub space.

  3. Furthermore, we should consider how to improve the response time and availability for SDS requests at the system level.

All SDSs are developed, hosted, and managed independently from each other (an important advantage of SDS). Therefore, no instance of SDS has control over the qualities of service exhibited by a peer. For example, the FAA cannot and should not control the performance of the KAC server/service. However, it would be beneficial for all participants to discuss best practices and recommendations for improving SDS performance. Again, every practitioner is welcome to contribute to this topic.

Maybe a federated or hierarchical architecture is a possible option.

Before discussing this suggestion further, please clarify the concepts of "federated or hierarchical architecture" in this particular discourse.

swang-nira commented 2 years ago

Based on the above discussion, can we make the assumptions below:

  1. No service provider knows all other peers. It only knows its own peers.
  2. All service providers are equal. This is the same as what Mark mentioned about decentralization.

From the above, I am thinking this issue aligns closely with a classic computer science problem: graph traversal. From a given graph node, how do we traverse all the graph nodes? There are two approaches:

A depth-first search (DFS) is an algorithm for traversing a finite graph. DFS visits the child vertices before visiting the sibling vertices; that is, it traverses the depth of any particular path before exploring its breadth. A stack is generally used when implementing the algorithm.

A breadth-first search (BFS) is another technique for traversing a finite graph. BFS visits the sibling vertices before visiting the child vertices, and a queue is used in the search process.

Below is a simple diagram to demo those two approaches.

[image: DFS vs. BFS]
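
For illustration, a small sketch of the BFS approach applied to peer discovery, where get_peers stands in for a GetPeers call to an SDS; the visited set is also what prevents circular requests (A --> B --> C --> A).

```python
from collections import deque

def discover_peers_bfs(start: str, get_peers) -> set[str]:
    """Breadth-first traversal of the SDS 'graph' from one known node."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()               # visit siblings before going deeper
        for peer in get_peers(node):         # stand-in for a GetPeers call to that SDS
            if peer not in visited:          # the visited set also breaks cycles
                visited.add(peer)
                queue.append(peer)
    return visited

# Toy example: C only knows B, B only knows A.
graph = {"C": ["B"], "B": ["A"], "A": []}
print(discover_peers_bfs("C", lambda node: graph[node]))   # {'A', 'B', 'C'}
```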

luxdlzu commented 2 years ago

Thank you for your confirmation and comment.

As SWIM is not a fully decentralized system, it is a loosely coupled community. How to construct this community should be defined by SWIM Governance. From this viewpoint, it is possible to divide the GetPeers process into Join and Share processes. The main idea and steps for getting the list of all peers are as follows.

  1. At first, A constructs its own SWIM Registry.
  2. Then, B constructs its local SWIM Registry and sends a Join(B) request to A to join the community.
  3. A accepts and responds with Share(A,B) to B to share the peer list of the community.
  4. When C constructs its local SWIM Registry, it can send a Join(C) request to A (or B) to join.
  5. Then, A (or B) accepts and submits Share(A,B,C) to C and B (or C and A) for updating the peer list.
  6. In the same way as the previous steps, D can ask any peer in the community to join and trigger the peer list update.

These two processes can ensure that every peer in the community has the updated peer list of the community. This peer list can then be used for GetServices requests to get the list of services from other peers in the community. Moreover, according to this process, the peer list is maintained by the community of SWIM Registries, which can be separated from the SDS requested by users.
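
A minimal in-memory sketch of the proposed Join/Share flow; the operation names come from the steps above, while the class shape, method names, and the shared "network" directory are purely illustrative (real instances would interact over HTTP).

```python
class Registry:
    """A local SWIM Registry holding the community peer list (in-memory sketch)."""

    network: dict[str, "Registry"] = {}       # stand-in for real network addresses

    def __init__(self, name: str):
        self.name = name
        self.peers = {name}                   # a new registry starts knowing only itself
        Registry.network[name] = self

    def handle_join(self, newcomer: str) -> None:
        """Join(newcomer): accept it, then Share the updated list with every member."""
        self.peers.add(newcomer)
        for member in list(self.peers):       # includes the newcomer and all old members
            Registry.network[member].handle_share(self.peers)

    def handle_share(self, peer_list: set[str]) -> None:
        """Share(...): replace the local list with the community-wide list."""
        self.peers = set(peer_list)


# Steps 1-5 of the comment above:
a, b, c = Registry("A"), Registry("B"), Registry("C")
a.handle_join("B")   # B joins via A -> A and B both hold {A, B}
b.handle_join("C")   # C joins via B -> A, B, and C all hold {A, B, C}
print(a.peers, b.peers, c.peers)
```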

mkaplun commented 2 years ago

RE: the comment posted by @luxdlzu

It would be helpful if we could discuss the premise first to avoid potential confusion.

As SWIM is not a fully decentralized system, it is a loose coupling community. How to construct this community should be defined by the SWIM Governance.

It is unclear why the discussion shifts from SDS interface specification to SWIM architecture. Nevertheless, the SDS is a decentralized system. Definition: "a decentralized system is an interconnected information system where no single entity is the sole authority."

In addition, if we ignore for a moment inter-organizational relationships (and A, B, and C appear to be organizations; they develop registries, grant membership to communities, etc.), we notice that the output of the described process is no different from that of UC1 (the list with A, B, and C). (If we look at Step 5 and consider C as an initial requester (like U in UC2), we see that C received the same list - A, B, and C.)

In the "Join and Share" scenario, the list is owned by a member of the "community" and handed off to a new member; in the UC1 scenario, the list is dynamically generated as needed.

  4. When C constructs its local SWIM Registry, it can send a Join(C) request to A (or B) to join.
  5. Then, A (or B) accepts and submits Share(A, B, C) to C and B (or C and A) for updating the peer list.

However, if C sent a Join request to B (see steps 4 and 5), how would A know that C joined the community? How do the lists get synchronized, and who will be responsible for this process?

swang-nira commented 2 years ago

Per Mark's questions:

"However, if C sent a Join request to B (see steps 4 and 5), how would A know that C joined the community? How do the lists get synchronized, and who will be responsible for this process?"

For a decentralized system, each component/node should keep a list of its own peers, including direct peers and indirect peers. It is the component's responsibility to do the sync-up, explore new peers, delete dead peers, etc. The real technical challenge is how often, and how far, to send out the discovery service request. If too often, it will create lots of requests that will consume the service providers' computing resources and network bandwidth. Another thing is how to avoid circular requests, like this: A --> B --> C --> A.

With the above being said, whenever a service discovery request asks any node for its peer list, the node can decide to return the existing peers immediately, or to send a sync-up request to update the peer list and return the most accurate result to the service requestor.

luxdlzu commented 2 years ago

RE: the comment posted by @mkaplun

Thank you for your confirmation, Mark.

It is unclear why the discussion shifts from SDS interface specification to SWIM architecture. Nevertheless, the SDS is a decentralized system. Definition: "a decentralized system is an interconnected information system where no single entity is the sole authority."

I just want to clarify the technical background of this idea, because the SDS is a service of SWIM.

In addition, if we ignore for a moment inter-organizational relationships (and A, B, and C appear to be organizations; they develop registries, grant membership to communities, etc.), we notice that the output of the described process is no different from that of UC1 (the list with A, B, and C). (If we look at Step 5 and consider C as an initial requester (like U in UC2), we see that C received the same list - A, B, and C.)

In the "Join and Share" scenario, the list is owned by a member of the "community" and handed off to a new member; in the UC1 scenario, the list is dynamically generated as needed.

In UC1, it cannot be ensured that a user can get the list of all SDSs from any peer. Moreover, depending on the situation, the different process or response of each peer should be considered. For example, if U knows more than one peer (K, C, J), how the GetPeers requests are sent will affect the response time and the overhead cost.

  4. When C constructs its local SWIM Registry, it can send a Join(C) request to A (or B) to join.
  5. Then, A (or B) accepts and submits Share(A, B, C) to C and B (or C and A) for updating the peer list.

However, if C sent a Join request to B (see steps 4 and 5), how would A know that C joined the community? How do the lists get synchronized, and who will be responsible for this process?

The processes are the same as steps 4 and 5, because A and B have the same peer list (A, B) before C joins.

4'. When C constructs its local SWIM Registry, it can send a Join(C) request to B to join.
5'. Then, B accepts and submits Share(A, B, C) to C and A for updating the peer list.

All peers use the same process to update the peer list, and the peer receiving the Join request will be responsible for the updating process.

mkaplun commented 2 years ago

RE: comment posted by @luxdlzu

When C constructs its local SWIM Registry, it can send a Join(C) request to A (or B) to join. Then, A (or B) accepts and submits Share(A, B, C) to C and B (or C and A) for updating the peer list.

However, if C sent a Join request to B (see steps 4 and 5), how would A know that C joined the community?

Let me explain my point some more:

Starting state:

list(A) = (A,B)
list(B) = (A,B)

C sends a request to join to B. B sends back the list (A, B, C) to C. New state:

list(B) = (A,B,C)
list(C) = (A,B,C)
list(A) = (A,B)

A never received a request for joining from C; it does not know about C and accordingly has no reason to update its list.

Then this process will require some kind of notification operation to send (push) a new version of the list to all "old" members (to those who did not receive a Join request).

Would that be an improvement over the UC1 scenario?

swang-nira commented 2 years ago

RE: comment posted by @mkaplun

Thanks for the explanation, Mark. I agree with you. In fact, we can think of A, B, and C as three nodes in a graph, like this: C --> B --> A

C only knows B but does not know A. B knows A. When C asks for B's peers, B will reach out to its peers, in this case A. So C will get the complete list (A, B, C). As we can see, this process aligns closely with graph traversal, as I mentioned here.

Also, for a decentralized system, each component should have a list of the whole system's information. In this case, when C sends a request to B, B should already have the peer list -- which is A. B can immediately return the result to C, or B can send another request to all its peers to get the latest peer list to return to C.

wznira commented 2 years ago

I think the discussions on this thread cover two separate but related issues:

  1. For the SDS web service interface, we need an operation to allow the requester to ask a DS to return a consolidated list of the published services maintained by itself and by the DS instances it is aware of. This is the original intent of the question.
  2. From the APAC SWIM governance perspective, how do we maintain a list of all available DSs in the region? Should one or more DSs take on that responsibility? We should probably have a separate discussion on this.
TianYungang commented 2 years ago

The updated UC1 applies fan-out to compile the peer list, which can effectively avoid the circular request (A --> B --> C --> A). Besides, we also agree that each peer should cache its own peer list, which can greatly improve the efficiency of the whole process.

We note that the current UC1 scenario only supports the maintenance of U's list, while the other SDSs do not maintain their lists. To solve this problem, one method is to redefine a set of maintenance and management interfaces for the peer list among SDSs to support the status management of peers, such as joining, surviving, sharing, and exiting. Another method is to simply modify the current GetPeers interface so that other peers can learn about newly added peers in time.

Based on the latter method, i.e., the simple modification of the current GetPeers interface, our idea for maintaining the peer lists is as follows. Since the current GetPeers(K) request is directed, the requested instance K may not know the information of the requesting instance U during the whole process, which may lead to a more complicated maintenance and updating process for the peer list, or to an incomplete peer list. In this regard, we propose to modify the request interface to GetPeers(U, K), which can not only ask K for its own peers but also inform K of the requester's information.

Based on the modified GetPeers(U, K), the peer list can be compiled or updated directly when a node is newly added or restarted, so that all SDSs can dynamically update and maintain their own peer lists and cache them. This may avoid the situation where an SDS instance needs to update the peer list again when the user asks for it, and may also ensure the integrity of the peer list. The following are the sequence diagrams of two typical scenarios to illustrate our idea.
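
As a rough sketch of the modified interface on the receiving side, a GetPeers handler that both returns its cached list and learns about the requester; the requester parameter is the proposed addition, while the function shape and in-memory storage are illustrative only.

```python
def handle_get_peers(own_list: set[str], requester: str | None = None) -> set[str]:
    """GetPeers(requester, self): return the cached peer list and, if the requester
    identifies itself, learn about it at the same time."""
    if requester is not None and requester not in own_list:
        own_list.add(requester)    # e.g., J learns about the newly added C (Scenario 1, step 2)
        # a real implementation would also persist/cache the updated list here
    return set(own_list)           # response sent back to the requester


# C (newly started, knowing only itself and J) asks J for its peers and identifies itself:
j_list = {"U", "K", "J"}
response = handle_get_peers(j_list, requester="C")
print(sorted(j_list))      # ['C', 'J', 'K', 'U'] -- J has added C to its own list
print(sorted(response))    # the list C merges into its own list
```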

Scenario 1: New SDS instance joins

Actors:
U - SDS instance, offline
K - SDS instance, online
J - SDS instance, online
C - SDS instance, newly added/restarted

where 'offline' means the SDS instance cannot be accessed due to network or equipment failures, 'online' means the SDS instance can be accessed, 'newly added' refers to a new instance integrated into the information collaborative environment, and 'restarted' refers to the recovery of an SDS instance to an accessible status after the fault is resolved.

Precondition: U, K, and J already have the cached peer list (U, K, J). C only knows the information of itself and J, i.e., (C, J).

Scenario:

  1. C starts up with its peer list (C, J) and then sends a request to J for peers known to it;
  2. If C is not in J's list, J adds C to its list, obtaining (U, K, J, C);
  3. J responds with its list to C;
  4. If any node in the returned list is not in C's current list, C adds it (i.e., U and K) to its list, obtaining (U, K, J, C);
  5. According to the updated list, C sends a request to K for peers known to it;
  6. If C is not in K's list, K adds C to its list, obtaining (U, K, J, C);
  7. K responds with its list to C;
  8. If any node in the returned list is not in C's current list, C adds it to its list. Here, all peers in the returned list are already known, so C keeps its list (U, K, J, C);
  9. According to the updated list, C sends a request to U for peers known to it;
  10. Since U is offline, there is no response.

Remarks:

  1. In this scenario, while the new SDS instance C requests the list, other peers can decide to add the requester to their own lists, thus directly realizing the sharing of node information.
  2. In steps 9 and 10, because U is offline, C cannot get a response to update its list. However, we note that since each peer has its own cached latest list, C has already received a complete list from other peers in the earlier steps. As for the problem that C is not included in U's list in this scenario, that list will be updated promptly after U's next start-up. Please see Scenario 2 for details.

Scenario 2: Existing SDS instance restarts

Actors:
U - SDS instance, restarted
K - SDS instance, online
J - SDS instance, online
C - SDS instance, online

Precondition: Based on results of Scenario 1, U, K, J, and C have cached their own peer lists, i.e., U--(U, K, J), K--(U, K, J, C), C--(U, K, J, C), J--(U, K, J, C).

Scenario:

  1. U restarts with its peer list (U, K, J) and sends a request to K for peers known to it;
  2. If U is not in K's list, K adds U to its list. Here, U is already in K's list, hence K keeps its list (U, K, J, C);
  3. K responds with its list to U;
  4. If any node in the returned list is not in U's current list, U adds it (i.e., C) to its list, obtaining (U, K, J, C);
  5. According to the updated list, U sends a request to J for peers known to it;
  6. If U is not in J's list, J adds U to its list. Here, U is already in J's list, hence J keeps its list (U, K, J, C);
  7. J responds with its list to U;
  8. If any node in the returned list is not in U's current list, U adds it to its list. Here, all peers in the returned list are known to U, hence U keeps its list (U, K, J, C);
  9. According to the updated list, U sends a request to C for peers known to it;
  10. If U is not in C's list, C adds U to its list. Here, U is already in C's list, hence C keeps its list (U, K, J, C);
  11. C responds with its list to U;
  12. If any node in the returned list is not in U's current list, U adds it to its list. Here, all peers in the returned list are known to U, hence U keeps its list (U, K, J, C);
  13. The user requests U to generate a list of all known instances of SDS.
  14. Since U updated and cached its list when it restarted, it can return the list with all known peers (U, K, J, C) to the user directly.

Remarks:

  1. In steps 13 and 14 of the above scenario, if quite a long time has passed since U's last peer list update, U can choose to perform the peer list update again before returning the final list.
  2. The elimination of disabled peers is not considered in the peer list updating and caching process in the above scenarios. The proposed sequence diagrams may also be extended to cover the elimination of peers, but its constraints and periods need further discussion.
wznira commented 2 years ago

Thanks @TianYungang for this very interesting scenario. However, I still think we are trying to have several discussions under the same thread -

  1. What runtime web service interface is needed to request a "fan-out" operation? For example, how does a client of FAA SMXS request a list of published services from not only the FAA but all the SDS instances it knows?
  2. How do we maintain a list of available SDS instances in the region? In @luxdlzu's approach, one or more SDS instances will be given the responsibility of knowing all available SDS instances. In @TianYungang's approach, some sort of gossip protocol is used to distribute information about available peers to all SDS instances.

If my understanding is correct, I think we should address these questions separately.

swang-nira commented 2 years ago

RE: comment posted by @wznira

I agree with Wen about creating separate tickets for different topics, to make each discussion more focused on the related ticket. At the same time, the "runtime web service interface" question seems to me like an implementation consideration, which should be worked on after the general solution architecture is finalized.

As for the second question about "maintaining a list of available SDS instances," with the assumption of a decentralized system like SDS, it would be everyone's responsibility to maintain the complete list, instead of some specific/designated ones. As Mark mentioned, the centralized approach didn't work out. As for how to find the available SDS instances, all the above discussions (including the sequence diagrams) are trying to address/solve that problem.

TianYungang commented 2 years ago

RE: comment posted by @wznira

Thank you for your confirmation, Wen. I also agree with you that we should address these two questions separately. For the 1st topic, I think that we have all reached a consensus on the 'Request Fan-out' scenario, in which an SDS instance should request service lists from all known peers in parallel. For the 2nd topic, I agree with creating a separate issue for further discussion.

mkaplun commented 2 years ago

It seems that there are some misconceptions in understanding the UC1 scenario. It may be due to the intrinsic limitations of a sequence diagram or simply because the author failed to present the case clearly, but in any case, I hope some additional explanations will help.

In the comment by @TianYungang, it says:

We note that current UC1 scenario only supports the maintaining process of U's list, while other SDSs do not maintain their lists.

There are also other comments suggesting that we discuss who is responsible for maintaining a list of all SDSs in the network, and how.

It is essential to note that the SDS specification never discusses the concept of a shared list of all services. As @wznira pointed out in his comment, the SDS is built on a concept similar to the notion of a gossip protocol.

A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members of a group. Some ad-hoc networks have no central registry and the only way to spread common data is to rely on each member to pass it along to their neighbors. [Emphases are mine.]

Because of this kind of architecture, an SDS network does not have any shared list of services (essentially a substitute for a registry). No instance of SDS is responsible for creating or distributing such a list other than for its own needs.

Each SDS creates its own collection of peers' addresses while implementing the scenario described in UC1. Whether such a collection is generated dynamically by a series of GetPeers requests or produced by a batch job (and subsequently cached and regularly updated) is entirely up to the owner of an SDS.

The following diagram illustrates some of the discussed points: 1) It shows how K (or K's user) creates a request for a list of peers and eventually should receive the same list as U in the original diagram. 2) It shows a scenario suggested by @TianYungang where U is temporarily or permanently offline. In this case, K obviously doesn't get a response from U and removes U from its list of peers. When receiving a GetPeers request from other SDSs, K's response does not include U.

[image]

Also, note that the entire process may only succeed if the initiator knows at least one other peer capable of responding to a GetPeers request. Without this condition, the process as it is described in the SDS specification cannot be implemented.

The scenario offered by @luxdlzu and @TianYungang, in which a new SDS wants to join the network, requires adding a new (push) operation to the SDS architecture, which seems unnecessary at this time. Adding a new SDS to the network can be done outside of the API, similar to how China and Japan joined the SDS effort after an introductory meeting.

TianYungang commented 2 years ago

RE: the comment posted by @mkaplun

Thanks for your comment, Mark.

I think we still have some questions to check with you. Our understanding about the maintenance of the peer list is as follows:

  1. An SDS is responsible for maintaining its known peer list, but not the service list of other SDSs. Is it correct?
  2. We want to make sure whether the maintenance method of peer list needs to be described in SDS specification. Or is it entirely up to the SDS instance itself?
mkaplun commented 2 years ago

RE: the comment posted by @TianYungang

We seem to be on the same page, but I would like to suggest an analogy that might add to our understanding.

I think we all have cell phones. Every cell phone has a Contacts app, which is essentially a list of "known peers."
The phone (or rather its owner) manages its own list, i.e., it decides what contacts to add or remove. The "contact list" is an abstraction; the physical representation always varies depending on the phone OS, model, app, etc. I hope this little analogy demonstrates my view of the "list" we discussed in other comments.

Through a series of GetPeers calls, the SDS provider creates a "contact list" and subsequently manages it as it sees fit. The only required part is that any SDS must respond to a GetPeers request with a collection of "known peers" in the format prescribed in the SDS Implementation Specification.

So to answer questions in @TianYungang comment:

  1. An SDS is responsible for maintaining its known peer list, but not the service list of other SDSs. Is it correct?

The answer is "Yes."

  2. We want to make sure whether the maintenance method of peer list needs to be described in SDS specification. Or is it entirely up to the SDS instance itself?

Yes, it is entirely up to the SDS instance how to maintain its peer list. The format of the list (or, more precisely, the format of a response to a GetPeers request) should be described in the SDS specification.

swang-nira commented 2 years ago

RE: the comment posted by @mkaplun

I agree with Mark. At the same time, let me try to explain my thinking about an SDS maintaining the peer list. I would think each SDS will try to maintain the full set of peers in the network. In that case, all the SDSs will have the same peer list with the latest, updated information. However, in the real world, this is an almost mission-impossible task. There are many factors that would affect each SDS's peer list, for example, SDSs dynamically going up and down, network issues when sending out discovery queries, etc. The more realistic situation would be that each SDS only has a partial peer list. Whenever the user requests the peer list, the only reason for the SDS to send out a new discovery request is to try to get the most up-to-date information. There could be other mechanisms to keep the peer list updated. For example, each SDS can schedule the discovery request periodically to get an updated peer list. In that case, whenever the SDS gets a request from the end user, it may not need to send the discovery request and can simply return the existing peer list, to avoid the timeout issue. Of course, the challenge is how to avoid too many and too frequent discovery requests. We can discuss that in more detail at the implementation level.

wznira commented 1 year ago

UC2 Retrieving all available services from all known peers.

Actors: The same as in UC1.
Precondition: U has the list generated in UC1, which contains the addresses of K, C, and J.
Scenario:

  1. User requests U to provide a combined list of all services from all known peers.
  2. U, using the list generated in UC1, requests the list of services from every instance of SDS on the list. All calls are synchronous and executed in parallel. Note: the diagram uses a standard formalism (a "par" fragment) for representing activities running in parallel and communicating and synchronizing.
  3. U collects responses from K, C, and J.
  4. U appends the list of its own services with the lists received from the peers.
  5. U returns the combined list to the User. Note: Both Notes in UC1 are also applicable to UC2. If one of the peers is unavailable (e.g., down or responding slowly), it will not prevent U from providing the list of services, although with less information than in the "happy day" scenario. Caching by every SDS of its respective list of services may mitigate such issues.

[image]

It is important to point out that, as implied by this diagram in @mkaplun's comment, fan-out only happens once (in step 1), when the user requests fan-out from U by invoking a "Get Services from All Peers" operation. When U forwards the request to K, C, J, U is invoking the regular GetServices operation, which causes each of K, C, and J to return only the published services that they maintain themselves. This is important to avoid the same DS being called multiple times and even "circular references." As such, we must be able to distinguish two operations: the regular GetServices and the "Get Services from All Peers" operation.

cinglefield commented 1 year ago

Wen, Thank you for the latest entry on the Request Fan-out scenario. I understand the mechanics of the fan-out, as well as the need for 2 different operations.

mkaplun commented 1 year ago

It appears that this thread, because of the many comments that sometimes attempt to discuss several interconnected notions (e.g., getPeers and getServices), has become so tangled up that I think it is necessary to clarify some basic concepts.

First, Operation: Operation is a set of messages related to a single Web service action [W3C]. In the SDS, each operation comprises two messages: a request, requesting a resource representation, and a response, returning the representation.

Each request is sent by a single client (SDS service) and responded to by a single SDS service. These operations are GetDiscoveryService, GetPeers, GetServices, and GetService.

Another critical ingredient of the SDS is a pattern of behavior. From architectural perspectives, the SDS is a composition.

"Composition is a result of assembling a collection of elements for a particular purpose" [ISO/IEC 18384-1].

In our case, "a collection of elements" is a collection of services, and a "particular purpose" is a service discovery. A choreography is a kind of composition that can be used to describe the SDS.

Choreography is a type of composition whose elements interact in a non-directed fashion with each autonomous part knowing and following an observable predefined pattern of behaviour for the entire (global) composition [ISO/IEC 18384-1].

In the SDS context, an operation describes the relationship between exactly two messages, whereas a behavior pattern describes the relationship between two or more services.

The discussion in this thread focuses mainly on the differences between "chain" and "fan-out" patterns. The former was described in section 2.1.2 of the specification in the use case where a service uses the GetPeers operation to obtain a list of peers. It was also assumed that the same pattern would be used to obtain a list of services. However, this thread introduces the concept of a "fan-out" pattern, arguing that it is a much more effective way to obtain a list of services.

(Admittedly, the section "Behavior Model" in the specification does not explicitly define patterns, which may also have resulted in some confusion between the notions of operation and behavioral pattern.)

It is critical to understand that, for example, the getServices operation, the way it is defined now, can and should be used in both patterns. The "fan-out" diagram in the comment simply shows the GetServices operation repeated as many times as the number of services (peers) known to the original requester. Note: the requests can be sent in parallel or in any order.

Summarizing:

  1. The operation definition does not need to be changed; the same operation signature can be used to obtain a list of services from one or four peers.
  2. How the service handles the result of the operation -- whether to pass a request to another peer, simply return a response or aggregate other responses -- should be described as a behavioral pattern and subsequently represented in the specification.
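
As a rough illustration of this summary (a sketch only, with a hypothetical /services path), the same GetServices call can serve both behaviour patterns; only the caller's pattern of use differs.

```python
import requests

def get_services(sds_url: str) -> list[dict]:
    """The unchanged GetServices operation: one request, one response."""
    resp = requests.get(f"{sds_url}/services", timeout=10)   # hypothetical endpoint path
    return resp.json()

def fan_out(own_url: str, peers: list[str]) -> list[dict]:
    """'Fan-out' pattern: the initial requester calls every peer and aggregates once."""
    combined = get_services(own_url)
    for peer in peers:                       # could equally be sent in parallel
        combined.extend(get_services(peer))
    return combined

def chain(node: str, remaining: list[str]) -> list[dict]:
    """'Chain' pattern, modelled locally: each node aggregates the rest of the chain
    before answering, so aggregation happens n-1 times along the way."""
    combined = get_services(node)
    if remaining:
        combined.extend(chain(remaining[0], remaining[1:]))
    return combined
```
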
cinglefield commented 1 year ago

Mark,

  1. Can you please give an example of how your summary statements are implemented in the request? The current request to the GetServices operation in NSRR is https://nsrr.faa.gov/smxs/services
  2. Upon NSRR receiving the request above, how does NSRR determine whether to a. just return the NSRR list of services, as is currently implemented, or b. get the list of peers and, for each peer, issue the GetServices request for that peer, aggregate the lists, and return the aggregate?

Thank you, Caroline

cinglefield commented 1 year ago

Mark responded with this explanation and diagram - thank you!

Assumptions and Preconditions:

  1. There are three SDSs: A, B, and C. A is the one that the User has access to.
  2. A is aware of the existence and addresses of B and C. (Presumably from a GetPeers operation implemented earlier.)
  3. The system that hosts A is capable of creating temporary data storage (a cache or temporary table) to store a collection of GetServices responses. A request URI may look like this: http://nsrr.faa.gov/smxs/services?peer=http://A/services&peer=http://B/services&peer=http://C/services&category=weather

[image: support request fan-out]
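
A rough sketch of how the receiving SDS (A, or NSRR in Caroline's example) might interpret such a request, answering the question above: if peer parameters are present, it fans the query out and aggregates once; otherwise it behaves like today's GetServices. The Flask framing and parameter handling are illustrative, not the actual NSRR implementation.

```python
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)
LOCAL_SERVICES: list[dict] = []   # stand-in for the services this SDS publishes itself

def by_category(services: list[dict], category: str | None) -> list[dict]:
    return [s for s in services if category is None or s.get("category") == category]

@app.route("/smxs/services")
def get_services():
    peers = request.args.getlist("peer")    # e.g. ?peer=http://A/services&peer=http://B/services
    category = request.args.get("category")

    if not peers:                           # case (a): plain GetServices, local list only
        return jsonify(by_category(LOCAL_SERVICES, category))

    # Case (b): fan-out. Query every listed peer *without* forwarding the peer
    # parameters, so each peer returns only the services it maintains itself.
    # (A could answer its own entry from LOCAL_SERVICES instead of calling itself over HTTP.)
    combined: list[dict] = []
    for peer in peers:
        try:
            resp = requests.get(peer, params={"category": category}, timeout=10)
            resp.raise_for_status()
            combined.extend(resp.json())
        except requests.RequestException:
            continue                        # a down or slow peer is simply skipped
    return jsonify(combined)
```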

luxdlzu commented 1 year ago
  1. There are three SDSs: A, B, and C. A is the one that the User has access to.
  2. A is aware of the existence and addresses of B and C. (Presumably from a GetPeers operation implemented earlier.)
  3. The system that hosts A is capable of creating temporary data storage (a cache or temporary table) to store a collection of GetServices responses. A request URI may look like this: http://nsrr.faa.gov/smxs/services?peer=http://A/services&peer=http://B/services&peer=http://C/services&category=weather

We are testing this approach; however, if any response from B or C times out, the user cannot get any list of services. For example, from ENRI, the user sends GetServices to both the FAA and KAC. When the response from the FAA times out, the user cannot get the service list of KAC, even if it is available. As we cannot assure the availability of other peers in this loosely coupled environment, we revised the approach to return each received service list to the user one by one in an asynchronous way.

mkaplun commented 1 year ago

RE: comment by @luxdlzu

If one of the requests fails, it should not prevent a requester from sending requests to other peers on a list.

If B on the diagram doesn't respond, A will return a list with only A and C services. The request-response between A and C should not depend on the result of A and B's exchange.

If B fails to respond (e.g., returning 504 Timeout error), the system just has to move to the next SDS on its list.

It is one of the advantages of the SDS that when one peer is down (or leaves the network), the rest can continue to function.
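
A small variation of the earlier fan-out sketch (same hypothetical /services endpoint) showing this behaviour: a timeout or 504 from one peer only removes that peer's entries from the combined result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fetch_services(peer: str, category: str) -> list[dict]:
    resp = requests.get(f"{peer}/services", params={"category": category}, timeout=10)
    resp.raise_for_status()       # treat 5xx (e.g., 504) as a failure for this peer only
    return resp.json()

def fan_out_tolerant(peers: list[str], category: str) -> list[dict]:
    """One peer timing out or erroring never blocks the answers from the others."""
    combined: list[dict] = []
    with ThreadPoolExecutor(max_workers=max(len(peers), 1)) as pool:
        futures = [pool.submit(fetch_services, p, category) for p in peers]
        for future in as_completed(futures):
            try:
                combined.extend(future.result())
            except requests.RequestException:
                continue          # e.g., B is down: move on; A's and C's results still arrive
    return combined
```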

mkaplun commented 1 year ago

RE: comment by @cinglefield

Let's use the same use case as shown in the diagram to provide an example of the output of a "fan-out" interaction.

We will use the same query request on all three services: GetServices(URI(A), URI(B), URI(C), "weather").

Let's also think of "Cache" on the diagram as some kind of temporary table.

SDS A has three weather services which may look like this:

http://a/services/WS1; WS1; Provides weather data; weather; operational; method-oriented
http://a/services/WS2; WS2; Provides weather data; weather; prospective; resource-oriented
http://a/services/WS3; WS3; Provides weather data; weather; retired; message-oriented

SDS B does not have weather services and will return 0 records. (Alternatively, we can think that B timed out and should be skipped.)

SDS C has two weather services which may look like this:

http://c/services/WS1; WS1; Provides weather data; weather; operational; resource-oriented
http://c/services/WS2; WS2; Provides weather data; weather; operational; resource-oriented

As a result, the user will receive a list of services like this:

http://a/services/WS1; WS1; Provides weather data; weather; operational; method-oriented
http://a/services/WS2; WS2; Provides weather data; weather; prospective; resource-oriented
http://a/services/WS3; WS3; Provides weather data; weather; retired; message-oriented
http://c/services/WS1; WS1; Provides weather data; weather; operational; resource-oriented
http://c/services/WS2; WS2; Provides weather data; weather; operational; resource-oriented

We use pseudo-code for this query. However, SDS queries should be constructed using common REST syntax, similar to the example mentioned by @luxdlzu.

http://nsrr.faa.gov/smxs/services?peer=http://A/services&peer=http://B/services&peer=http://C/services&category=weather

(It may require some refining, but this is something to be addressed at an implementation level.)