eclipse-arrowhead / core-java-spring

Eclipse Public License 2.0
Not binding late enough (i.e. where is orchestration pull?) #269

Open emanuelpalm opened 4 years ago

emanuelpalm commented 4 years ago

A system A depends on services X, Y and Z being registered and reachable, as well as on authorization and orchestration rules being in place that allow A to consume them. Otherwise A will not function properly (e.g. it will either spam its log with error messages or shut down).

Let us assume that A registers with a service registry in a cloud where only Y and Z are registered and reachable, and no authorization or orchestration rules have yet been set up for A. This situation could arise when spinning up a new cloud, when a service is started only when the need for it arises, or when systems come and go intermittently (e.g. small electric transportation vehicles entering and exiting an on-loading bay).

The problem is that there is nothing preventing A from trying to consume X, Y and Z before they are running and all access control rules are in place. The reason for this is that orchestration is a pull operation and not a push operation. While there used to be an orchestration push service in Arrowhead 3.0, I cannot find any such thing in the 4.2.0 documentation. Having someone push orchestration rules to you should be the norm, not the other way around. Consider this example of going to the dentist:

(1) You enter the dental clinic, walk up to a service desk to register yourself. Your ID is matched against your appointment, and any missing data about you is queried by the service assistant. (2) You stay in a designated waiting area until a dentist is ready to meet you. (3) You are called, when the dentist is ready, to enter his or her room.

If you think about it, similar procedures are used almost ubiquitously. Sometimes the registration and waiting are combined by standing in a queue, or by picking up a queue number from a small dispenser and then waiting in a room. In any case, this is the approach most commonly used when there might be a discrepancy between the arrival of the consumer and the availability of the provider. Only if one of two assumptions holds, namely that the provider is always available or that efficiency is not important, can the consumer consult the provider at any time. While it might commonly be the case that services are always available on the Web, it will not be the case for many kinds of cyber-physical systems (which often will have to be physically available), which are what Arrowhead is meant to cater for.

While an argument could be made that it is easier to use existing libraries and tools to pull orchestration rules, the Kalix library and other client libraries now popping up mean that this procedure only has to be implemented once for each such library.

What is the status of orchestration push? Are there any issues, unknown to me, that need to be solved before it can be implemented?

tsvetlin commented 3 years ago

Further discussion is required on the usage and implementation of Orchestration Push.

vanDeventer commented 3 years ago

Does the service consumer system (e.g., system A) need to have a server to listen to a pushed orchestration message? Should system A subscribe to service X if it is not currently available?

vanDeventer commented 3 years ago

@emanuelpalm (I really have never used this type of notation, first time)

emanuelpalm commented 3 years ago

> Does the service consumer system (e.g., system A) need to have a server to listen to a pushed orchestration message? Should system A subscribe to service X if it is not currently available?

Well, either (A) a listening service is provided by the system, or (B) the Orchestrator uses some kind of connection-oriented protocol that allows for a consumer to maintain a connection until the orchestrator is ready to push the rules. I lean towards the latter, as it means that the orchestrator can know that someone expects it to push the rules when they become available.

While thinking about this, and other similar issues, I've come up with the rule that you should push notifications and pull data, to minimize the room for sending data that cannot be received. I may have inadvertently stolen the concept from somewhere else. ;) Depending on how much data the Orchestrator ends up having to push, it may be relevant to only notify the consumer that the rules are available and then have the consumer request them as needed.
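A minimal sketch of the "push notifications, pull data" rule, assuming a toy in-memory orchestrator. The `Orchestrator` class, its `subscribe`, `add_rule` and `pull` methods, and the endpoint URL are hypothetical names made up for illustration; none of them are part of the Arrowhead API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Orchestrator:
    """Toy in-memory stand-in, NOT the real Arrowhead Orchestrator."""

    # consumer name -> provider endpoints it is allowed to use
    rules: Dict[str, List[str]] = field(default_factory=dict)
    # consumer name -> callbacks to invoke when rules become available
    watchers: Dict[str, List[Callable[[], None]]] = field(default_factory=dict)

    def subscribe(self, consumer: str, on_ready: Callable[[], None]) -> None:
        """Register interest; notify at once if rules already exist."""
        self.watchers.setdefault(consumer, []).append(on_ready)
        if consumer in self.rules:
            on_ready()

    def add_rule(self, consumer: str, endpoint: str) -> None:
        """Store a rule, then push a data-free notification."""
        self.rules.setdefault(consumer, []).append(endpoint)
        for notify in self.watchers.get(consumer, []):
            notify()  # the notification itself carries no rule data

    def pull(self, consumer: str) -> List[str]:
        """The consumer fetches the actual rules when it is ready."""
        return list(self.rules.get(consumer, []))


orchestrator = Orchestrator()
received: List[List[str]] = []

# System A subscribes first, then pulls only after being notified.
orchestrator.subscribe("A", lambda: received.append(orchestrator.pull("A")))
orchestrator.add_rule("A", "coap://provider-x.example/service-x")
print(received)
```

The notification is only a signal, so a consumer that has gone away loses nothing but the signal; the rules themselves are transferred only when the consumer asks for them.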

MaGaMeGa commented 3 years ago

@Listening service vs connection oriented protocol

As far as I'm aware, in the Arrowhead framework the consumers are some of the least resource-intensive entities. The only resources they must have are the ones needed to send requests (to be exact, three: one to the Service Registry, one to the Orchestrator and one to the provider). Making it mandatory to implement a listening (notification) service and open ports for it may exceed the available resources, or violate the security constraints, of some consumers.

On the other hand, keeping an active connection to each consumer would require additional resources from the Orchestrator, which is already the most heavily loaded core system. I'm not sure if there is a way to use active connections and avoid scalability issues.

As I see it:

Since there may be use cases where the consumer cannot implement listening and the number of consumers in the local cloud is high, I'm really not sure that push orchestration should be mandatory in every Arrowhead cloud. However, I do agree that most scenarios could benefit from a "use case optimized" push orchestration option.

emanuelpalm commented 3 years ago

@MaGaMeGa You don't have to maintain actual TCP connections to all consumers. I had some kind of pseudo-connection-oriented protocol built on top of UDP in mind as an alternative when writing the above (which is why I didn't explicitly mention TCP). I didn't bother to check whether a relevant one exists, though. Looking around now, I found the CoAP observe option, which functions pretty much as I had in mind. It does not involve KEEPALIVE messaging, which I guess could be a good thing. CoAP is designed for constrained devices, so it should suit your needs.

Regarding the burden on the orchestrator: a well-provisioned server with optimized software should be able to maintain millions of concurrent TCP connections. In this post from 2013, they were able to maintain 12 million concurrent TCP connections using a commodity Dell server from the time. Remember, however, that maintaining connections is not the same as actively using them. ;-) I've seen similar stories before. It should be very possible to make our Orchestrator perform well enough to maintain at least one million concurrent TCP connections (or CoAP observations) without severe performance problems. It may require ditching Spring, however, as its performance is far from exceptional. Before we start to argue about performance, however, we need to establish performance goals for various use case settings. My point is merely that there is plenty of room for performance, and if we decide that a given local cloud will never contain more than 1'000'000 systems, there is room for handling their orchestration via a single device.

MaGaMeGa commented 3 years ago

@emanuelpalm
It is very good to see that my worries are unnecessary and that there are viable technical solutions for both the listener and the connection-oriented approach. As I understand the Arrowhead framework, it should not be bound to concrete implementations, but should offer uniformly documented and usable solutions for many use cases.
I could not agree more that use cases should come with performance goals, but I think performance goals by themselves should never be the single deciding factor. I think they should go hand in hand with goals for tested hours, safety, security, reliability, available documentation and maybe formal verifiability, etc.

But back to your dental clinic example: I think it is invaluable to have such examples in the Arrowhead discussions. Thank you for setting another good example of how to think about Arrowhead issues!

Please let me know if I could read it as below:

Scenario A)
Given there is a dental clinic (arrowhead cloud)
And you (system A) walk up to a service desk to register yourself (request the register system service?)
And any missing data about you is queried by the service assistant (would you specify here which service you are looking for?)
And you stay in a designated waiting area until a dentist is ready to meet you,
When the dentist is ready you are called (push orchestration)
Then you can enter his or her room (actual service call)

If so, then it helped me to understand that push orchestration does not mean that a consumer gets a service endpoint location without requesting it first. (What troubled me most in thinking about push orchestration was the idea that a system gets a service push-orchestrated to it but may not have the implementation needed to handle the service response.)
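Scenario A can be sketched as a consumer that registers, blocks in a "waiting room", and only calls the provider once an endpoint has been pushed to it. This is a minimal sketch under stated assumptions: a `queue.Queue` stands in for whatever channel the orchestrator would actually use, and the `consumer` function and the endpoint URL are hypothetical, not Arrowhead APIs.

```python
import queue
import threading
from typing import List


def consumer(waiting_room: "queue.Queue[str]", log: List[str]) -> None:
    # Step 1: register with the service desk (hypothetical; just logged here).
    log.append("registered")
    # Step 2: wait in the waiting area until an endpoint is pushed.
    endpoint = waiting_room.get(timeout=5)
    # Step 3: enter the room, i.e. make the actual service call.
    log.append(f"calling {endpoint}")


waiting_room: "queue.Queue[str]" = queue.Queue()
log: List[str] = []

worker = threading.Thread(target=consumer, args=(waiting_room, log))
worker.start()

# Orchestrator side: push the endpoint once the provider is ready.
waiting_room.put("https://provider-x.example/service-x")
worker.join()
print(log)
```

The consumer never receives an endpoint it did not ask for: it only gets one because it registered and is waiting on the channel.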

A similar scenario points out some other issues:

Scenario B)
Given there is an insurance company (arrowhead cloud)
And I (system A) have an account (systemCertificate),
And I know from the insurance policy that I am eligible for dental service if I have a tooth issue,
And I have to request an appointment (service X),
And with my appointment I can get a dental service (Y),
When I have a tooth issue,
Then I call the consumer service (orchestrator) for a dental service,
And the consumer service presents a list of dental clinic contacts (provider-service-address-interface) which are available at the moment with my policy,
And I call the first dental clinic (actual service request) for an appointment, but there is no answer (response with error message)
And I message the second clinic (actual service request) and I get an appointment
And I go to the dental clinic to get my teeth fixed.
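The pull flow in Scenario B, where the consumer receives a list of providers and falls back to the next one when a call fails, could look roughly like this. It is a minimal sketch: `ProviderUnavailable`, `consume_first_available`, `fake_call` and the clinic names are all hypothetical and only stand in for a real orchestration response and service request.

```python
from typing import Callable, Iterable, Optional


class ProviderUnavailable(Exception):
    """Raised by a call when the provider does not answer (hypothetical)."""


def consume_first_available(
    endpoints: Iterable[str], call: Callable[[str], str]
) -> Optional[str]:
    """Try each orchestrated endpoint in order; return the first response."""
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except ProviderUnavailable:
            continue  # no answer, move on to the next clinic in the list
    return None  # no provider in the list could be reached


def fake_call(endpoint: str) -> str:
    # Stand-in for the actual service request; the first clinic never answers.
    if endpoint == "clinic-1.example":
        raise ProviderUnavailable(endpoint)
    return f"appointment at {endpoint}"


result = consume_first_available(
    ["clinic-1.example", "clinic-2.example"], fake_call
)
print(result)  # appointment at clinic-2.example
```

Note that this only works when at least one provider in the list is assumed to be reachable at the time of the request; otherwise the consumer is back to retrying.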

If you have just signed a contract with the insurance company and you already have a toothache, then it is quite understandable that the insurance company can request appointments and arrange dental services, but that these are not available to you until all authorisation for your contract is in place. I do not think that this is a case where additional notification is needed; I think the signed contract should clearly state the date from which service requests are accepted.

However, I do think that both of the above scenarios highlight the fact that the service scenarios we tend to think about are sequential processes and not readily available static chunks of data.
What is your opinion about enhancing the service description with the sequence of events that a service must/may publish? What is your opinion about even incorporating descriptions of the conditions under which a certain eventType must be published?

emanuelpalm commented 3 years ago

If I understand your A and B scenarios correctly, the difference is that in A the service provider is not assumed to be immediately available, while in B it is assumed to be immediately available. Right? As far as I can tell, that is what determines whether orchestration push (A) or pull (B) is most suitable.

> What is your opinion about enhancing the service description with the sequence of events that a service must/may publish?

What do you mean by event? Like an EventHandler event? Could you show an example of such a sequence of events?

> What is your opinion about even incorporating descriptions of the conditions under which a certain eventType must be published?

I'm not certain I understand your question. Could you elaborate?