eclipse-arrowhead / roadmap


Registries 5.0 Proposal #44

Closed tsvetlin closed 2 years ago

tsvetlin commented 2 years ago

Dear everyone,

I would like to propose three core systems for Arrowhead 5.0, which have been redesigned and built upon comments and feedback from the community.

Please read the attached PDF and feel free to express your thoughts, concerns and ideas.

A roadmap meeting will be held on 7 March, from 15:00 to 16:30 (CEST), to finalize this proposal.

Registries_5_0.pdf

emanuelpalm commented 2 years ago

Great initiative to post a proposal before actually digging in!

After skimming through everything and reading through sections that popped out to me, I make the following suggestions:

  1. Do we positively know that X.509 and MAC addresses will be used to identify individual Arrowhead devices indefinitely? I don't think we dare to make that assumption. I believe every device should be associated with an arbitrary set of "identities", just like I believe systems and services ought to be. It may be the case that the initial 5.0.0 version of core-java-spring only supports these two types of identities, but the JSON messages used should reflect that other types of identities could be introduced.
  2. You use names such as created_at, updated_at and last_tried_at when naming TIMESTAMP fields in all cases but two. In each of the system_ and service_registry tables there is a field called end_of_validity. Why not follow the same pattern as the other fields and name those expires_at?
  3. Does the service_definition_id in the service_registry table have to be mandatory? Couldn't there be cases where you want to register services but you don't really care if the service registry can provide you with their full definitions?

If you want me to look at any other things in particular, please tell me where to look. Thanks again!

tsvetlin commented 2 years ago

Application Registration Process in high definition

image

tsvetlin commented 2 years ago

@emanuelpalm

Great initiative to post a proposal before actually digging in!

After skimming through everything and reading through sections that popped out to me, I make the following suggestions:

  1. Do we positively know that X.509 and MAC addresses will be used to identify individual Arrowhead devices indefinitely? I don't think we dare to make that assumption. I believe every device should be associated with an arbitrary set of "identities", just like I believe systems and services ought to be. It may be the case that the initial 5.0.0 version of core-java-spring only supports these two types of identities, but the JSON messages used should reflect that other types of identities could be introduced.
  2. You use names such as created_at, updated_at and last_tried_at when naming TIMESTAMP fields in all cases but two. In each of the system_ and service_registry tables there is a field called end_of_validity. Why not follow the same pattern as the other fields and name those expires_at?
  3. Does the service_definition_id in the service_registry table have to be mandatory? Couldn't there be cases where you want to register services but you don't really care if the service registry can provide you with their full definitions?

If you want me to look at any other things in particular, please tell me where to look. Thanks again!

  1. You have an X.509 Certificate Proposal (https://github.com/eclipse-arrowhead/documentation/blob/master/distribution/Eclipse%20Arrowhead%20X.509%20Certificate%20Profiles%20v1.0.Proposal.pdf) 👀 A Certificate uniquely identifies individual Arrowhead Devices and Systems. Based on your proposal, we are proposing this solution. We can't imagine a cloud without network communication, and where there is network communication, there are MAC addresses. A device can have multiple MAC addresses, since it can have multiple network interface cards.

  2. We can rename end_of_validity to expires_at. It has been named end_of_validity for a long time. created_at, updated_at and last_tried_at are technical fields for internal operations; they have no real use for end users or application systems. end_of_validity, on the other hand, is a field designed to be seen by users and application systems.

  3. This makes no sense. A system which does not offer a service is not a provider. Could you please explain what your original thoughts were? (Maybe with an example or use case?)

tsvetlin commented 2 years ago

Trail ID is now part of the proposal (#15). We will look into Pinpoint and whether it can be useful in our particular case.

emanuelpalm commented 2 years ago

  1. You have an X.509 Certificate Proposal (https://github.com/eclipse-arrowhead/documentation/blob/master/distribution/Eclipse%20Arrowhead%20X.509%20Certificate%20Profiles%20v1.0.Proposal.pdf) 👀 A Certificate uniquely identifies individual Arrowhead Devices and Systems. Based on your proposal, we are proposing this solution. We can't imagine a cloud without network communication, and where there is network communication, there are MAC addresses. A device can have multiple MAC addresses, since it can have multiple network interface cards.

I approach my review of your proposal solely from the perspective of what I think you believe about what messages are going to be sent to and from your implementations. It may be a suitable choice for your implementation to only support X.509 and MAC addresses, but the Device Registry protocol must be able to support other cryptographic standards (e.g. PGP certificates, the direct use of raw public keys, or Kerberos) and network interface identification mechanisms (e.g. the ID byte of CAN bus). I'm sorry for probably being a bit overly pedantic, but I just want to make sure there is no confusion about the fact that Arrowhead itself is able, at least theoretically, to support any computer communication mechanism. By sticking to X.509 and MAC in your implementation, we must not make it seem as if those two are the only ones that can be used with Arrowhead. Other implementations could be designed to support other combinations of standards.
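
For illustration only, here is a minimal Go sketch of what an identity-agnostic registration message could look like. The field names are hypothetical and not a concrete proposal; the point is just that the identities are an open-ended, typed list rather than dedicated X.509 and MAC fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Identity is one way a device (or system/service) can be identified.
// The type is open-ended: "x509-certificate" and "mac-address" might be the
// only types supported by the first core-java-spring release, but others
// (e.g. "pgp-certificate", "raw-public-key", "can-id") could be added later
// without changing the message structure.
type Identity struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

// DeviceRegistration is a hypothetical registration payload.
type DeviceRegistration struct {
	Name       string     `json:"name"`
	Identities []Identity `json:"identities"`
}

func main() {
	msg := DeviceRegistration{
		Name: "plant-floor-gateway-1",
		Identities: []Identity{
			{Type: "x509-certificate", Value: "MIIB...base64..."},
			{Type: "mac-address", Value: "38:f9:d3:8c:08:8c"},
		},
	}
	out, _ := json.MarshalIndent(msg, "", "  ")
	fmt.Println(string(out))
}
```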

  2. We can rename end_of_validity to expires_at. It has been named end_of_validity for a long time. created_at, updated_at and last_tried_at are technical fields for internal operations; they have no real use for end users or application systems. end_of_validity, on the other hand, is a field designed to be seen by users and application systems.

I'm still eagerly waiting for @vanDeventer to present his proposal for the service registry operations and messages (or did the assignment change without me knowing?). What he would like the field to be called matters (to me at least). Either keeping the field name the same in the database as in the actual messages, or sticking to a pattern, should make it appear the least confusing (at least in my mind). Do as you please, of course.

  3. This makes no sense. A system which does not offer a service is not a provider. Could you please explain what your original thoughts were? (Maybe with an example or use case?)

This is something that is not explained in any detail in the Concepts Reference, but that I'm planning for the (now delayed) Documentation Reference. There is, of course, an important distinction being made between service instances and service types (or definitions, as you name them). An instance is something tangible you can actually contact, while a type describes how you communicate with an instance adhering to it. The type is only relevant to consider if you do not already know how to interact with services that implement the type. Most systems will be hard-coded to interact properly with certain service types. To those systems, it will not matter if the service registry knows the type details (such as what operations they have). It only matters that these systems can identify the type identifiers of the instances they communicate with. Storing service type details in the service registry could perhaps be a boon to code generation, etc. That being said, I do not believe we should require that every single system registering a service should be forced to ensure the service registry knows of the exact type details of each of its services.

To be frank, I don't believe the service registry should store service type details at all. In my mind, the service registry maintains runtime information (what is running where?). Service type details are "planning time" information. That should go into something more akin to the "Plant Description" system. Runtime information tends to be "hot" (performance critical) while "planning time" information tends to be "cold" (not performance critical).

vanDeventer commented 2 years ago

@emanuelpalm Nice to see comments from you. I agree with you that @tsvetlin's proposal is the correct approach to get what we decided in the Roadmap workgroup.

“Beauty is no quality in things themselves: It exists merely in the mind which contemplates them; and each mind perceives a different beauty.” ― David Hume, Of the Standard of Taste and Other Essays

I did not perceive my responsibility as defining the messages in and out of the core systems. My interest has been to move away from a central database, as I perceive it as a potential single point of failure (a perception that is the result of my repeated struggles setting up the database when I just wanted to do something simple). You might have perceived that I was responsible for that.

I am very much interested in what is in these messages. I do not know enough to make a good proposal and could go with the one you made earlier: https://github.com/eclipse-arrowhead/roadmap/issues/22#issuecomment-1002057239 (The students' Go implementation of the Service Registration system handles a different message format: https://github.com/ClaudeHallard/Arrowhead)

I agree very much with you that the naming of the fields is very important. end_of_validity is a confusing name and expires_at is much better. In my current Go implementation, the Service Registry system is the one which fills in that timestamp, and the service provider is expected to renew/update its registration. (Interestingly enough, there is no PUT method for /register.) The duration between updates is now fixed, but I would have preferred that it be suggested by the provider system and requested by the consumer system. [A consumer knows how stale the service registry can be for its own purposes, and if a service expires, it could trigger a future push orchestration.] So, should duration be a field? My interest here is the system of systems' recovery from device and system failures.

When/Why does an application system need to know when a system or service was registered? Do these timestamps need to be in the message?

My work has not touched cybersecurity yet but the hardware I have in my office has hardware based private keys (e.g., https://www.st.com/en/secure-mcus/stsafe-a110.html) which is what I plan to use to authenticate the devices. I do not have much to say yet about the MAC address except that the router in my office pretends to be my computer so that I can have other devices...

In any case, I agree with you that we need to define the messages for all three new systems. Does the replied registration message need to be much different from the sent message? Could it be the same but with fields updated (e.g., ID and expires_at)?

tsvetlin commented 2 years ago

  1. You have an X.509 Certificate Proposal (https://github.com/eclipse-arrowhead/documentation/blob/master/distribution/Eclipse%20Arrowhead%20X.509%20Certificate%20Profiles%20v1.0.Proposal.pdf) 👀 A Certificate uniquely identifies individual Arrowhead Devices and Systems. Based on your proposal, we are proposing this solution. We can't imagine a cloud without network communication, and where there is network communication, there are MAC addresses. A device can have multiple MAC addresses, since it can have multiple network interface cards.

I approach my review of your proposal solely from the perspective of what I think you believe about what messages are going to be sent to and from your implementations. It may be a suitable choice for your implementation to only support X.509 and MAC addresses, but the Device Registry protocol must be able to support other cryptographic standards (e.g. PGP certificates, the direct use of raw public keys, or Kerberos) and network interface identification mechanisms (e.g. the ID byte of CAN bus). I'm sorry for probably being a bit overly pedantic, but I just want to make sure there is no confusion about the fact that Arrowhead itself is able, at least theoretically, to support any computer communication mechanism. By sticking to X.509 and MAC in your implementation, we must not make it seem as if those two are the only ones that can be used with Arrowhead. Other implementations could be designed to support other combinations of standards.

While Arrowhead can theoretically support every cryptographic standard, in reality they have to be implemented. We have to start the implementation somewhere; therefore we propose X.509 to be the first in line to be implemented, and it can be followed by other standards later.

In the device table, the mac_address field can be changed to identities_metadata, like in the system_ table. The device_mac_address table can be changed to a device_address table, where MAC will only be a type.

  2. We can rename end_of_validity to expires_at. It has been named end_of_validity for a long time. created_at, updated_at and last_tried_at are technical fields for internal operations; they have no real use for end users or application systems. end_of_validity, on the other hand, is a field designed to be seen by users and application systems.

Ok, expires_at 👍🏻

I'm still eagerly waiting for @vanDeventer to present his proposal for the service registry operations and messages (or did the assignment change without me knowing?). What he would like the field to be called matters (to me at least). Either keeping the field name the same in the database as in the actual messages, or sticking to a pattern, should make it appear the least confusing (at least in my mind). Do as you please, of course.

  3. This makes no sense. A system which does not offer a service is not a provider. Could you please explain what your original thoughts were? (Maybe with an example or use case?)

This is something that is not explained in any detail in the Concepts Reference, but that I'm planning for the (now delayed) Documentation Reference. There is, of course, an important distinction being made between service instances and service types (or definitions, as you name them). An instance is something tangible you can actually contact, while a type describes how you communicate with an instance adhering to it. The type is only relevant to consider if you do not already know how to interact with services that implement the type. Most systems will be hard-coded to interact properly with certain service types. To those systems, it will not matter if the service registry knows the type details (such as what operations they have). It only matters that these systems can identify the type identifiers of the instances they communicate with. Storing service type details in the service registry could perhaps be a boon to code generation, etc. That being said, I do not believe we should require that every single system registering a service should be forced to ensure the service registry knows of the exact type details of each of its services.

To be frank, I don't believe the service registry should store service type details at all. In my mind, the service registry maintains runtime information (what is running where?). Service type details are "planning time" information. That should go into something more akin to the "Plant Description" system. Runtime information tends to be "hot" (performance critical) while "planning time" information tends to be "cold" (not performance critical).

If everything is in the documents, then why use Service Discovery, why Orchestrate? You could just hardcode everything. This information must be documented, but it must also be provided to the Service Registry. It is also a great check for applications to validate the operations.

Authorization will most likely use NGAC, and it will support fine-grained access, down to the operation level. For this reason, the operations must be available in the service registry too.

vanDeventer commented 2 years ago

Two comments on the PDF.

  1. I am missing a simple sentence for the purpose or mission of each of the three registry systems. For example, the Service Registry system keeps track of all the currently available services of the local cloud for the Orchestrator system.
  2. I do not see why a switch to a new version of the implementation is not backward compatible, at least for a given set of versions (e.g., 4.4 to 5.2). The version of the payload does not have to reflect the implementation version of, say, a C++ implementation. It can follow the implementation version so that it is easier to follow the logic. The version of the payload should indicate what information, and in what structure, is being transferred. If the ServiceRegistry v5 gets a payload with version 4.4, it could still do the job of communicating with the other registry systems. [I did not check, but heard rumors that the latest version of the Service Registry had a modification to the REST DELETE method where the details were moved from the header to the body with no backwards compatibility, which is strange if true. For a while, both solutions should be handled.]

If there is a telco on the topic this week, I somehow missed the invitation.

emanuelpalm commented 2 years ago

@vanDeventer

Thank you for clarifying your position. It was plenty of news to me.

I did not perceive my responsibility as defining the messages in and out of the core systems.

Alright, since we haven't decided what these messages should look like, and no one seems to be ultimately responsible, I propose that @tsvetlin (or someone else at AITIA) adds those messages to his proposal. I'm personally much more interested in the messages than in the implementation, even though the implementation is certainly not uninteresting. Please look at my proposal for a service registration message, which Jan mentioned (https://github.com/eclipse-arrowhead/roadmap/issues/22#issuecomment-1002057239); I think it is approaching a good message layout. I haven't really thought about how to design the device and system registration messages yet, so feel free to try out whatever layouts you think could work.

@tsvetlin

While Arrowhead can theoretically support every cryptographic standard, in reality they have to be implemented. We have to start the implementation somewhere; therefore we propose X.509 to be the first in line to be implemented, and it can be followed by other standards later.

Yes, exactly!

In the device table, the mac_address field can be changed to identities_metadata, like in the system_ table. The device_mac_address table can be changed to a device_address table, where MAC will only be a type.

I'm sorry for being confusing, but as I said already (perhaps a bit unclearly), my concern is about messages, not about database tables. You may decide to have the table fields however you want. Be aware, however, that you may be asked to support other standards in the future, such as OpenPGP (even though it may be a bit unlikely, as OpenPGP certificates were deprecated in TLS version 1.3 (look towards the end of Section 4.4.2)). MAC addresses, X.509 and TLS are likely to remain the dominant standards for years to come, but there are on-going attempts to replace them (e.g. Information-Centric Networking). It may be more sensible to hard-code support for TLS/X.509/MAC now, and change the table if other contenders ever become relevant. I like the way the table changes communicate to others working on the core-java-spring systems that support for other standards may be added.

If everything is in the documents, then why use Service Discovery, why Orchestrate? You could just hardcode everything. This information must be documented, but it must also be provided to the Service Registry. It is also a great check for applications to validate the operations.

Only service types are described in these documents. Not the service instances. We must still depend on Service Discovery and Orchestration to find instances of the service types we want our systems to communicate with. Finding a service of the correct type only requires the service to be associated with exactly one service type identifier.

If we want to dynamically analyze services at runtime, then information about the operations becomes a requirement, of course. Automatic service validation, which you mention, may be a valid use-case for that. I don't think it should be a mandatory part of the Service Registry system, however. Not everyone will want such validation, and I believe there is going to have to be plenty of room to configure such a validation system. I still believe that information about operations should not go into the service registry, but into another system. I understand that I don't always get what I want, however. ;-)

Authorization will most likely use NGAC, and it will support fine-grained access, down to the operation level. For this reason, the operations must be available in the service registry too.

It is true that NGAC will provide fine-grained orchestration rules. It is also true that automatic validation of NGAC rules may require detailed information about service operations. It is not true, however, that the service registry must store that information for the validation to be possible (another system could store it). Neither is it a matter of course that everyone will want to perform such validation. The NGAC rules themselves will be executed on every system providing services, not by the service registry or authorization system (other than perhaps to regulate access to their own services). They do not have to be validated before being used.

Also, it may not be relevant to validate all rules or all operations, even though some of them are. This is why I don't like you enforcing that every system registers the data types and operations of its own services.

jerkerdelsing commented 2 years ago

"To allow that the databases can be handled by same database server" How should we view this. Every system is responsible fro its own data!

Metadata

The echo service

Register-device

Register-service "Please note that SR does not check the existense of the system during the operation but marks the entry as not active. An asynchronous task will handle the system check (using WS call) and updating the flag if necessary."

Two staged startup

name (mandatory): name of the device. Text. Maximum length is 63 characters, only contains letters (english alphabet), numbers and dash (-), and must start with a letter (also cannot end with dash). Must be unique.

mac_address (mandatory): MAC address assigned to the device in text format: a 12-digit hexadecimal number with a colon every two digits (an octet). Must be unique.
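
As a side note, the two constraints quoted above are easy to check mechanically. A small Go sketch (the regular expressions are my own reading of the text, not taken from any implementation):

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern: starts with a letter, then letters/digits/dashes, cannot end
// with a dash. The 63-character bound is enforced separately for readability.
var namePattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?$`)

// macPattern: six octets of two hex digits each, separated by colons.
var macPattern = regexp.MustCompile(`^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

func validDeviceName(s string) bool {
	return len(s) <= 63 && namePattern.MatchString(s)
}

func validMAC(s string) bool {
	return macPattern.MatchString(s)
}

func main() {
	fmt.Println(validDeviceName("edge-device-1")) // true
	fmt.Println(validDeviceName("1bad-name"))     // false: starts with a digit
	fmt.Println(validDeviceName("bad-"))          // false: ends with a dash
	fmt.Println(validMAC("38:f9:d3:8c:08:8c"))    // true
	fmt.Println(validMAC("38f9d38c088c"))         // false: no colons
}
```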

DeviceRegistry device table

GENERAL COMMENT ON THE TABLE

UlfSlunga-Sinetiq commented 2 years ago

In order to fully achieve loose coupling between core services (making them replaceable by e.g. not sharing db's), we see a need for a new field, to be added to both the request and the response:

serviceInstanceId (a string uniquely identifying the service instance)

to be added to: /serviceregistry/register, /serviceregistry/query, /authorization/intracloud/check and /authorization/intercloud/check

To be used e.g. in /authorization/intracloud/check to identify the service instance to be checked.
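
To make the idea concrete, here is a hypothetical Go sketch (all field names invented for illustration) showing the same opaque serviceInstanceId carried in both the registration response and the authorization check request:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceRegistrationResponse and AuthorizationCheckRequest are hypothetical
// message shapes. The point: both carry the same opaque serviceInstanceId, so
// the Authorization system can identify the instance without sharing a
// database with the Service Registry.
type ServiceRegistrationResponse struct {
	ServiceInstanceID string `json:"serviceInstanceId"`
	ServiceDefinition string `json:"serviceDefinition"`
	ExpiresAt         string `json:"expiresAt"`
}

type AuthorizationCheckRequest struct {
	ServiceInstanceID string `json:"serviceInstanceId"`
	ConsumerSystemID  string `json:"consumerSystemId"`
}

func main() {
	reg := ServiceRegistrationResponse{
		ServiceInstanceID: "temperature-kitchen-7f3a",
		ServiceDefinition: "temperature",
		ExpiresAt:         "2022-04-01T12:00:00Z",
	}
	check := AuthorizationCheckRequest{
		ServiceInstanceID: reg.ServiceInstanceID, // same identifier reused end to end
		ConsumerSystemID:  "hvac-controller",
	}
	for _, v := range []interface{}{reg, check} {
		b, _ := json.MarshalIndent(v, "", "  ")
		fmt.Println(string(b))
	}
}
```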

CristinaPaniagua commented 2 years ago

Hi, I would like to raise my concern about the interface_protocol table and associated fields and tables. The new version of these parameters is more complex than previous versions but I personally consider that it is still inefficient and probably badly located.

Every system (provider or consumer) needs an interface or interfaces to communicate with other systems. Therefore, the interface_protocol parameters should be part of the System Registry, not the Service Registry. If we keep the interface details attached to services, we are limiting the interface information to providers and neglecting consumers.

In addition to this comment, I consider that the information added in the metadata field is probably not enough. If we consider other interface description languages (for example OpenAPI or WADL), the data is not structured in pairs but in more complex files, which, I think (maybe I am wrong), are not supported in this version either.

Thank you for your work in creating the document and proposal.

emanuelpalm commented 2 years ago

@UlfSlunga-Sinetiq As far as I know, we will add unique service instance identifiers to the messages of relevance in the v5.0.0 release. You are not the only one who has raised this concern.

@CristinaPaniagua @tsvetlin I would like to have some criteria for what information should or should not go into a given system; otherwise we risk them holding way too much of it. I will try to think about suitable criteria. How about them only holding runtime information, i.e. what exists in the cloud and not what those things are?

DavidRutqvist commented 2 years ago

Hello, I would like to add some things to the discussion both on a general level and in detail.

  1. What is the overall purpose of this document? To me, Arrowhead is more of an architecture with strict definitions of the interfaces rather than one specific implementation. This document is a very nice implementation specification, but labelling this as the 5.0 release would limit the possibility for other parties to create other implementations of one or more services.

  2. A general "meta" suggestion about the issue itself (goes for several issues so not only this one). It would have been better to make this proposal as a PR with MD, LaTeX or similar. That way we could have commented directly in the file.

  3. Regarding service registry and service definitions. I think, in general, a service should be as limited as possible in scope to do one thing well but still be able to execute its responsibility on its own. One example of where this goes a bit wrong is the service registry here. I agree with @emanuelpalm that the service definitions should not be part of the runtime information. The purpose of a service registry is to register and do discovery (lookup) of services, nothing more. While the validation of service contracts is a valid use-case, it is not part of the purpose of the service registry. With that said, it could still be part of this implementation's service registry system, but then exposing two services (service registry and runtime validation).

  4. Continuing on 3, I see a coupling between the service registry and system registry even though the databases are separate. Neither of these services are able to carry out its responsibility on its own, e.g. you have to call both to do a lookup. I would argue that the lookup including address is part of the service registry, not the system registry. Having been outside of the Arrowhead project now for the past 3 years I don't know how the discussion has gone. However, in general there is a need to know which systems exist, what services they do provide and who owns/maintains this system. I would argue this is part of the system registry, so more on the governmental side of things than a runtime dependency.

  5. If I understood the implementation correctly, a system can have one or more addresses, which is good. However, each address is not linked to a service? How is this meant to work? A system could, for example, provide one service as REST/HTTPS and one as MQTT. These would then have two different addresses since they use two completely separate protocols as well. How will the system/service registry handle this?

  6. Finally, I did not fully understand the purpose of the "URI Crawler". Could you elaborate on what purpose it has and how it works?

// David (Sinetiq, I used to be at BnearIT ~3 years ago for reference)

jerkerdelsing commented 2 years ago

MoM from Roadmap v5.0.0 meeting March 4.

Backward compatibility

Independence between Service Systems and Device registries

Echo or Monitoring service: a Monitoring service with multiple operations, among which Echo should be one, shall be allowed but not mandatory.

Next meeting will be Wednesday March 16 at 15.00, the ordinary Roadmap WG time-slot. The primary agenda point will be to finalise the issue #44 comments not yet addressed, starting with Register-device from @jerkerdelsing.

The ambition is to close all points at the next meeting so that development of detailed documentation and subsequent coding can start.

ajoino commented 2 years ago

Regarding multiple operations of services:

To me, the point of SOA is that each service should provide one thing and that that one thing should be limited in scope. If you want to provide more things, you should add more services (similarly to how using boolean flags in functions to change their behaviour is often considered worse than using two different functions.) When I see an HTTP service where one can use more than one method, that tells me that you want two different services. They could possibly share endpoints but they should be registered as different services, differentiated by name and metadata. Parameters in the path and objects in the body shouldn't change the behaviour of the service substantially.

From the meeting today I feel like I've misunderstood something about services, could someone clarify?

tsvetlin commented 2 years ago

@ajoino The idea was always part of the Arrowhead Framework, but it was implemented in a "bad" way. In @jerkerdelsing's IoT Automation book, it is described that the Service Registry has a service called Service Discovery which offers a set of functions. These functions were recently renamed operations.

In 3.X and 4.X, these functions were implemented as a separate service. I believe this approach is also viable and good.

At a roadmap meeting we agreed that it has to be implemented the "correct" way, so services will have at least one operation from now on. Yes, it may be confusing, but in the end nothing really changes; there is just one more abstraction layer.

ajoino commented 2 years ago

Thank you @tsvetlin!

It's just a matter of personal preference, and I don't feel very strongly about it. Both ways will do the same thing in the end anyway; it's just a matter of abstraction, as you put it.

jerkerdelsing commented 2 years ago

@ajoino

Generally SOA is very permissive on these kind of questions.

A couple of things have, though, been defined in Arrowhead for quite some time.

We currently have no stated upper bound on the number of operations, but in a microservice context the number is presumed to be rather small.

ajoino commented 2 years ago

@jerkerdelsing perhaps I'm misunderstanding exactly what we mean by 'operation'.

My worry is that by having services whose operations are based on REST methods, we will make it harder to provide services using protocols that are not RESTful. Correct me if I'm wrong, but having a single service-discovery service, where you GET, PUT/POST, and DELETE to manage services, works well for HTTP and CoAP, but how would that work for a WebSockets or MQTT version? As far as I'm aware, neither WS nor MQTT has methods like that. How would you tell the operations apart for such protocols? Maybe we could add a field in the message describing what operation to use, but I feel that the message sent must be the same no matter the protocol used. My worry is that the design we have chosen will make it difficult to design non-RESTful versions of those services.

I'm sure we have thought of this and I'm just out-of-the-loop currently, just wanted to raise my concerns.

vanDeventer commented 2 years ago

@emanuelpalm @tsvetlin I am trying to get a better understanding of what the payload to the device registry should look like. I see now that I omitted the authentication string. My first attempt results in: { "id": 0, "hostname": "MBPC.local", "ipAddresses": [ "130.240.173.8", "192.168.1.2", "127.0.0.1" ], "macAddresses": [ "38:f9:d3:8c:08:8c", "e8:ea:6a:78:03:c5" ] }

I get this when I run the host.go program on my Mac while connected to both Ethernet and WiFi.
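
For anyone curious how such a payload can be collected, a minimal Go sketch (not the actual host.go, just an illustration using the standard library) could look like this:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
)

// devicePayload mirrors the JSON shape shown above: hostname from the OS,
// plus the IP and MAC addresses of every network interface that has them.
type devicePayload struct {
	ID           int      `json:"id"`
	Hostname     string   `json:"hostname"`
	IPAddresses  []string `json:"ipAddresses"`
	MACAddresses []string `json:"macAddresses"`
}

func main() {
	hostname, _ := os.Hostname()
	payload := devicePayload{Hostname: hostname}

	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	for _, iface := range ifaces {
		if mac := iface.HardwareAddr.String(); mac != "" {
			payload.MACAddresses = append(payload.MACAddresses, mac)
		}
		addrs, _ := iface.Addrs()
		for _, addr := range addrs {
			if ipNet, ok := addr.(*net.IPNet); ok {
				payload.IPAddresses = append(payload.IPAddresses, ipNet.IP.String())
			}
		}
	}

	out, _ := json.MarshalIndent(payload, "", "  ")
	fmt.Println(string(out))
}
```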

vanDeventer commented 2 years ago

Since I am not agnostic, I am having issues right now.

I did not work with MQTT or WebSockets, and my problem is to understand the difference between operations and services. I will start with an example: I have two motorized valves used in a heating system. I can request their current positions and request that they update their positions (from 0 to 100%, with 100% meaning fully open). One valve is in the kitchen and one in the bathroom. What should my paths look like with the different protocols? With REST, I would imagine something like GET localhost:9023/valve/kitchen. What would be more correct? How would one do that with WebSockets or MQTT?

emanuelpalm commented 2 years ago

Service vs. Operation

@ajoino @vanDeventer If we make a parallel with how Object-Oriented Programming works in Java, you may think of a system as a class instance (i.e. an object), a service as a Java interface and an operation as a method. A major difference is that in Arrowhead, our "class instances" (systems) cannot have methods without declaring them as part of a "Java interface" (service).

Systems maintain internal state that can be queried and/or updated via the operations of the services that system hosts (or implements in Java lingo).

A service provides an interface (point of communication) through which a particular task can be fulfilled. If that task requires multiple messages to be properly fulfilled (e.g. maintaining an accurate entry in a service registry), then that interface will have more than one operation. By task I mean any kind of value-creating activity.
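
The same analogy can be expressed directly in code. A small Go sketch (names invented for illustration): the interface plays the role of the service, its methods are the operations, and the concrete value implementing it is the system:

```go
package main

import "fmt"

// ServiceDiscovery plays the role of a "service"; each method is an "operation".
type ServiceDiscovery interface {
	Register(name string) error
	Unregister(name string) error
	Query(name string) (bool, error)
}

// serviceRegistrySystem is a "system": it holds state and provides the
// ServiceDiscovery service by implementing all of its operations.
type serviceRegistrySystem struct {
	entries map[string]bool
}

func (s *serviceRegistrySystem) Register(name string) error {
	s.entries[name] = true
	return nil
}

func (s *serviceRegistrySystem) Unregister(name string) error {
	delete(s.entries, name)
	return nil
}

func (s *serviceRegistrySystem) Query(name string) (bool, error) {
	return s.entries[name], nil
}

func main() {
	var sd ServiceDiscovery = &serviceRegistrySystem{entries: map[string]bool{}}
	_ = sd.Register("temperature-kitchen")
	found, _ := sd.Query("temperature-kitchen")
	fmt.Println(found) // true
}
```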

emanuelpalm commented 2 years ago

@vanDeventer An operation is something that accepts exactly one message and replies with exactly zero or one messages.

In REST, an operation would be identified by an HTTP endpoint, formulated as an HTTP Method (POST, PUT, DELETE, etc.) and a Path (e.g. PUT /service-registry/entries/my-service-id), as well as IP address, port, etc.

In MQTT, that could be a specific topic, which looks like the path in REST. A major difference, however, is that MQTT does not have or require the use of any methods (PUT, POST, etc.). Neither does it demand that the path emulate a directory structure (as the REST spec does). The above example HTTP endpoint identifier could be formulated for MQTT as /service-registry/register/my-service-id.

WebSockets is just a streaming protocol, just like TCP, possibly multiplexed with HTTP messages. You would have to define your own protocol on top of it for us to know how to formulate an Arrowhead operation in that protocol. You could, for example, send JSON-RPC or CoAP messages through WebSockets. In the former case, an operation could be a namespaced function name like serviceRegistry.register (the service ID would have to be in the message payload), while the latter would have an identical endpoint to our REST example above.
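
To summarize the mapping in one place, here is a small Go sketch; the paths, topic and function name are only examples, not agreed Arrowhead identifiers:

```go
package main

import "fmt"

// operationBinding shows how one abstract operation can be named in each
// protocol: REST needs a method plus a path, MQTT only a topic, and
// JSON-RPC over WebSockets a namespaced function name.
type operationBinding struct {
	HTTPMethod string
	HTTPPath   string
	MQTTTopic  string
	JSONRPC    string
}

var registerService = operationBinding{
	HTTPMethod: "PUT",
	HTTPPath:   "/service-registry/entries/my-service-id",
	MQTTTopic:  "/service-registry/register/my-service-id",
	JSONRPC:    "serviceRegistry.register",
}

func main() {
	fmt.Printf("REST:     %s %s\n", registerService.HTTPMethod, registerService.HTTPPath)
	fmt.Printf("MQTT:     publish to %s\n", registerService.MQTTTopic)
	fmt.Printf("JSON-RPC: call %s (service id in the payload)\n", registerService.JSONRPC)
}
```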

emanuelpalm commented 2 years ago

@DavidRutqvist

  1. What is the overall purpose of this document? To me, Arrowhead is more of an architecture with strict definitions of the interfaces rather than one specific implementation. This document is a very nice implementation specification, but labelling this as the 5.0 release would limit the possibility for other parties to create other implementations of one or more services.

I agree that a document with API details only would be better to base this discussion on. @tsvetlin said during our last telco that such a document will come later.

  1. A general "meta" suggestion about the issue itself (goes for several issues so not only this one). It would have been better to make this proposal as a PR with MD, LaTeX or similar. That way we could have commented directly in the file.

I agree. That would have been nice.

  4. Continuing on 3, I see a coupling between the service registry and system registry even though the databases are separate. Neither of these services are able to carry out its responsibility on its own, e.g. you have to call both to do a lookup. I would argue that the lookup including address is part of the service registry, not the system registry. Having been outside of the Arrowhead project now for the past 3 years I don't know how the discussion has gone. However, in general there is a need to know which systems exist, what services they do provide and who owns/maintains this system. I would argue this is part of the system registry, so more on the governmental side of things than a runtime dependency.

We had a discussion on this on the telco. There is a trade-off being done here. Either we optimize for fewer interdependencies and fewer messages (as you suggest), or we optimize for less data duplication (i.e. not having the same IP addresses stored in both the system and the service registries). Less data duplication means it becomes easier to guarantee data integrity, which is important. My impression was that most (including myself) were in favor of the former of the two trade-offs.

  5. If I understood the implementation correctly, a system can have one or more addresses, which is good. However, each address is not linked to a service? How is this meant to work? A system could, for example, provide one service as REST/HTTPS and one as MQTT. These would then have two different addresses since they use two completely separate protocols as well. How will the system/service registry handle this?

I had a discussion with @vanDeventer about this a while ago. The highest level of flexibility becomes possible if all information needed to contact a particular service is in its service registry entry, as I propose here: https://github.com/eclipse-arrowhead/roadmap/issues/22#issuecomment-1002057239.

  6. Finally, I did not fully understand the purpose of the "URI Crawler". Could you elaborate on what purpose it has and how it works?

@tsvetlin would have to answer this question.

ajoino commented 2 years ago

@emanuelpalm thank you for the description of service vs. operation. The point of disagreement is that I consider the services themselves analogous to methods and don't see the need to involve another level of abstraction.

We had a discussion on this on the telco. There is a trade-off being done here. Either we optimize for fewer interdependencies and fewer messages (as you suggest), or we optimize for less data duplication (i.e. not having the same IP addresses stored in both the system and the service registries). Less data duplication means it becomes easier to guarantee data integrity, which is important. My impression was that most (including myself) were in favor of the former of the two trade-offs

Maybe there is a way to reconcile these ways of thinking (I think we were getting there during the telco): While there is data duplication involved, it's perhaps better to see it as two different kinds of data. The service registration data details how to communicate with a service and, afaics, is unrelated to the topology of the local cloud. The system registration data in contrast is only about the topology of the local cloud, and is important for orchestration and authorization. Since the data is used for different purposes, it's not duplicate. Does that make any sense?

emanuelpalm commented 2 years ago

@ajoino

@emanuelpalm thank you for the description of service vs. operation. The point of disagreement is that I consider the services themselves analogous to methods and don't see the need to involve another level of abstraction.

Let's say you are writing a system that will negotiate contracts with other systems ;-) . In order for your system to be able to negotiate properly with some other system, three kinds of messages will have to be passed between them: (1) proposals (and counter-proposals), (2) acceptances and (3) rejections. Both your system and the system of the counter-party host the same service, able to deal with these three messages. Now, as a service designer, you currently have two options on how you want to design this:

  1. You design a service with three operations, one for each of the proposal, acceptance and rejection messages.
  2. You design a service with a single operation, which accepts a message being the union of the proposal, acceptance and rejection messages.

Both of these approaches would work, of course, because what fundamentally happens at each interface level (network, system, service and operation) is that the message is looked at and passed on to more specialized code until the code that can actually handle it receives it. In the two examples above, that "looking at" occurs inside the service in both cases. However, in case 2 the implementor (programmer) must make that discrimination manually (i.e. with a "switch" statement calling internal functions). My perception is that most kinds of exchanges will involve different kinds of messages, which means that you generally land on a cleaner implementation if a distinction is made between services and operations.
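
To make the comparison concrete, here is a Go sketch of option 2 (all names invented): the single operation receives a union message and the implementor does the discrimination with a switch, which is exactly what option 1 would push down to the operation layer:

```go
package main

import (
	"errors"
	"fmt"
)

// NegotiationMessage is the union of the three message kinds from option 2.
type NegotiationMessage struct {
	Kind      string // "proposal", "acceptance" or "rejection"
	SessionID string
	Offer     string // only meaningful for proposals
}

// handleNegotiation is the single operation: the switch below is the manual
// discrimination that three separate operations would have made unnecessary.
func handleNegotiation(msg NegotiationMessage) error {
	switch msg.Kind {
	case "proposal":
		fmt.Printf("counter-party proposes %q in session %s\n", msg.Offer, msg.SessionID)
	case "acceptance":
		fmt.Printf("session %s accepted\n", msg.SessionID)
	case "rejection":
		fmt.Printf("session %s rejected\n", msg.SessionID)
	default:
		return errors.New("unknown negotiation message kind: " + msg.Kind)
	}
	return nil
}

func main() {
	_ = handleNegotiation(NegotiationMessage{Kind: "proposal", SessionID: "42", Offer: "100 units"})
	_ = handleNegotiation(NegotiationMessage{Kind: "acceptance", SessionID: "42"})
}
```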

Maybe there is a way to reconcile these ways of thinking (I think we were getting there during the telco): While there is data duplication involved, it's perhaps better to see it as two different kinds of data. The service registration data details how to communicate with a service and, afaics, is unrelated to the topology of the local cloud. The system registration data in contrast is only about the topology of the local cloud, and is important for orchestration and authorization. Since the data is used for different purposes, it's not duplicate. Does that make any sense?

Do you mean that the system registry should not store IP addresses? What do we want to store in the system registry anyway?

ajoino commented 2 years ago

@emanuelpalm What about

  1. You design three services, one for each of the proposal, acceptance, and rejection messages?

I think our disagreements are about what can be considered a service or not. For example, I would expect all services, whether HTTP or MQTT, to use the same identifier/endpoint/basePath/topic, whereas I take it you and many others think it's fine for a service to have many endpoints. AFAICT this distinction is a matter of taste and conventions.

Do you mean that the system registry should not store IP addresses? What do we want to store in the system registry anyway?

No, I mean that the IP in your suggested service registration message is used for a different purpose than that IP registered as part of a system in the system registry, even if the IP is the same. Let me try to clarify:

The difference here would be how the information is handled. A hypothetical orchestrator could work like this:

  1. During an orchestration request, get all services that match the orchestration rule
  2. Ask the authorization system what systems this consumer is allowed to consume
  3. Assuming the authorization system gives a list of system IDs or similar, the orchestrator then asks the system registry for all systems with that ID
  4. The orchestrator filters all services that do not match the IP/hostname + port combinations in the service list
  5. The orchestrator sends that filtered list to the consumer.

I made this up quickly, so there are probably a bunch of ways to make this more efficient and possibly remove the system registry from the process. But my point is that service registration data is for establishing connections, which is, in many ways, completely different from system data, which describes the local cloud independently of the services and the connections they can establish. So while the data looks the same, it's semantically different and not duplicated.
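
Sketched in Go (purely illustrative, following the made-up steps above), the filtering step would use the system registry data only to decide which service entries survive:

```go
package main

import "fmt"

// serviceEntry is "how to talk to the service" (service registry data);
// systemEntry is "what exists in the local cloud" (system registry data).
type serviceEntry struct {
	Name string
	Host string
	Port int
}

type systemEntry struct {
	ID   string
	Host string
	Port int
}

// orchestrate keeps only the service entries whose endpoint belongs to an
// authorized system (steps 3-5 of the flow above).
func orchestrate(matching []serviceEntry, authorizedSystems []systemEntry) []serviceEntry {
	allowed := map[string]bool{}
	for _, sys := range authorizedSystems {
		allowed[fmt.Sprintf("%s:%d", sys.Host, sys.Port)] = true
	}
	var result []serviceEntry
	for _, svc := range matching {
		if allowed[fmt.Sprintf("%s:%d", svc.Host, svc.Port)] {
			result = append(result, svc)
		}
	}
	return result
}

func main() {
	services := []serviceEntry{
		{"temperature", "192.168.1.2", 8443},
		{"temperature", "192.168.1.9", 8443},
	}
	systems := []systemEntry{{"provider-a", "192.168.1.2", 8443}}
	fmt.Println(orchestrate(services, systems)) // only the authorized provider remains
}
```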

emanuelpalm commented 2 years ago

@ajoino

You design three services, one for each of the proposal, acceptance, and rejection messages?

As things are right now, systems may or may not provide their services, while a service must expose all of its operations or not be provided at all. If we were to implement the message receivers as services, a given system would be free to only provide one or two of those services. However, the three services are only useful if all of them can be used at once. Guaranteeing that they are all available is important. If we were to remove the operation layer and do as you propose, I guess we would essentially be removing the system layer and renaming services to systems and operations to services.

I like when things are as simple as they can be, so I reject the idea that this is a matter of taste. If we don't need one of these layers, I want it removed. I still have a hunch, however, that there is some fundamental reason why we cannot remove any of them. I just need to be convinced there is no such reason.

Another way of viewing the system is as an abstract representation of a software instance. The software exposes services, each of which guarantees that a certain number of operations will be available. Isn't that a good enough reason to say we have to have three layers?

vanDeventer commented 2 years ago

Good morning,

There is a French expression that describes my constant state with these ideas: «je comprends vite mais il faut m'expliquer longtemps» (I understand quickly, but you have to explain things to me for a long time).

I have a few questions

  1. Does an MQTT broker belong to a local cloud?
  2. When you publish a topic to the broker, is it also registered as a service with the Service Registry system? If yes, by whom?
  3. Do you use the GateKeeper and Gateway to subscribe to an MQTT topic in another local cloud?

jenseliasson commented 2 years ago

Hi

See my comments below inline.


  1. Does an MQTT broker belong to a local cloud?

An MQTT broker does belong to a local cloud, even though the broker itself is invisible in an Arrowhead context. The broker does not offer or consume any services directly.

  2. When you publish a topic to the broker, is it also registered as a service with the Service Registry system? If yes, by whom?

The system that is publishing to a topic does not register anything. Instead, the system that "owns" the topic and subscribes to it must register it in the Service Registry, like it is done with HTTP or CoAP. There are some security concerns that I have outlined in the report on enhanced MQTT security in an Arrowhead local cloud that I wrote earlier.

  3. Do you use the GateKeeper and Gateway to subscribe to an MQTT topic in another local cloud?

I didn't investigate this, but my intuition says that if inter-cloud MQTT communication is desired, then the appropriate security mechanisms must be used. And adding support for MQTT topics to the gateway/gatekeeper systems makes a lot of sense to me.

Jens


DavidRutqvist commented 2 years ago

Even though I think this is a bit outside the scope of this issue, following on what @jenseliasson said above.

  2. When you publish a topic to the broker, is it also registered as a service with the Service Registry system? If yes, by whom? The system that is publishing to a topic does not register anything. Instead, the system that "owns" the topic and subscribes to it must register it in the Service Registry, like it is done with HTTP or CoAP. There are some security concerns that I have outlined in the report on enhanced MQTT security in an Arrowhead local cloud that I wrote earlier.

If a service uses Pub/Sub over MQTT, wouldn't the service producer register the topic then, so consumers can consume through this topic? Maybe you referred to a non-pub/sub example, though. Registering a topic as a producer is saying "I will announce changes in this forum, listen if you would like to receive them".

  3. Do you use the GateKeeper and Gateway to subscribe to an MQTT topic in another local cloud? I didn't investigate this, but my intuition says that if inter-cloud MQTT communication is desired, then the appropriate security mechanisms must be used. And adding support for MQTT topics to the gateway/gatekeeper systems makes a lot of sense to me.

Using gatekeeper and gateway sounds reasonable. The gateway would simply consume one topic from one local cloud broker and then publish to another topic at a broker in the second local cloud.

UlfSlunga-Sinetiq commented 2 years ago

@emanuelpalm

@UlfSlunga-Sinetiq As far as I know, we will add unique service instance identifiers to the messages of relevance in the v5.0.0 release. You are not the only one who has raised this concern.

Sounds really good!

tsvetlin commented 2 years ago

@jerkerdelsing

Register-service "Please note that SR does not check the existense of the system during the operation but marks the entry as not active. An asynchronous task will handle the system check (using WS call) and updating the flag if necessary."

This is related to authentication of the system. Is it done in the same way (SSH) as authentication related to a service request?

This is not related to authentication, but to the Orchestration. During orchestration, with the orchestration flags, a consumer can specify that it only wants reachable results. The Service Registry checks whether a requested provider is reachable, and in that scenario only returns reachable ones.

Two staged startup

This should only be allowed with a "Management" certificate.

This is necessary to circumvent the cross dependency between the Service Registry, System Registry and Device Registry, since during startup they need to register their services, their system and their device. When the Service Registry is starting up, it needs to register its services, its system into the SysR (the SysR has not started yet) and its device into the DR (the DR has not started yet). When the SysR is starting up, it needs to register its services (the SR has not started yet, so it cannot do that), nor can it register its device. When the DR is starting up, it needs to register its services and system, and it cannot do that either.

The two-staged startup enables these core systems to register themselves into each other and then finish the startup procedure, to be available for other systems in the Local Cloud.
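
A rough Go sketch of that idea (names invented, not the proposed API): stage one brings each registry up without registering anything, stage two performs the mutual registrations once all three are reachable:

```go
package main

import "fmt"

// registry stands in for ServiceRegistry, SystemRegistry or DeviceRegistry.
type registry struct {
	name  string
	ready bool
}

// startListener is stage 1: accept requests, but hold off on registering.
func (r *registry) startListener() {
	r.ready = true
	fmt.Println(r.name, "listening (stage 1)")
}

// registerWith is stage 2: the mutual cross-registrations.
func (r *registry) registerWith(others ...*registry) error {
	for _, o := range others {
		if !o.ready {
			return fmt.Errorf("%s cannot register: %s not ready", r.name, o.name)
		}
		fmt.Printf("%s registered itself in %s (stage 2)\n", r.name, o.name)
	}
	return nil
}

func main() {
	sr := &registry{name: "ServiceRegistry"}
	sysr := &registry{name: "SystemRegistry"}
	dr := &registry{name: "DeviceRegistry"}

	// Stage 1: all three come up without registering anywhere.
	for _, r := range []*registry{sr, sysr, dr} {
		r.startListener()
	}
	// Stage 2: now the cross-registrations can be completed.
	_ = sr.registerWith(sysr, dr)
	_ = sysr.registerWith(sr, dr)
	_ = dr.registerWith(sr, sysr)
}
```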

name (mandatory): name of the device. Text. Maximum length is 63 characters, only contains letters (english alphabet), numbers and dash (-), and must start with a letter (also cannot end with dash). Must be unique.

Unique in the local cloud where it's deployed.

Yes! Unique in the LC.

tsvetlin commented 2 years ago

@UlfSlunga-Sinetiq

In order to fully achieve loose coupling between core services (making them replaceable by e.g. not sharing db's), we see a need for a new field, to be added to both the request and the response:

serviceInstanceId (a string uniquely identifying the service instance)

As @emanuelpalm said before, this will be included in 5.0.0

tsvetlin commented 2 years ago

@CristinaPaniagua

Every system (provider or consumer) needs an interface or interfaces to communicate with other systems. Therefore, the interface_protocol parameters should be part of the System Registry, not the Service Registry. If we keep the interface details attached to services, we are limiting the interface information to providers and neglecting consumers.

During the orchestration, a consumer can specify which interface it needs, and the Orchestrator can find it a suitable provider.

What benefits do we have if we are aware of what interfaces a consumer supports? Could you please explain it further with an example?

tsvetlin commented 2 years ago

@DavidRutqvist

Finally, I did not fully understand the purpose of the "URI Crawler". Could you elaborate on what purpose it has and how it works?

The URI Crawler is implemented in the Java core systems to discover the services necessary for their operation. The core systems use the Service Discovery service's Query operation.

tsvetlin commented 2 years ago

@emanuelpalm raised a concern that the registration procedure (see quoted image) is complicated; many messages have to be passed.

To circumvent this issue, it could be a good solution to have a "Registry" supporting core system, which could behave as an API gateway, where the application systems could register/unregister themselves in one message.

This would make scaling much easier (multiple instances of SR, SysR, DR), and it would behave as a load balancer. This Registry API gateway would be an optional core system. It would offer the simple registration.
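
A hypothetical Go sketch of what the gateway's fan-out could look like (all types and names invented for illustration):

```go
package main

import "fmt"

// combinedRegistration is the single message an application system would send
// to the optional "Registry" gateway.
type combinedRegistration struct {
	Device   string
	System   string
	Services []string
}

// registryClient abstracts the three registries sitting behind the gateway.
type registryClient interface {
	Register(entry string) error
}

type printClient struct{ target string }

func (c printClient) Register(entry string) error {
	fmt.Printf("registered %q in %s\n", entry, c.target)
	return nil
}

// registerAll fans one combined registration out to the Device, System and
// Service Registries.
func registerAll(reg combinedRegistration, dr, sysr, sr registryClient) error {
	if err := dr.Register(reg.Device); err != nil {
		return err
	}
	if err := sysr.Register(reg.System); err != nil {
		return err
	}
	for _, s := range reg.Services {
		if err := sr.Register(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	reg := combinedRegistration{
		Device:   "edge-device-1",
		System:   "temperature-provider",
		Services: []string{"temperature", "humidity"},
	}
	_ = registerAll(reg, printClient{"DeviceRegistry"}, printClient{"SystemRegistry"}, printClient{"ServiceRegistry"})
}
```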

registry drawio (3)

Application System Registration Procedure: registration_procedure drawio

Gathering data about core systems once, to handle incoming Application System requests. register_discover drawio

Application Registration Process in high definition

image

tsvetlin commented 2 years ago

@emanuelpalm, in the last WG telco you mentioned that you feel like SysR and DR are "almost mandatory" core systems. Could you please explain what exactly you mean by "almost mandatory"?

emanuelpalm commented 2 years ago

@tsvetlin

@emanuelpalm raised a concern that the registration procedure (see quoted image) is complicated; many messages have to be passed.

To circumvent this issue, it could be a good solution to have a "Registry" supporting core system, which could behave as an API gateway, where the application systems could register/unregister themselves in one message.

This could be a good solution. Could it be used to eliminate the call to the orchestrator as well? You send one message with your device, system and service details, as well as what services you are capable of consuming, and you get the appropriate service records in the response? I guess it should be possible to enable orchestration push in the message, as an alternative to receiving the service records in the first response.

emanuelpalm commented 2 years ago

@tsvetlin

@emanuelpalm, in the last WG telco you mentioned that you feel like SysR and DR are "almost mandatory" core systems. Could you please explain what exactly you mean by "almost mandatory"?

I assume that most use cases where an Arrowhead Local Cloud is employed will see great value in maintaining registries of what devices they have, what systems they are running and what services those provide. If not for practical reasons, then because that information is just useful in the general sense. This makes these registries "practically mandatory", or almost mandatory.

However, each of these systems introduces another dimension of dynamicity. As far as I understand it, if you have a device registry, the set of devices in your cloud can change at runtime. If you have a system registry, the set of systems can change, while a service registry enables you to dynamically handle what services are provided and consumed. In other words, the more information about your local cloud that does not have to change, the fewer of these systems you need (if you are willing to manually provide all systems with the information they need that would otherwise go into these registries).

All this being said, the point of Arrowhead is dynamicity. If you remove dynamic service discovery, the most vital type of dynamicity is lost. I guess you could modify the authorization and orchestrator systems such that they operate on predefined static service registries (via config files or similar), which would make it possible to have late service binding without service discovery. Anyhow, I guess "practically mandatory" is kind of the same as "mandatory".

jerkerdelsing commented 2 years ago

Conclusion: Onboarding procedure not documented sufficiently.

Two staged startup

Two-staged startup seems appropriate

Full common name (mandatory): name of the device. Text. Maximum length is 63 characters, only contains letters (english alphabet), numbers and dash (-), and must start with a letter (also cannot end with dash). Must be unique.

mac_address (mandatory): MAC address assigned to the device in text format: a 12-digit hexadecimal number with a colon every two digits (an octet). Must be unique.

Name discussion in relation to the X.509 certificate to be addressed in a specific meeting.

DeviceRegistry device table

Sinetiq will take the lead in creating SoSD, SysD and SD documents for a set of the core systems. First version by end of May 2022. Mandatory core systems: ServiceRegistry, Authorisation, Orchestration, DeviceRegistry, SystemRegistry

Interface registration: provide a set of metadata which are recommended. To be further discussed in connection with the SoSD, SysD and SD documents as led by Sinetiq.

David Rutqvist comments

  1. Documents to be transferred into SoSD, SysD, and SD, led by Sinetiq
  2. Make use of Latex templates
  3. Resolved by the proposal from Emanuel, see previous comments
  4. See 3)
  5. See 3)
  6. Skip this one

Szvetlin: support core system "Registry" (an update of the name is possibly necessary). Liked and agreed by the team.

This and the above MoM will be among the sources for the SoSD, SysD and SD documents for the Arrowhead Architecture: the SoSD, plus the SysDs and SDs for the ServiceRegistry, SystemRegistry, DeviceRegistry, Authorisation and Orchestration systems.

By this I close this issue.