arrowhead-f / arrowhead-kalix-examples

Arrowhead Kalix Examples
Eclipse Public License 2.0

Cloud (?!) fails on finding and providing requested service #7

Closed by strohbert 3 years ago

strohbert commented 3 years ago

We posted this issue in Slack (https://arrowhead-dev.slack.com/archives/CBBMWTZNZ/p1621001024046300) but did not get a response, so I am opening an issue directly in the repo.

We are trying to start up the Kalix echo-cloud example (https://github.com/arrowhead-f/arrowhead-kalix-examples/tree/master/echo-cloud). We can run the provider successfully, but the cloud (?!) fails to find and provide the service (see the exception below). We are running version 4.2.0 of Arrowhead (https://github.com/arrowhead-f/core-java-spring) and have not altered the Kalix example in any way. Registration of both the provider and the consumer seems to work, as the entries in the database suggest. We are using certificates based on testcloud2.aitia.arrowhead.eu. The provider starts flawlessly, but the consumer fails when sending the request (line 83 of EchoConsumer.java: .flatMap(consumer -> consumer.send(new HttpConsumerRequest() ). We are also unsure about the line INFO: HTTP/JSON cloud plugin resolved orchestration service at /127.0.0.1:8441, as we expected something like localhost:8441 as the endpoint.

...
Consumer service is being provided ...
Mai 14, 2021 3:36:52 PM se.arkalix.core.plugin.HttpJsonCloudPlugin$Attached requestOrchestration
INFO: HTTP/JSON cloud plugin connecting to "orchestrator" system ...
Mai 14, 2021 3:36:54 PM se.arkalix.core.plugin.HttpJsonCloudPlugin$Attached lambda$requestOrchestration$17
INFO: HTTP/JSON cloud plugin resolved orchestration service at /127.0.0.1:8441
GET /example/pings/32 failed:
se.arkalix.query.ServiceNotFoundException: No service with the following properties could be resolved: name=Optional[kalix-example-provider-service], isSecure=true, encodings=[JSON], transports=[HTTP]
 at se.arkalix.query.ServiceQuery.lambda$resolveOne$4(ServiceQuery.java:451)

Any suggestions or hints as to what the problem might be?

emanuelpalm commented 3 years ago

Hey @strohbert! Nice reading that you have been giving Kalix a go.

Are you using the docker-compose.yml file to run the example? It automatically sets up all the authorization and orchestration rules necessary for the Echo Consumer to look up and access the Echo Provider. The setup is performed by the Cloud Configurator, a custom-made system meant to make it easier to configure local clouds. It is only meant to exist until an official solution for making such configurations conveniently becomes available. It requires a configuration file to work. An example can be seen here.

The reason I believe missing authorization and orchestration rules to be the issue is that you get a ServiceNotFoundException, which is typical when proper rules are not configured. I hope this helps!
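A side note on the /127.0.0.1:8441 log line: the leading slash is most likely just how Java prints a socket address that has an IP but no host name attached, so it is probably not a symptom of the problem. The sketch below uses only standard java.net classes; the class and method names are made up for illustration, and it assumes the plugin logs a plain InetSocketAddress.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Illustrative only: shows how the JDK formats socket addresses.
final class SocketAddressPrinting {
    private static final byte[] LOOPBACK = {127, 0, 0, 1};

    /** Address built from raw IP bytes, so no host name is attached. */
    static String withoutHostName() {
        try {
            InetAddress ip = InetAddress.getByAddress(LOOPBACK);
            return new InetSocketAddress(ip, 8441).toString();
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Same IP, but with a host name attached (no DNS lookup is performed). */
    static String withHostName() {
        try {
            InetAddress ip = InetAddress.getByAddress("localhost", LOOPBACK);
            return new InetSocketAddress(ip, 8441).toString();
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withoutHostName()); // /127.0.0.1:8441
        System.out.println(withHostName());    // localhost/127.0.0.1:8441
    }
}
```

Both addresses point at the same endpoint; only the printed form differs.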

awoSYS commented 3 years ago

Hi @emanuelpalm! I'm working together with @strohbert to get this example running and to understand it. Thanks for your support! We simply started the two systems (echo provider and consumer) by running the respective jars and then got the error. From your post I understand that this is not going to work, since the cloud configurator system is supposed to do all the hard work of setting up the AH rules so that provider and consumer are able to find each other. However, I'm still having trouble understanding that configurator system.

emanuelpalm commented 3 years ago

@awoSYS @strohbert I assume you tried the cloud-standalone example project first? Did that work out for you?

The difference between the standalone and cloud examples is that the former consists solely of the two systems (i.e. Java applications) provided in the example. The latter provides three systems (echo provider, echo consumer and configurator), and assumes that you have three more running (a Service Registry system, an Authorization system and an Orchestrator system). The reason for the increased complexity of the second example is that the extra systems allow the echo consumer and provider to dynamically find each other and exchange messages, without prior knowledge of each other's IP addresses or other details.

All systems that are part of the same local cloud (a cluster of systems all managed by the same operator) must have X.509 certificates issued by the same cloud certificate holder. All certificates must conform to the Eclipse Arrowhead certificate profile (which is not yet published/formalized, but a brief description of its current implementation can be read here).
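As a rough illustration of the naming side of that profile, here is a minimal sketch that checks whether a system certificate's common name sits under a cloud's common name. It assumes the convention that a system's CN has the form system.cloud.operator.arrowhead.eu; the class name, method names and the system name echo-provider are hypothetical. A name check like this is no substitute for verifying the actual certificate signature chain.

```java
import java.util.Locale;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import javax.security.auth.x500.X500Principal;

// Illustrative only: compares certificate subject names, not signatures.
final class CloudNameCheck {
    /** Extracts the CN attribute from a distinguished name, lower-cased. */
    static String commonNameOf(X500Principal principal) {
        try {
            for (Rdn rdn : new LdapName(principal.getName()).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString().toLowerCase(Locale.ROOT);
                }
            }
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Malformed distinguished name", e);
        }
        throw new IllegalArgumentException("No CN in " + principal);
    }

    /** True if the system's CN is a direct or indirect child of the cloud's CN. */
    static boolean belongsToCloud(X500Principal systemSubject, X500Principal cloudSubject) {
        return commonNameOf(systemSubject).endsWith("." + commonNameOf(cloudSubject));
    }

    public static void main(String[] args) {
        X500Principal cloud = new X500Principal("CN=testcloud2.aitia.arrowhead.eu");
        X500Principal system =
            new X500Principal("CN=echo-provider.testcloud2.aitia.arrowhead.eu"); // hypothetical
        System.out.println(belongsToCloud(system, cloud));
    }
}
```

With real certificates, the two principals would come from X509Certificate.getSubjectX500Principal().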

If dynamic service discovery is desired (as in the echo-cloud example), all systems in the local cloud must know the IP address of the Service Registry system, and the port (and other details) through which its Service Discovery service can be reached. The Service Registry, Authorization and Orchestrator systems must, of course, all be running. In addition, authorization rules and orchestration rules must be set up in the Authorization and Orchestrator systems. This is what the configurator system in the example does: it registers the rules necessary for dynamic service resolution between the echo consumer and provider.
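To make the interplay concrete, here is a toy in-memory model of the three pieces. It is not the Arrowhead or Kalix API; every class, method and the consumer name echo-consumer are invented for illustration. It only shows why a registered service can still be unresolvable: without both an authorization rule and an orchestration rule, the lookup comes back empty, which is roughly the situation behind a ServiceNotFoundException.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model only: real core systems expose these concepts over HTTP.
final class LocalCloudModel {
    private final Map<String, String> registry = new HashMap<>(); // service -> address
    private final Set<String> authorizations = new HashSet<>();   // "consumer->service"
    private final Set<String> orchestrations = new HashSet<>();   // "consumer->service"

    /** What a provider does on startup: announce its service. */
    void register(String service, String address) {
        registry.put(service, address);
    }

    /** What the configurator does: allow the consumer to use the service. */
    void authorize(String consumer, String service) {
        authorizations.add(consumer + "->" + service);
    }

    /** What the configurator does: tell the orchestrator which service to hand out. */
    void addOrchestrationRule(String consumer, String service) {
        orchestrations.add(consumer + "->" + service);
    }

    /** Resolution succeeds only if the service is registered and both rules exist. */
    Optional<String> resolve(String consumer, String service) {
        String key = consumer + "->" + service;
        if (!orchestrations.contains(key) || !authorizations.contains(key)) {
            return Optional.empty();
        }
        return Optional.ofNullable(registry.get(service));
    }

    public static void main(String[] args) {
        LocalCloudModel cloud = new LocalCloudModel();
        String service = "kalix-example-provider-service";
        cloud.register(service, "127.0.0.1:9001"); // hypothetical address
        System.out.println(cloud.resolve("echo-consumer", service).isPresent()); // false
        cloud.authorize("echo-consumer", service);
        cloud.addOrchestrationRule("echo-consumer", service);
        System.out.println(cloud.resolve("echo-consumer", service).isPresent()); // true
    }
}
```

Running the provider and consumer jars on their own corresponds to the first resolve call: the registration is there, but the rules are not.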

The documentation situation currently leaves quite a lot to be desired. The official one is here. I have written quite some documentation for the Kalix library here. Important sections to read are [1] and [2].

Just tell me if you have any more questions. I'll try to answer as soon as time permits.

awoSYS commented 3 years ago

Thanks @emanuelpalm for that much information!

I understand that the standalone example is only supposed to show what the setup would look like with provider and consumer wired together via hardcoded addresses. I ran it and it worked:

Logs

```
DELETE /example/runtime result: 204 No Content
GET /example/pings/32 result: 200 OK {"ping":"pong","id":"32","timestamp":"2021-06-21T14:00:31.642509Z"}
POST /example/pings result: Ping{ping='pong!', id='null', timestamp=null}
Done!
```

The cloud example also seems to work now:

Echo consumer logs

```
DELETE /example/runtime result: 204 No Content
GET /example/pings/32 result: Ping{ping='pong', id='32', timestamp=2021-06-21T13:50:56.727118Z}
```

(Our earlier problems were most likely caused by a Linux-Windows character encoding incompatibility: checking out the repo on Windows made the commands in docker-compose.yml unreadable for the Linux containers.)
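In case someone else runs into this: one common form of such an incompatibility is Windows-style CRLF line endings introduced at checkout. That this was the exact cause here is an assumption. A minimal sketch for detecting and stripping them from a file's bytes, with hypothetical class and method names:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: detects and removes Windows line endings in raw bytes.
final class LineEndings {
    /** True if the content contains at least one CR immediately followed by LF. */
    static boolean hasCrlf(byte[] content) {
        for (int i = 0; i + 1 < content.length; i++) {
            if (content[i] == '\r' && content[i + 1] == '\n') {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy in which every CRLF pair is replaced by a single LF. */
    static byte[] toLf(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(content.length);
        for (int i = 0; i < content.length; i++) {
            boolean crBeforeLf =
                content[i] == '\r' && i + 1 < content.length && content[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(content[i]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] windows = "command: run\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasCrlf(windows));       // true
        System.out.println(hasCrlf(toLf(windows))); // false
    }
}
```

The bytes would typically come from Files.readAllBytes on the checked-out file.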

However, I've got a couple of questions left:

Thanks for your support!

emanuelpalm commented 3 years ago

Nice that you are making progress!

Any more questions? Did I miss anything?

awoSYS commented 3 years ago

Thanks @emanuelpalm, your explanations have already helped me a lot in understanding what's going on! If you're not yet tired of my questions, though, I'd be happy to get even more input from you!

I didn't manage to open the Swagger UIs, using either Chromium or Firefox. I'll skip that for now.

About the configurator system:

About security and authorization:

Sorry for the amount of text! Feel free to ignore what you don't fancy to answer ;) And thanks for your support again!

emanuelpalm commented 3 years ago

Configurator system

  1. It is still something that is valid for the latest version of AHFW, as far as I know. I may be wrong though, as I do know that this is something that is being discussed and worked on. My impression is that it will not be fixed until AHFW version 5, but it may be that I'm not up to speed with the latest developments.
  2. Well, the Kalix cloud plugin will automatically attempt to register the services provided by its systems, which means that in the echo-cloud demo, registration occurs twice. There is nothing that requires this behavior, however. I will likely revisit the solution and try to improve it in future versions of Kalix.
  3. Because how this is to be solved in Arrowhead is still under consideration. My configurator is not to be regarded as more than an interim until a consensus has been reached and a workable solution produced.

Security and Authorization

  1. The token-based approach is strongly recommended (as far as I know) for all systems that are not essential core systems provided by the Eclipse Arrowhead project, for which reason the documentation focuses on it. The certificate-only approach is something the developers of the original Arrowhead core systems figured out they had to have in order to be able to start up a functional local cloud that can then offer token-based authorization to the rest of its systems. More access control mechanisms are being discussed, such as being able to use different kinds of tokens and having different kinds of certificate-only criteria.
  2. Yes. A certificate issued by the local cloud certificate is required for every system that is to be part of that local cloud, irrespective of which access control policy (i.e. secure mode) is used.
  3. Yes. Developers should be aware, however, that the timing, performance and reliance implications are different.
  4. A truststore is required at all because of how TLS operates. TLS requires that communicating systems agree on a single issuer that both systems trust. That issuer's certificate could just as well have been extracted from the keystore of each system (as it also contains the same certificate, which I'm sure you've noticed). By having a separate truststore, however, I leave room for using TLS to connect to non-Arrowhead systems. I may change the behavior of Kalix to extract the truststore from its keystore if no truststore is specified. We'll see.
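As a sketch of what extracting a truststore from a keystore could look like with the plain java.security API (this is not something Kalix does today, and the class name, method names and alias prefix are made up): for every private-key entry, take the last certificate in its chain, assumed here to be the topmost issuer, and add it as a trusted entry to a fresh in-memory store.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.util.Collections;

// Illustrative only: derives trusted issuers from the chains in a keystore.
final class TrustStoreFromKeyStore {
    /** Creates an empty, initialised in-memory PKCS#12 store. */
    static KeyStore emptyStore() {
        try {
            KeyStore store = KeyStore.getInstance("PKCS12");
            store.load(null, null); // null stream means "start empty"
            return store;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not create key store", e);
        }
    }

    /** Copies the topmost issuer of every key entry's chain into a new truststore. */
    static KeyStore deriveTrustStore(KeyStore keyStore) {
        try {
            KeyStore trustStore = emptyStore();
            for (String alias : Collections.list(keyStore.aliases())) {
                if (!keyStore.isKeyEntry(alias)) {
                    continue; // only private-key entries carry a certificate chain
                }
                Certificate[] chain = keyStore.getCertificateChain(alias);
                if (chain == null || chain.length < 2) {
                    continue; // no issuer beyond the system's own certificate
                }
                // Assumption: the chain is ordered leaf first, topmost issuer last.
                trustStore.setCertificateEntry("issuer-of-" + alias, chain[chain.length - 1]);
            }
            return trustStore;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not derive trust store", e);
        }
    }
}
```

Whether the topmost or the immediate issuer is the right certificate to trust depends on how the cloud's certificates are laid out, so treat that choice as a placeholder.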

I hope this helped!

awoSYS commented 3 years ago

Thanks @emanuelpalm, I understand ArKalix and the AHFW much better now!!

emanuelpalm commented 3 years ago

@awoSYS No problem! If you believe your questions have been answered at this point, please close this issue. Thank you!

awoSYS commented 3 years ago

@strohbert imho you can close this issue.