Apicurio / apicurio-studio

Open Source API Design
https://www.apicur.io/studio/
Apache License 2.0

Clustering / High Availability configuration #769

Open matthew-muscat opened 5 years ago

matthew-muscat commented 5 years ago

Having set this up on Kubernetes using the apicurio-ui, apicurio-ws and apicurio-api Docker images, we now want to move this workload into a high-availability configuration (i.e. more than 1 replica of each container).

However, it appears that there's some cluster cache / session issue occurring, as we're seeing authentication with Keycloak fail (i.e. invalid state code) whenever more than a single replica is running.

  1. Is a high availability configuration supported in Apicurio? (I'm assuming so based on https://github.com/Apicurio/apicurio-studio/issues/360)
  2. Is there documentation / flags within the docker images to enable configuration for the node names + cluster consensus / replication?

FYI: we've been able to achieve this in Keycloak with WildFly's JGroups, using the standalone-ha.xml config file with some minor adjustments to work inside k8s (i.e. setting the node name based on the pod name, using TCP for JGroups, creating a service for JGroups discovery).

EricWittmann commented 5 years ago

OK great question here. The disappointing short answer to this is "no, not yet". But let me go through it and explain where we're at.

As you mention, there are three Apicurio components that can be scaled independently. I'll go through each of them:

Now I think we have all the pieces to make an HA version of Apicurio (the hard part was really the editing layer). But it hasn't yet been put together in a way that will be easy for you to consume. If you're game to be the first HA deployment, I am happy to try to pull all this together for you to try. :)

matthew-muscat commented 5 years ago

Many thanks for the detailed explanation @EricWittmann

Seems like apicurio-ui and apicurio-ws are the areas that would need to be looked into. I was in particular running into the Keycloak issues with apicurio-ui.

Could apicurio-ui implement an authentication binding similar to keycloak-gatekeeper? Gatekeeper relies on a client-side browser cookie, so it avoids needing session affinity / clustering. Is there a reason why the auth needs to be stateful?

apicurio-ws seems like it'll just require exposing the Artemis config in the Dockerfile. In particular, JGroups can be utilised here, as mentioned in https://activemq.apache.org/components/artemis/documentation/latest/clusters.html

I'm happy to attempt getting this set up, along with contributing some docs once it's all worked out.

EricWittmann commented 5 years ago

I will admit that I am not 100% sure about apicurio-ui with respect to scaling the Keycloak adapter (and I realize this is the first problem you've hit). Keycloak has some documentation about this here:

https://www.keycloak.org/docs/2.5/securing_apps/topics/oidc/java/application-clustering.html

It seems that, as you pointed out, a cookie-based token store may be the answer. That will need some testing as there are one or two places in apicurio-ui that utilize the server-side security context. But that's easy enough for me to test. And to answer your question - there is no reason why auth needs to be stateful.
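To illustrate why a cookie-based store can remove the need for session affinity, here is a minimal Python sketch. It is not the Keycloak adapter's implementation: it swaps in a plain HMAC-signed cookie, and the names `SHARED_KEY`, `issue_cookie`, `verify_cookie` and `Replica` are all hypothetical. The only point is that any replica holding the same key can validate a request without looking anything up in server-side session state.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret; the assumption is that every replica loads
# the same key from its configuration.
SHARED_KEY = b"example-key-shared-by-all-replicas"


def issue_cookie(claims: dict, key: bytes = SHARED_KEY) -> str:
    """Serialize the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode()
    )
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_cookie(cookie: str, key: bytes = SHARED_KEY):
    """Return the claims if the signature is valid, otherwise None."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


class Replica:
    """Stand-in for one UI pod; it keeps no per-user state in memory."""

    def __init__(self, name: str):
        self.name = name

    def login(self, user: str) -> str:
        # The identity travels with the browser, not with the pod.
        return issue_cookie({"user": user})

    def handle_request(self, cookie: str):
        return verify_cookie(cookie)
```

A cookie issued by one `Replica` is accepted by any other, so the load balancer is free to route each request to a different pod. With a session-backed store, the second pod would have no record of the login, which is consistent with the "invalid state" symptom described at the top of this thread.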

As for apicurio-ws, I think the following is needed:

Currently all of the development and testing of this has been done using Wildfly, but the Docker images for Apicurio all use Thorntail as the runtime platform. I assume that's not a problem, it just hasn't been tested. So we'll need to figure out the appropriate configuration of Thorntail to enable messaging. If you're interested in taking a look at what we have now, there is a decent (although poorly named) integration test for it here:

https://github.com/Apicurio/apicurio-studio/tree/master/test/integration/arquillian

Note the relevant portion of the pom.xml here:

https://github.com/Apicurio/apicurio-studio/blob/master/test/integration/arquillian/pom.xml#L187-L364

And then the actual test here:

https://github.com/Apicurio/apicurio-studio/blob/master/test/integration/arquillian/src/test/java/io/apicurio/test/integration/arquillian/VerifyDistributedSetupTestIT.java

We basically set up a full environment, including database, artemis broker, and two apicurio-ws instances. Then the test creates an API and edits it from two different websocket clients to make sure that messages/events are properly sent to all parties.
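The shape of that test can be sketched with a toy in-memory model. This is not Apicurio code and does not use Artemis or real websockets; `Broker`, `WsInstance` and `Client` are hypothetical stand-ins, and the assumption is simply that each instance publishes edits to a shared topic and relays whatever arrives on that topic to its locally connected clients.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for the shared message broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self.subscribers[topic]):
            callback(message)


class Client:
    """Stand-in for a browser connected over a websocket."""

    def __init__(self, name):
        self.name = name
        self.received = []


class WsInstance:
    """Stand-in for one editing-layer replica."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.sessions = defaultdict(list)  # api_id -> local clients

    def join(self, api_id, client):
        # Subscribe to the API's topic the first time a local client joins.
        if api_id not in self.sessions:
            self.broker.subscribe(
                api_id, lambda msg, a=api_id: self._relay(a, msg)
            )
        self.sessions[api_id].append(client)

    def edit(self, api_id, sender, command):
        # Publish through the broker instead of writing only to local
        # clients, so that other replicas see the edit too.
        self.broker.publish(api_id, {"from": sender.name, "command": command})

    def _relay(self, api_id, message):
        # Deliver to every local client editing this API (sender included).
        for client in self.sessions[api_id]:
            client.received.append(message["command"])
```

Connecting one `Client` to each of two `WsInstance` objects that share a `Broker`, then calling `edit` on either instance, delivers the command to both clients. That is the property the integration test checks against the real broker and two real apicurio-ws instances.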

Finally, to address your last point - I would of course LOVE any contribution you are willing to make in this area. So based on what you now know about this, let me know what you're comfortable helping out with and how I can help/support you!

EricWittmann commented 5 years ago

Hey @matthew-muscat - I'm back from traveling and working on some other tasks and am now available to assist with this if needed. What is your status? :)